Thanks for showing up. I'm excited. This is my first AM talk where people aren't hungover, so I'm super stoked that all of you are awake. Presumably awake. I'm told there's a lot of jet lag if you're coming all the way from Seattle, but hopefully that'll work out. My name is Joe Masty, and let's talk a little bit about hiring developers.

So I am a consultant. I work with companies on a lot of things, but I tend to work with them on their onboarding, their hiring processes, apprenticeships and stuff. And one of the things that I've noticed, both with companies that work with me and some that maybe should, is that we have a problem with hiring. And this is a big issue, right? How many of you have, at your job, a developer posting that you cannot fill? Right? There's a lot of us. It's a big deal. And in a lot of cases this is not just, oh hey, we could use an extra person on the team. This is an exigent threat to your business.

And interviewing is hard. Another show of hands. I like show of hands stuff. How many of you have been on the receiving end of a terrible interview? Yeah. You go in and they just don't have their shit together. Does anyone want to cop to ever having given a terrible interview? Okay, I have. Good. I was hoping that somebody would actually cop to that. And I think this is funny because even the big companies, the Googles and the Facebooks and all these companies that have 10,000 developers, are not actually doing better. Their interviews are just as terrible as everyone else's. And that, to me, points to the fact that interviewing is, in fact, difficult.

It's expensive. Anybody that you have on your interview team also has a full-time job, right? So these engineers who you expect to take hours and hours out of their day also have a complete set of tasks to deliver.
And on the candidate side, anybody who is applying for your job may also have another day job. They probably have other things going on in their life, right? And if you think that's not your concern, remember that the candidates you really want to hire are the ones who probably already have a job, and who are applying other places too. So if you give somebody 100 hours of homework, they're just going to move on. Right?

And I think that we're ultimately making it worse on ourselves. We're not really doing ourselves any favors. My informal sense is that the way we tend to come up with an interview is to ask, how did I get hired? I've been interviewed in a bunch of different ways. That one seemed kind of cool, so maybe we'll do that one, or maybe we'll try something else. It doesn't do us any favors whatsoever.

And I'll tell you now, there is no perfect interview. We're going to talk through a lot of things that make interviews better or worse, but there is no correct answer, per se. I will say that there are a lot of bad answers. And the bad answers are, in large part, the ones that we're doing right now. And the result of that (clicker doesn't work... yes) is that this happens.

So this is one of the main contentions I'm going to make to you. I wanted to put it in early because I want you to think about it. The reason that our interviews are bad is typically that we are not measuring what we think we are. When you have a bad interview, when you have puzzles, when you have abrasive interviewers, you're not measuring the candidate. You're measuring the interviewer. And the entire point of the interview, obviously, is to see whether that candidate is good. This is ultimately why you end up turning down good candidates. This is ultimately why you end up accepting bad candidates. This is why you have 200 interviews and never offer anybody a job: because you're not measuring. And I don't think that this is on purpose, right? It's never bad on purpose.
What's happening is that we don't have a tool set. We don't have a mental schema for how to evaluate whether our interviews are any good. We tend to be engineering types. We're not coming from an HR background. And so usually it's that we made it up.

So, good news: even if we have not invented correct ways to do interviewing, there are other fields that have, specifically ones that have been around a lot longer than we have. Psychology. And what I want to talk about today is industrial and organizational psychology. This is one of the major branches of psychology. It started in the late 1800s. It really came to prominence during and after the First World War, into the 1920s. What happened was that psychologists in the army needed to figure out where to place a million recruits, literally one million recruits. They needed to come up with a way to handle that process, and so they started to codify what they call selection. At the end of this I'm going to include all the references, and there's a lot that you can look up. If you do find yourself wanting to look at primary material, selection is the name of the concept that you want to Google. Cool?

It's going to be a little bit tough for me to cover 100-plus years of psychology. Unfortunately, they have written a lot over the course of five generations, as it turns out. So what I want to do instead is give you a tool in three parts, a way to think about the interviews that you're doing. And then we're going to cover a couple of the common tropes, the things that we tend to see in interviews, and look at them through that lens. Cool? I like you.

So number one is validity. If we have something we want to measure, a construct, as we call it in psychology, validity tells us whether our measurement measures that thing. So these bullets do not line up very closely with each other, but they are all centered approximately on the bull's eye. That's validity.
It's OK that they're spread out. It's OK that, in fact, most of them are individually wrong, because they are measuring the correct concept.

And there are a couple of different factors to validity that I want to consider while we're here. One type of validity is the question of whether the question that you ask corresponds to the concept you want to test. Let's say I wanted to test whether you know arithmetic. If I ask you to list off the digits of pi, is that a test of your arithmetic abilities? No, right? If I ask you to do 5 plus 5, that may not be a very hard question, but it is, in fact, arithmetic, right?

The second type of validity that I want to talk about is a sort of wider one. Given that I can test your arithmetic, does that correspond to a skill that I need you to have? I can test your arithmetic, but if the job that I'm trying to hire you for is carpenter, is that actually a valid skill for the job? That one is sometimes called external validity.

And you'll notice that all the concepts of validity talk about a construct that you want to test. You have to know what the bull's eye is. And so this is actually our first wrinkle when we come to hiring a developer. Because as it turns out, we probably do not agree on what makes a good developer. There is a lot of complication to our field, and it's very difficult for us to say what success even is in this sense.

So think about what a great developer would be in your terms, right? Hopefully it doesn't look like this. Maybe, maybe not. But what it probably does look like is people that you know, or people on your team, who have been very successful. You think about a bunch of characteristics of those people you've seen be successful, and you kind of generalize and say, OK, that's a good developer. But that is not a real concept of a good developer. What that is is kind of a bag of characteristics.
Some of them may actually relate really well to whether somebody's a good developer. Some of them may not at all. And so one of the things that happens when we start to measure people based on the success we've already seen is that we end up measuring all these things that we didn't intend to. And we get that. And we get more of that. And that's what our entire team becomes.

Reliability is concept number two. If validity is whether we're centered on the bull's eye, reliability is how close our measurements are to each other. In this case, we don't even care if it's centered, as long as the measurement comes out the same. And just like validity, there are a couple of different concepts in here.

One of the big ones, which is really important in technical interviewing, is that if I give you an interview and somebody else gives you that same interview, you should get the same score. If you don't get the same score, you're not measuring the candidate, you're measuring the interviewer. That includes the case where I interview you one day and then interview you again another day, but this time I'm pissed off. These things should not impact our measurements, but they do. That's called inter-rater reliability.

A second one is that if I take an interview once and then take that interview a second time, I should get the same score. This is called test-retest reliability. What that means is that if there's an element of chance, if it's a matter of whether I happened to follow one path or the other and that totally changes my score, then again, I have not measured me. I've measured the instrument, or I've measured the random chance that I took A instead of B, and we're hiring on a dice roll, right?

And then the third type of reliability. We're not going to see a ton of this, but if I were to have multiple questions, those questions need to yield the same result. So if we go back to our arithmetic example, if I ask you, what's five times five? Thank you, some of you know arithmetic. What's eight times eight?
Thank you. What's 265 times 12? Nobody. Clearly you don't know arithmetic, right? It's the same form of question, but what happens is that we have these questions where there's some other confounding variable. In this case, we've all memorized a very particular set of multiplications, and we're not actually doing them in our heads, we're just doing them by rote. This happens all the time in interviews. We measure a construct that some people have memorized and other people have not.

And all those concepts of reliability, and I think this is the interesting part, point towards approximately the same thing. They point towards consistency. We're not gonna belabor this in the rest of the talk, but I wanna say that if you give the same interview, if you give it the same way, if people have a scoring rubric so that it doesn't matter who's in the room and what matters is how the candidate does, if that scoring feedback is objective, you will tend to be reliable. So, reliability, consistency. I'll say consistency breeds reliability.

So number three, usability. (I held up two fingers for three.) We could probably come up with a valid, reliable interview for developers, and what it would look like is having you do one of everything, right? If I were to do this in my arithmetic example, I would just ask you one of every single question. But of course this doesn't work, right? It's tempting sometimes in our interviews to just smash enough things in that we get that accurate measurement, but it doesn't work. It's exhausting, people don't wanna go through it, and ultimately what we have is a really limited opportunity to take measurements from people.

So we need to be really careful about how our usability works, and this differs from company to company. One of the things that's really tragic about stealing the Google interview, taking Google's interview and using it for yourself, is that they can abuse their candidates because people still wanna go work for Google, right?
It's true, it's kind of a shitty process, but it works for them, because people don't drop out. That probably doesn't work for you.

And this is different between candidates as well. So imagine we give homework. Let's say it's a 20-hour homework, right? Something that takes a bunch of time. For some candidates, if you don't have a job, let's say you just graduated from a boot camp or you just left your job, fantastic, no problem. If you're somebody who has a real job, or God help you, if you're somebody who has a family, if you have other commitments, if you've ever had a medical issue, this is now what you're measuring. You didn't mean to measure that. You didn't mean to just exclude anybody who had ever had a family, but that's what's happening. So we need to keep usability in mind.

And to throw in a couple of other confounding factors, your target probably does not look like this. In reality, nobody's target looks like this. Your target looks like something else. And that is because your requirements, the things that you do, the constructs you need to measure, are different for you than they are for everybody else, right? And so you cannot just take somebody else's interview.

And when you're thinking about these interviews, you can't simply say, okay, we'll just maximize one dimension, right? I can tell you approximately how valid different types of interviews are, but you have to balance that against the other factors. You need to ask, should I trade off validity for reliability? Because the correct answer is often yes. And then in reality, bringing it back to usability, we get this kind of thing. Three concepts, pick two-ish. Probably really pick like one and a half.

So there is no perfect interview. You're not gonna get something that is off at the top right here. You're gonna get something that's messy and dirty. And what that means is that the only way for you to tell if that interview works is to test your interview. You must test your interviews.
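One piece of that testing can be sketched in code: checking inter-rater reliability, that is, whether two interviewers who score the same candidates actually agree. This is only a sketch under stated assumptions. It uses Cohen's kappa, a standard agreement statistic that the talk itself does not name, and the function name `cohens_kappa` and the sample labels are made up for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement.

    Each argument is a list of labels (for example "hire" / "no"),
    one per candidate, in the same candidate order.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating lists")

    n = len(rater_a)

    # Observed agreement: fraction of candidates both raters labeled the same.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement if each rater labeled at random with their own
    # label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label for everyone.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical scores from two interviewers for the same four candidates.
interviewer_1 = ["hire", "hire", "no", "no"]
interviewer_2 = ["hire", "no", "no", "no"]
print(cohens_kappa(interviewer_1, interviewer_2))  # 0.5
```

A kappa near 1 means the two interviewers agree far more than chance would explain. A value near 0 suggests the score mostly reflects who ran the interview, which is the measuring-the-interviewer problem from earlier.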
Testing an interview looks like this. If you have an existing team, this is actually nice: you can give your interview to your existing team, assuming you like them. If you don't like them, just invert the results. But assuming you like them, you can give the interview to your own team. That's also not good enough, though. And this is why it's so hard to do stuff like measurement. Let's say that you managed to come up with a team that's all right-handed. If you were to now create an interview that happens to be very difficult to complete with your left hand, your entire team will succeed. It looks accurate and valid and all those things. But again, we've managed to measure something that we didn't intend to. So you need to go out and test your interview against people who are not part of your existing team, and who do not share the experience and demographics of your existing team. Cool?

All right, let's figure out how to use these tools. I don't know if that's an ax or a hammer, but it's probably an ax. So we have the tools: we have reliability, we have validity and we have usability. Let's talk through the interviews.

So I'm gonna cheat. This is not part of the interview, but it's my talk and you can't all leave fast enough. The interview process really does start back at the job posting. This is good and this is bad. The reason I wanted to bring it in here is that if you accidentally exclude a bunch of people, if your job posting causes nobody who is left-handed to apply, then nothing you do in the rest of the interview will ever fix that, right? They're not even in your queue.

So let's talk about what you need. This is the opportunity. We talked about constructs, and we talked about how you can't just steal them from somebody else. The job posting is an opportunity to think about the things that make somebody successful for your organization.
You need to think about what things you actually wanna measure. And a good rule of thumb here is that if you're listing something as a real requirement, you should probably actually measure it during the interview. If it's something you're not gonna bother to measure, you probably don't need it. You need to prioritize these things, because everybody is different and everybody is flawed. In reality, if you have a list of six things that you really need, you probably only really need like four or five of them, and if somebody comes in with those, you may wanna hire them anyway.

And then again, on need versus want: I want somebody who's great at testing, I want somebody who's great at refactoring, I want somebody who can scale my services and scale everything and scale buildings, but I don't need that. And remember that many people have been socialized not to apply for jobs that they don't fully qualify for. So the more you put under need, the more people you exclude. I happen not to agree with holding back like that, but that is the reality of it.

What are you asking for? The actual text of your posting is really relevant. Have any of you ever declined to apply for a job, or stopped applying for a job, because you saw a posting that wanted some variation of the ninja, unicorn, Jedi, right? I have declined to apply to places like that. Just not interested, right? Do you think that they meant to exclude me? I'm a ninja, yeah. But the reality is that it happens, and so the words that you use, what you actually ask for, matter.

In the resources, I'm gonna point to two different things that I like to use for this. One of them is called Textio. You put your posting in there and it tells you, these are corporatey words that people tend not to respond to very well, right? And that can help you. And the other one is called JobLint.
That's one where, again, if you've got this kind of ninja rock star, we're-gonna-go-crush-some-code language, it can point out a lot of those things that you may not have considered in the past. So we need to be careful about what we are asking for.

And then we need to think about where we are asking. If the only place that you post your job is Carnegie Mellon, where by the way you are like the 50th most interesting startup, right? Your team is gonna reflect precisely one background. It's gonna be CMU. That's not good. We need more variety than that. The same thing goes for your network. If your network is relatively homogeneous, if you all tend to come from the same place and do the same things, that is not gonna create a sufficient candidate pool for you. So think about where you are posting jobs, and reach outside of that comfort zone to find more people. Ultimately, this is a good thing for your team.

Next is the initial screen. We get some resumes. Let's talk about the different types of screeners that people tend to get. These are like trivia, I think that's the category you would call this. If you want a litmus test: if you can Google something in 30 seconds, or it'll take you like 10 seconds in IRB, it's probably a trivia question.

So let's think about validity here. In theory, well, maybe not this question so much, but there are trivia questions you could ask that might be valid. The problem is that there's not very much signal. Usually if I ask one thing like, what's the name of this method, or what's the order of arguments to this method, it's very little data. And what it ends up rewarding, what it ends up measuring instead, is recency. Have you dealt with it lately? A lot of times the way this interview works is that the director or the CTO reads something on Reddit, and then they go into a phone interview and they're like, oh, hey, do you know this thing? What is the memory footprint of this string thing?
Is that valid for the job? No. A minute ago you didn't know it and you were fine. So this is not a really valid criterion. Could it be reliable? If you ask the same question every time, I think it could be reliable. Do people ask the same trivia question? No. We mostly get whatever they were thinking about last. So in practice, this is rarely reliable. Is it usable? Well, one trivia question is very easy. I'll say, I think that you could get a more valid version of this by asking like 50 of these. With enough trivia, you might be able to get something that resembles a valid measurement, but then it's not usable. This is not a very good one.

Does anyone know what this is? Very common phone screen. It's FizzBuzz. I think FizzBuzz is interesting. So, jumping in: is it valid? Maybe. There is an aspect to it that says, can you write some amount of code? It does suffer from the problem that people tend to study for FizzBuzz. If you are a boot camper, you absolutely are learning to write FizzBuzz cold. But it does track some kind of concept. Is it reliable? Yeah, it's probably reliable. I can administer it. I can think about the different things people get right and get wrong, and I can score it. Is it usable? Yeah, it's actually pretty usable. I think ultimately that's why people use it: it's very easy to administer. If we wanted to make this better, we would probably want to change it from one that's really well known. Like I said, everyone knows FizzBuzz. If you're looking for a job, learn FizzBuzz cold. You've now passed 30% of phone screens. Good work. But I think if we changed that, we would have something that's a little more valid.

Homework. Who issues homework as part of their hiring process? By which I mean, go write some code on your own time and submit it. Fair number of people. I think, again, homework is an interesting one. Homework is super valid. Homework is a work sample test.
In the terminology of I/O psych, probably the most valid, or the most predictive, way to look at somebody's work is to have them do the work. That is a work sample test. Homework works for this. But it has significant problems. Reliability is an issue here, because what you have is candidates, some of whom can spend 20 hours and some of whom can spend five hours. And if your grading criteria don't take this into account, you end up with a really, really different set of scores for people. Again, you did not intend to measure whether I have commitments at night, but you did.

The way we can fix that is maybe to put some parameters around it. Hopefully you give them an assignment that's related to your work, which goes to validity. Hopefully you give them an assignment where you say, spend about five hours. Could somebody cheat? They could. But it gets you closer to having an accurate baseline for comparison. And hopefully you don't give any homework that is like, please re-implement our whole app, or please work on this NP-complete problem, or this thing that our software architects can't even solve, but show us a working demo. Sometimes we have this tendency to just go, that's a thing I was thinking about. Doesn't work.

And then, I'm not gonna spend a lot of time on this one, but something I've seen a lot of recently is these sites that promise to give you a score for a candidate, so you send people there. I think it's super, super usable, because it doesn't take any engineer time. And it may even be reliable, because you tend to ask the one question, right? Or a handful of questions that are effectively the same question. But the question of validity comes up here. And I think that this is ultimately where these become problematic: because the questions in their question bank need to be auto-gradable, and because they need to generate a big volume of them, the questions tend to have very little to do with the actual business of building software.
Is that fixable? Probably. Have I seen it yet? No.

So, interview day. A year ago, Carrie Miller gave a talk at RailsConf about hiring. Problem solved. It's actually a really good talk. You should go watch it. And there are a lot of things about the interview day that she covers. The big one I wanna cover right now is minimizing variance. Anything that differs between candidates is ultimately going to give you extra noise that you didn't mean to measure. What that means is that you should have a schedule, and it should be a consistent schedule. You should know who your interviewers are, and your candidate should know that as well. Your interviewers need to be trained. They need to understand what it is that they're doing. They need to have a scoring rubric. And I'm gonna contend here that your candidate should probably know what you're measuring. Because if you're being sneaky about it, it's probably a stupid question, right? So if you can do those things, tell them what to expect. You're gonna put them more at ease. You're gonna get more signal.

First thing they go into: code writing. Has anyone implemented a, we'll say, a red-black tree at work in the last year? No, no hands? Okay, good. Don't do algorithms. They look like code. They look like code we use, but in reality that's not how we build software. Nobody ever implements red-black trees from memory at work, right? So there's a problem. They feel valid, but they're really not, because the correspondence to the job is very low. Are they reliable? No, not really. They have some of the same recency bias. Somebody who studies, somebody who just graduated from CS, is probably gonna remember this better than somebody who's got a bunch of years of experience. And so great, now you have an inverse selection. You select for inexperience, because those people remember algorithms. Good work. Usability? Yeah, it's probably pretty usable, right?
I think a better version of this would be to take an algorithm, and don't pick something with somebody famous's name on it, Dijkstra's anything or any of that. Pick an algorithm that you make up and have them implement that from a sheet of paper. Say, here's the algorithm, here's your laptop, do that. This defeats the recency bias. This is an actual test of what we do in software, which is translating requirements into working code. Don't give algorithms from memory.

The even worse version: whiteboard coding. Everyone's expecting me to hit on whiteboard coding, and I will, because it's dumb. If at your job you're required to whiteboard code, not if you have the option, not if people tend to, but if you're required to whiteboard code, then by all means go ahead. If not, what you are measuring is a skill that people don't use unless they are practicing for interviews. In that case, just give them a fucking laptop. It's really not that hard.

And then the live bug fix. Does anyone do this? Wait, don't raise your hands. It's technically illegal. If you're not paying your candidates, you can't have them work on production code. In most states. Check your local listings. But the cool thing about this, validity-wise, is that it's like 100% valid, right? Because this is the work. You could not get more valid than this. And that's cool. But the problem becomes the reliability. If you're working on real code, you generally can't repeat the same problem. Or if you can, you have serious problems and shouldn't be hiring. So usually what that means is that we have this trade-off. Either I spend a ton of time in my backlog finding things that are similar-ish, in which case it's not usable. Or I just kind of pluck something out, in which case it's not reliable. Remember that third type of reliability: different questions have to yield the same result. But also it's illegal, so probably you shouldn't do it.

So now that they're exhausted, they've written some code, let's do some problem-solving.
Does anyone know what interview question this is? I actually got hired on this once. It's dudes, they're buried up to their necks and they've got hats on, and you have to tell who's wearing what colored hat, and if they do that they don't get killed or something. This is just dumb. Let's talk about validity. No. Let's talk about reliability. No. Let's talk about usability. Yeah, great usability. Awesome. But you've measured nothing. Not only that, but your candidates are probably looking this up. Again, if you're a boot camper, if you're looking for a job, just go look up the six or seven of these that everybody uses. Learn them cold, pretend that you're having a hard time with it, and then come up with the trick. Ta-da.

A little bit better: case studies. Given a hypothetical, how would you deal with it? This can be valid. You can use your existing work. This can be reliable. You can give the same case study based on historical precedent. You need to modulate a little bit. If you have a senior developer and you are giving them, say, some architectural problem that you can't solve yourselves, that's an issue. If you have a junior developer and you're giving them effectively any architectural problem, that's not something that's in their skill set. That's probably not what you mean to measure.

But even better, and this is my favorite kind of interviewing: behavioral interviewing. Has anyone ever seen this? It takes the same form every single time. It's, tell me about a time when X. And the reason this works is that, despite what they have to say in the financial sector, past performance absolutely predicts future performance. Absolutely does. And so this is a great interview insofar as you can test somebody's real experience. Is it valid? Yeah. Great work. Is it reliable? You can test the same thing over and over again. And then usability. There's a little bit of a challenge.
It turns out that to get good answers out of people, you have to train them on this kind of interviewing. But that's okay. You can overcome this.

And then culture fit. If your culture fit interview looks like every one I've seen before, it's your CTO or your director going in, shooting the shit for about 40 minutes and then deciding yes or no, right? You've had this before. You're measuring for people like me. There are no criteria. There's nothing. There's a gut feel. And the way the gut feel works is that it takes into account every one of our preconceived notions. So it has, again, almost zero validity. My answer to you should be, just don't do these. But you're gonna do them anyway. So if you're not gonna listen to me, at least think about the things in your culture that do cause people to be successful. We ship first all the time. We ship the highest quality all the time, right? We teach everybody. Or we're independently capable. These are the kinds of things that actually could predict success, so at least measure those. But you're better off just not using them.

So you send the person home. Now that they're gone, no more problems, right? The way the debrief ends up working all the time is that we get into a circle, and it's kind of like when you do rock, paper, scissors as a kid. People read each other, right? They read each other socially. This is what happens. I end up talking to the person on my left. I think this candidate's like an eight out of 10. They're like, yeah, no, that was the worst. And I go, yeah, six. Because I'm like, well, I don't want to look like an idiot.

The way that we fix this: first of all, we write it down. Have your interviewers write down specific, objective feedback. Not, she was cool, but, she missed the test coverage issue in this question. She solved the problem with 10 minutes to spare. And then share all that feedback at once, so that nobody can cheat. Make sense? Cool.
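Before the recap, the write-it-down-first debrief can be sketched in code. This is a minimal illustration, not something shown in the talk: the class name `BlindDebrief`, its methods, and the interviewer names are all hypothetical. The idea is simply that nobody can read anyone else's feedback until every interviewer has submitted their own.

```python
class BlindDebrief:
    """Collects written feedback and reveals it only once everyone has submitted."""

    def __init__(self, interviewers):
        # The full panel is fixed up front so we know when feedback is complete.
        self._expected = set(interviewers)
        self._feedback = {}

    def submit(self, interviewer, score, notes):
        """Record one interviewer's score and specific, objective notes."""
        if interviewer not in self._expected:
            raise ValueError(f"unknown interviewer: {interviewer}")
        if interviewer in self._feedback:
            # No revising a score after the fact.
            raise ValueError(f"{interviewer} has already submitted")
        self._feedback[interviewer] = {"score": score, "notes": notes}

    def reveal(self):
        """Return all feedback at once, or fail if anyone is still missing."""
        missing = self._expected - set(self._feedback)
        if missing:
            raise RuntimeError(f"still waiting on: {sorted(missing)}")
        return dict(self._feedback)


debrief = BlindDebrief(["ana", "ben"])
debrief.submit("ana", 8, "Solved the problem with 10 minutes to spare.")
debrief.submit("ben", 6, "Missed the test coverage issue in this question.")
for name, entry in debrief.reveal().items():
    print(name, entry["score"], entry["notes"])
```

Rejecting a second submission from the same person is a design choice in this sketch: it keeps someone from quietly moving their eight to a six after hearing the room.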
Time for the recap. If you want to hire well, you need to pick a set of constructs, and you need to design and test interviews that are valid, reliable and usable. Cool?

So there's one more thing I want to talk about before we go, and it's the reason I submitted this talk. I was having an issue with a couple of clients where they had teams that, we'll say, were more than a little bit homogeneous. And so I talked to them about their interview process. And the feedback I got all the time was, we only hire the best, we don't want to lower the bar. And I hate this, right? But I couldn't tell you precisely why. I knew it was wrong, but I couldn't give you the reasoning. And I understand the reasoning now, as a result of working on this talk.

The reality is that for a lot of us, and they're not bad people, the belief is that there is this bar, and you're over it or you're under it. And in that sense, of course you don't want to lower it. But the reality is that the bar doesn't look like this. The bar is weird and tilted and fucked up, because of all these extra things that you're measuring that have nothing to do with job success. And so I'm gonna say we should probably actually raise the bar. Most people's interviews are not nearly as tough as they think they are. But to do that, first we need to make sure that the bar is straight. Cool? Thank you.