Thanks, Kevin. So I'm going to be talking about student evaluations of teaching. I have to give credit to my advisor Philip, who isn't here today; he's at NYU's data science equivalent of BIDS. And to Anne Boring, who's at Sciences Po in Paris. This was a really fun paper to write: she visited us for two weeks and we sat down and tried to write a paper.

The question we asked is: are student evaluations of teaching (SET) a valid measure of teaching effectiveness? If you've ever taught a class, you've probably been evaluated by your students. Usually they fill out a form at the end of the semester, and then these numbers get boiled down to averages that are used to compare instructors across courses and to make decisions about hiring, promotion, and even firing instructors. So it seems like they ought to be a valid measure of teaching effectiveness if we're going to use them that way. We concluded that they're not.

We took two existing data sets and reanalyzed them. The first was Anne's data from her university in Paris, and the second was a randomized experiment that was done here in the U.S.

Anne's data was a census of teaching evaluations over five years, covering all the first-year undergrads at her university. When students signed up for classes, they didn't know which instructors they were going to get, so it was a natural experiment. We found some interesting things in the data. First, male students gave their male instructors significantly higher ratings than their female instructors; this wasn't the case with the female students. We also had midterm grades and final grades. That's what this plot shows: the correlation between midterm grades and ratings, and between final grades and ratings. Students didn't know their final grades before they filled out the evaluations, and there's basically zero correlation between the final exam grades and the ratings.

The U.S. data was an actual randomized experiment. It was an online course with four sections and two TAs, one male and one female, and each TA swapped identities with the other in one of their sections. So the students didn't know the true gender of their instructor. We compared the ratings of the female-identified against the male-identified instructors, and across the board the ratings were significantly higher for the male-identified instructor, even on things that ought to be objective, like how promptly assignments were returned.

So this is pretty bad. We conclude that SET measure customer satisfaction rather than teaching effectiveness. They're biased against female instructors, and the biases are inconsistent across disciplines and across universities, so we don't think there's a good systematic way to adjust evaluations to correct for the bias.

I also want to point out how I think this project relates to my work here at BIDS as a data scientist. A lot of what is important about data science is reproducibility: making things transparent and open, and doing good software design. We thought really hard about how to do the statistical tests properly so that we'd get valid p-values, and we put our tests in a Python package called permute that you can download. All our analyses are in Jupyter notebooks in a GitHub repository, so you can go run the analysis yourself if you're curious.
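[To give a rough sense of the kind of test being described, here is a minimal sketch of a two-sample permutation test in plain NumPy. This is not the actual `permute` API, and the ratings below are made-up numbers for illustration only. The idea is that, under the null hypothesis that the instructor's perceived gender has no effect on ratings, the group labels are exchangeable, so shuffling them shows how surprising the observed difference is.]

```python
import numpy as np

def permutation_test(group_a, group_b, n_perm=10_000, seed=42):
    """One-sided two-sample permutation test for a difference in means.

    Under the null hypothesis, the group labels are exchangeable, so we
    repeatedly shuffle the pooled ratings and count how often a random
    relabeling yields a difference at least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    observed = group_a.mean() - group_b.mean()

    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # shuffles in place
        diff = pooled[:n_a].mean() - pooled[n_a:].mean()
        if diff >= observed:
            count += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    return (count + 1) / (n_perm + 1)

# Hypothetical ratings on a 1-5 scale (illustration only).
male_identified = np.array([5, 4, 5, 4, 4, 5, 3, 5])
female_identified = np.array([4, 3, 4, 4, 3, 4, 3, 4])
print(permutation_test(male_identified, female_identified))
```

[The actual analyses in the repository also handle details like stratifying by course section; this sketch just shows the core logic of permuting labels to get a valid p-value.]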
And we published this paper in ScienceOpen Research, which is open access and has post-publication peer review, so anyone can download it, read it, and have a look for yourselves. Thanks.