Hi everyone, I'm Matt Ranger. I'm a data scientist at a company called Equizio. This talk is not going to be as technical as the other talks; there won't be a single line of Python. It's more about higher-level issues and trends that you're going to see in the industry. Obviously there are a lot of Python people here, and it seems about a third of the audience are data scientists, so this will be directly relevant to you, but hopefully it's interesting for everyone. Almost everyone spends a significant amount of their day dealing with recommendation systems, either through feeds like this one or otherwise.

Recently we've seen a couple of issues. People have been talking about filter bubbles, for instance: the longer you interact with such a system, the more it becomes specialized to you, and the more it tunes out things you might be unaccustomed to. So people who are pretty conservative end up on effectively the conservative internet, and people who are liberal end up on the liberal internet. That's been fixed recently on YouTube, but for a long time people who were really into conspiracy theories would have their world filtered away from everyone else's. If you searched for the lunar eclipse in 2018 on YouTube, you'd get flat-earth videos pretty fast. If you searched for the CIA, you'd get stuff about the whistleblower, or CF for some reason. And if you kept clicking those links long enough, you'd end up in a Nazi hellhole of conspiracy theories; it actually takes only about three or four levels. They got better at this recently, after the outcry from the Wall Street Journal articles that came out back then.

Similarly, and more concerningly, there's a study from ProPublica from two years ago, where courts in Florida were using a recommendation system to predict the probability of re-offending, and it turned out the system was extremely racist. Shockingly so, if you just look at the two people shown here.
So that's what's been called "algorithm bias" in the media, though it's not actually the algorithm that's biased. The algorithm is fine; people are terrible, and the algorithm emulates what people tend to do and scales it up. So it actually makes the problem worse, and at bigger scales. The question I'm here to ask is: in cases like this, is better actually worse? Are better algorithms just worse overall, and over time? That's what we're going to talk about.

Filter bubbles and algorithmic bias, I think, are inherent to these types of systems, and with just common knowledge and easy checks we can and should do better. The fact that YouTube, which has a gigantic data science team, was still doing this badly last year means we're pretty far behind as a community.

Most of these systems work in a pretty simple way. Basically you have Y, the number that ranks your feed, say on Twitter, which is mapped from X, your personal user data, through a function F. In the old days, Twitter's F would just be: from the people you follow, select the most popular posts of the last 24 hours. Nowadays most systems are learned with something like gradient descent, so F might estimate a probability of engagement given more complex attributes, say how long you've watched similar videos in the past, your age, things like that.

The thing is, even though we're basically at peak data science hype these days, these ideas aren't really new. Linear regression is more than 200 years old. We've added a bunch of data and a bunch of fancy buzzwords and algorithms, but we're basically doing the same things we've always done. So these issues aren't new either. Then why do people say that machine learning is really hard to maintain and leads to a bunch of problems?
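To make that Y = F(X) setup concrete, here is a minimal sketch of both eras of feed ranking. All names, features, and weights are illustrative; this is not any platform's real system.

```python
import math

def rank_feed_simple(posts):
    """Old-style F: from the accounts you follow, most popular first."""
    return sorted(posts, key=lambda p: p["likes"], reverse=True)

def score_learned(features, weights, bias=0.0):
    """Modern-style F: a learned score, e.g. a logistic model mapping
    user/content features X to an estimated engagement probability Y."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

posts = [{"id": 1, "likes": 10}, {"id": 2, "likes": 50}]
top = rank_feed_simple(posts)[0]["id"]  # post 2: popularity wins
```

The point of the sketch is that the learned version is still just a function from user data to a ranking number, exactly like a 200-year-old regression, only with more inputs.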
Well, those authors point to feedback loops, but they also point to plain technical debt: you think you have a system that looks like this, raw data, a couple of transformations, you train models, you feed those models to the people who ask for them, but over time it actually ends up looking like this. That's normal, though. Software engineers have dealt with these kinds of issues since the 80s, and we know how to deal with spaghetti code by now, so that shouldn't really be our excuse.

What I think the real problem is, is that what we call supervised learning, where you take user data and train a model to give the best answer to some problem, was never going to solve the problems people actually have with these systems, like filter bubbles or algorithmic bias. It's actually optimizing for those things. Look at a system like this: you have data, your algorithm lives over here, and it feeds answers to the user. In theory that's your black box, but in practice the user's data comes back in the next iteration and is fed to the model in the blue box. The problem is that the blue box only reasons about what's going on inside the blue box, not the whole system, and the whole system moves over time.

The first thing you'll notice if you start digging into this is something called path dependence, which everyone has basically been in contact with through the concept of posts going viral. Look at this: the exact same post on Reddit, posted twice about an hour apart by the same user on r/dataisbeautiful. One of them became about 60 times more popular than the other, and the only reason is that it got lucky early on. Because it was popular, it was recommended to more people, which meant more people saw it, which meant it just snowballed.
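That snowballing is easy to reproduce in a toy simulation. This is not Reddit's actual ranking, just a rich-get-richer sketch: two posts with identical appeal, where the chance of being shown is proportional to current popularity.

```python
import random

def simulate_virality(n_users=10_000, seed=0):
    """Two identical posts; exposure is proportional to current
    popularity (plus a small prior), so early luck compounds."""
    rng = random.Random(seed)
    scores = [1, 1]  # identical starting prior for both posts
    for _ in range(n_users):
        total = sum(scores)
        shown = 0 if rng.random() < scores[0] / total else 1
        if rng.random() < 0.5:  # same intrinsic appeal either way
            scores[shown] += 1
    return scores
```

Run it with different seeds and the final popularity ratio between the two identical posts swings wildly; nothing about the posts differs, only the early random draws.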
This sort of thing happens regardless of whether the function producing your recommendations is a simple sorting algorithm or a fancy neural network; it's inherent to the structure of the problem, so all the data science hype isn't going to fix it. Here's another example from YouTube: if you just keep clicking on recommendations, you get more and more popular videos. Basically, videos become popular because they're popular, recursively.

Now, viral posts aren't all that concerning, because no one cares. More concerningly, you see the same pattern if we circle back to the law and criminal sentencing. It turns out the law gives criminal sentence recommendations to the judge, and they basically work like an algorithm: you have input, which is the defendant, their record, and the exact thing they're being convicted of; then you have a bunch of rules; and then you add random noise, which lawyers call a judge. One of the rules, this is in the US, but it's the same in Canada, is that as a judge you can't sentence someone much differently from convictions for similar cases in the past.

You can probably see why this is a problem: white-collar criminals keep getting away with things. Recently, for instance, Paul Manafort could have been sentenced to up to 40 or 60 years in prison and got four, for doing a bunch of awful things, and the bankers in 2008 got basically nothing, whereas people who commit petty crimes get harsher jail sentences, because historically theft and similar crimes got harsher sentences. The same outcomes just keep reproducing over time. As one lawyer put it, the system's discrepancies become self-justifying and self-perpetuating: judges give white-collar criminals lower sentences because white-collar criminals typically get lower sentences.
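That self-perpetuating anchoring can be sketched in a few lines. This is a toy model, not an account of any real sentencing guideline: each new sentence anchors on the historical average for its crime category, plus judge-to-judge noise, and all numbers are made-up years.

```python
import random

def simulate_sentencing(initial, n_cases=200, seed=0):
    """Each new sentence anchors on the historical average for its
    category, plus judge noise. A disparity in the starting data
    persists even though the rule treats both categories identically."""
    rng = random.Random(seed)
    history = {cat: list(past) for cat, past in initial.items()}
    for _ in range(n_cases):
        for past in history.values():
            anchor = sum(past) / len(past)  # the "similar cases" rule
            past.append(max(0.0, anchor + rng.gauss(0, 1)))
    return {cat: sum(p) / len(p) for cat, p in history.items()}

avgs = simulate_sentencing({"white_collar": [2.0], "petty_theft": [8.0]})
# white-collar sentences stay low because they started low
```

After hundreds of simulated cases the gap between the two categories is still there; the rule never re-examines whether the original disparity was fair.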
If you're the judge, and these are the rules, and people like this have historically gotten low sentences, you can't just hand down a higher sentence because you think that's what's fair. You're stuck in a system where you have to give low sentences because low sentences have been given, and nothing else. It's the same as the Reddit post being popular just because it's popular.

Most people don't really deal with these issues unless they run a large system with a lot of users, so let's do a riddle for a second and then tie it back to systems almost everyone uses every day. Here's the riddle, for anyone who hasn't read the book Thinking, Fast and Slow: the counties in the U.S. where the incidence of kidney cancer is lowest are mostly rural, sparsely populated, and in Republican states. But if you look at the counties where the incidence of kidney cancer is highest, they're exactly the same type of counties. Does anyone have a guess why that might be? [Audience member answers.] Yeah, that's pretty close. There are two things: most counties are that type of county, and those counties have very, very few people, so one person more or fewer with kidney cancer changes the county's overall rate by a lot.

Look here, for instance: this axis is the incidence of kidney cancer, and this one is population. Chicago sits basically exactly at the average, because there are a lot of people in Chicago; if I get kidney cancer and I live there, I'm not going to move the average. But if you live in Pulaski County, Illinois, with 45,000 people and four cases a year on average, your grandma getting kidney cancer moves the average a lot. And what happens is that if something starts anywhere in one of these systems, over here or over there, then because the systems are self-reinforcing you end up in a loop.
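The riddle's small-numbers effect is easy to demonstrate. In this sketch every county has exactly the same true cancer rate; the population sizes and rate are invented for illustration, yet tiny counties still land at both extremes of the observed-rate ranking purely through sampling noise.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's Poisson sampler; fine for the modest rates used here."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_county_rates(n_counties=1500, true_rate=1e-4, seed=1):
    """Same true rate everywhere; observed per-county rates differ
    wildly only because small populations are noisy samples."""
    rng = random.Random(seed)
    counties = []
    for _ in range(n_counties):
        pop = rng.choice([500, 5_000, 1_000_000])
        cases = poisson(pop * true_rate, rng)
        counties.append((cases / pop, pop))
    return sorted(counties)  # ascending observed rate

rates = simulate_county_rates()
# both rates[0] (lowest) and rates[-1] (highest) are small counties
```

A big city with a hundred expected cases hovers near the true rate; a 500-person county is either at zero or at twenty times the true rate, so the top and bottom of the ranking are always the tiny counties.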
So a lot of the posts that are popular on Twitter, Facebook, or YouTube just started lucky and kept snowballing. And a lot of things start down here and keep snowballing down, whether it's the justice system or the recommendation system for whatever media you're watching.

The other issue is that, over time, the entire software system does its own optimization. If individual parts of the system feed data into other parts, and those parts are themselves optimizing for something, then the system as a whole is going to end up somewhere over time. The easiest way to see this is to make our previous little system just a bit more complex and look at Twitter, Facebook, or YouTube as a whole: instead of just a recommendation system giving recommendations to users, you also have content creators making the content that feeds it. Each content creator is personally optimizing for money, or views, or some personal purpose; you don't make content and put it on the internet for no one to see. So they react to what they think the recommendation system is rewarding.

For instance, when people think YouTube ranks longer videos higher, what happens over time is that the videos on YouTube get longer and longer. You can argue this is partly because uploading to YouTube got easier, but it's cross-verified that the average, say, makeup tutorial went from three minutes to nineteen minutes plus some rambling about a dog, because the people who make those videos believe longer videos rank better. Similarly, if you use Google nowadays to find a recipe, you may have noticed that a lot of recipe sites put a bunch of garbage before you actually get to the recipe. The reason is simple: the people who run recipe sites believe time-on-page is a positive ranking signal in Google search.
So they're gaming the system, that's the SEO, forcing you to scroll a few times before you reach the recipe. That's not optimal user experience; it's just an artifact of how the system evolved over time.

There's another, more concerning issue. Some people, DeepMind especially, put out a paper recently arguing, pretty reasonably, that the systems you interact with daily actually change your tastes. If we go back a couple of slides to this diagram, the user themselves changes over time: if you're exposed to a bunch of conspiracy theories on YouTube, your tastes will evolve toward them. So everything is moving at the same time. This gets pretty terrifying, because we've been looking at simple examples, but once we go back to the bucket of spaghetti that feeds data back into itself, it's effectively impossible to reason about. At that point you have a bunch of components optimizing in directions you don't really understand, people trying to make money off of it, and the whole thing only working because of a scaffolding of scary code that nobody fully understands.

So what do we do about it? Well, first, knowing that this is an issue is a pretty good start. The other thing to know is that this is not really a statistics problem; the problem lives outside the algorithm you're working on. People have studied this in game theory, for instance, for a long time. There's a paper from the early 90s showing that in games, if you believe nonsense, you can still reach an equilibrium, as long as, and this is the important point, nothing ever proves your beliefs wrong. You can end up in pretty weird places.
That paper was written partly in response, somewhat scarily, to the Federal Reserve in the US basically mismanaging monetary policy until the 1970s, because they didn't believe a certain combination of inflation and unemployment was possible until they tried it and found out otherwise. They're much better at what they do now, and that episode prompted a bunch of studies on this, and I think anyone who works on any type of system is affected by it anyway.

If you go talk to computer scientists, the textbook answer to this kind of problem, where your system makes a recommendation, the recommendation changes the possible actions of the person, and the person's decision feeds back into the system, is reinforcement learning. More simply, though, I think it's unrealistic to expect everyone to take all of their supervised learning systems, where you train something to optimize, say, the conversion rate on a click, and turn them into reinforcement learning systems. For one thing, reinforcement learning works on timescales that just aren't feasible for most of the systems we're looking at.

However, there are a couple of things we can glean from it. The first is that there's a trade-off between exploring and exploiting, and if all you're doing is optimizing, you're never exploring, so you're just going to end up in filter bubbles. This is a given; basically everyone who has studied this in other fields came to the same conclusion. It's only because we've been in our little niche optimizing as hard as we could for five years that we ended up where we are. So even if you're not going to put a full-blown reinforcement learning system over your entire architecture, just adding some random exploration will make things a lot better. That's basically the conclusion of the DeepMind paper I referenced from a couple of weeks ago.
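The simplest version of "just add some random exploration" is an epsilon-greedy layer on top of whatever ranker you already have. This is a generic sketch of the technique, not any particular platform's implementation:

```python
import random

def recommend(items, relevance, epsilon=0.1, rng=random):
    """Epsilon-greedy layer over an existing ranker: with probability
    epsilon, surface a uniformly random item (explore); otherwise
    surface the top-scored item (exploit)."""
    if rng.random() < epsilon:
        return rng.choice(items)        # explore outside the bubble
    return max(items, key=relevance)    # exploit the learned score
```

With epsilon at zero you recover the pure optimizer and its filter bubble; even a small epsilon guarantees every item keeps some chance of being shown, which also keeps generating the off-distribution data your model would otherwise never see.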
Just put some random exploration on top of any system and you're already in a much better place. There's a reason for that, too: overall, people are better informed if they read a more diverse set of sources. If all you do is optimize, you get less-informed users, and if their tastes evolve accordingly, you end up with uninformed people feeding ill-informed systems.

The other point is that you don't necessarily need to reinvent the wheel. Say you have a system that collects blood-sample data and you're building an algorithm to predict whether a person has tuberculosis or some other disease. The blood is probably not conspiring to make you predict leptospirosis instead of tuberculosis. It doesn't care; it's not going to evolve or react to you. So all you have to think about is the rare-disease, small-data case, like the counties before: try to get enough data to make confident predictions for every category, and don't predict a common disease over a rare one just because you have little data on a rare disease that rarely gets diagnosed. You don't need to go crazy if you don't have humans in the loop doing their own thing on top of your system.

The last thing I want to talk about is that you should pay close attention to your target variables. Recently Facebook changed the big KPI that orders the Facebook feed from "engagement" to "meaningful engagement", which basically meant refactoring it so that comments from people close to you matter more than comments from random people. I think they hoped this would turn down the political shouting matches. But at the end of the day, with a KPI like this, what you're doing is folding the entirety of human experience into one number.
And that turns out to have a lot of hiccups if you're not careful. If you take the entire human experience of using Facebook and tune it down to exactly how often people comment, like, and share, you're skipping over a lot. It's like projecting three dimensions down to two: you're going to get artifacts.

The same problem shows up in reinforcement learning when you specify the target exactly. Here's a boat being trained by a reinforcement learning algorithm to win the race and score as many points as possible. It found that it's easier to just spin in circles and hit those three boxes right there, racking up points as fast as possible, even though it loses the race. That's just an artifact of how they specified the reward function for the boat, and it turns out to be the optimal solution to the problem as specified. So watch how you specify your objective functions.

As for solutions: going back to economics, people measured countries for a long time by playing the who's-got-a-bigger-one contest with GDP. But GDP is not a good representation of actual quality of life. Over time, people moved to things like the Social Progress Index or the Human Development Index, which measure several things and meld them together into one number. That's a better way of looking at it, because, for instance, Saudi Arabia has a lot of money, but living there as a woman is probably not the most fun thing. So if you're going to fold a bunch of dimensions into one number for a stack ranking, it's better to take parts of every single dimension.
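A composite KPI in that HDI spirit can be sketched in a few lines. The metric names, bounds, and numbers here are all made up for illustration; the structure is the point: normalize each dimension, then take a geometric mean so one inflated dimension can't paper over a collapsed one.

```python
import math

def composite_kpi(metrics, bounds):
    """HDI-style composite: min-max normalize each dimension to [0, 1],
    then take the geometric mean across dimensions."""
    normed = []
    for name, value in metrics.items():
        lo, hi = bounds[name]
        normed.append((value - lo) / (hi - lo))
    return math.prod(normed) ** (1 / len(normed))

# Hypothetical engagement dimensions for a feed:
engagement = {"comments": 30, "session_minutes": 20, "diversity": 0.4}
bounds = {"comments": (0, 100), "session_minutes": (0, 60), "diversity": (0, 1)}
score = composite_kpi(engagement, bounds)
```

The geometric mean is the deliberate design choice: if any one dimension drops to zero, the whole score drops to zero, so the optimizer can't win by maxing out comments while diversity collapses, which is exactly the failure mode of a single-number KPI.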
If you want to look at user engagement or time spent on Facebook, you might want to look at 50 different things, fold them into one number, and optimize over that, instead of one simplistic number.

The other point is that if you're measuring things over the short term, you're doing it wrong. This is a study from Pandora, a pale competitor to Spotify, I guess, where they measured what happens when you increase the ad load a bit. The vertical axis is the number of unique listeners per treatment group, and what happens is that over time people get more and more annoyed by the worse ad experience. If you measure the difference here, early on, your boss will probably tell you to keep adding ads, because ads are a trade-off between money and annoying your users. If you measure over there, much later, the trade-off looks completely different. Similarly, whenever you look at conversion rates or retention rates, those things are always defined in terms of a time window: how well you keep your users after one week or three days is different from after six months, and that matters.

So, to close out, since I've been going pretty fast through all of this: the 2008 financial crisis was an exercise in short-term KPIs. This is completely unrelated to Python, but I think it's a fun fact. The CEO of Bear Stearns took 90 million dollars in bonuses and withdrew 300 million dollars of his equity in the company before it went bankrupt. Interestingly, he seems to have known the year before, because he withdrew a lot of money right before Bear Stearns crashed. The thing is, if you're the CEO of Bear Stearns and you're paid mostly in bonuses and in equity you can sell, then you don't really care if the company goes bankrupt, because your bonuses come year to year.
So all you want to do is pump the company's stock year to year, and if you get canned, and you're not going to jail anyway, as we saw, then you're basically incentivized to do exactly what they did in the run-up to 2008. Short-term KPIs that don't incentivize management to think long-term for the company can lead to the pump-and-dump schemes that lead to financial crises.

So what did we learn today? Recommendation systems' problems are not going to fix themselves, because we're not doing the right things. Even without a full-blown reinforcement learning model, just going to your boss and saying we should spend a bit of our optimization budget on exploration over the entire dataset will make things better. And if your boss tells you to look at short-term KPIs because you want to IPO soon, well, maybe, if you want to IPO soon. In most cases, though, you probably want longer-term, composite KPIs that look at several dimensions at the same time. That's it. Thank you.