Welcome to this fourth Robustly Beneficial podcast. Today we're going to discuss a paper by Google, published in 2015, called "Focusing on the Long-term: It's Good for Users and Business". It's a very interesting paper, I think it's quite unique in its genre, because it tackles the question of the long term: the impact of a modification of the algorithm on users over the long term. If you think about it, this is very rare. Usually, when we test algorithms, it's always very short term; most of the time it's A/B testing. The basic idea of A/B testing is that when you go on a website, the website randomly chooses to show you version A or version B of the website. If people engage more with version A than with version B, the website eventually learns (it's very basic learning) that one of the two is more engaging, and it keeps, for instance, the one with the most engagement. But these tests are very short term, because they only see people's immediate reaction when they are exposed to this kind of content.

Exactly. In the paper, they describe the whole methodology of how they made this change, how they ran this study. The basic idea is to have two conditions: one called control, which is the normal condition in which the software works, and one experimental condition, which introduces a change; for example, the experimental condition can be increasing the number of ads that appear in Google search by 25%. An interesting point is that they see right away that, over the short term, just over a few days, increasing the number of ads also increases revenue. But for the users with the increased ad load, after one, two, three months, we also see that the users change: they start to click less and less on the ads.
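The A/B testing logic described above can be sketched in a few lines. This is a minimal illustration, not any platform's actual system; the experiment name, the hash-based assignment, and all the numbers are made up.

```python
import hashlib

# Minimal A/B test sketch (illustrative only; names and numbers are made up).
# Each user is deterministically assigned to variant "A" or "B"; we tally
# engagement (clicks) per variant and keep the variant with the higher rate.

def assign_variant(user_id: int) -> str:
    # Hash-based split so a given user always sees the same variant.
    digest = hashlib.sha256(f"experiment-42:{user_id}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"

def run_test(clicks_by_user):
    stats = {"A": [0, 0], "B": [0, 0]}  # variant -> [clicks, impressions]
    for user_id, clicked in clicks_by_user:
        variant = assign_variant(user_id)
        stats[variant][0] += int(clicked)
        stats[variant][1] += 1
    rates = {v: c / n for v, (c, n) in stats.items() if n > 0}
    return max(rates, key=rates.get), rates
```

Note that this picks a winner from immediate clicks only, which is exactly the short-termism the paper criticizes.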
So over the long term, if we sum the effect over every day, we see that in the end the revenue decreased.

Yeah, that's very interesting. One thing that was striking to me in this paper, and maybe we can discuss more of the results before moving on to other things, is the study of how long it takes for people to get used to this kind of new ad load. If there are more ads, people eventually learn to ignore more of them. But how long does that learning take? Actually, I didn't get the chance to be surprised, because I learned about this paper in a talk, and the speaker just said "look, this is how much time it takes people to learn". But I think it's an interesting question for people listening: how long do you think it takes for people to learn to click less on ads, after a change of algorithm that shows many more ads?

So the result in the paper was that after three months, they expect to have measured only 65% of the learning effect. So we can think that after six months in total, people would have reached something like 95% of how much they learn to avoid the ads. And that's a surprising amount of time. If you think about it, whenever I see a website with too many ads, I stop clicking on them, or I stop visiting the website, quite quickly. So I would have guessed that this learning process is fast.

Maybe you have this impression because you are comparing things that are very different, whereas here they measure a small change in Google search results. I don't know exactly how much they increased the ads in the experiment, but I think it's not from 1 ad to 10 ads; it's more like from an average of 3 to an average of 3.5 ads. I guess in that case it's harder to learn.

But then it means that you learn subconsciously, right?

Exactly, a lot of this kind of thing is unconscious. And it's quite amazing that the learning rate of the unconscious mind is this slow.
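The 65%-after-three-months figure can be plugged into a simple saturating learning curve. The exponential form below is our assumption for illustration, not necessarily the paper's exact model:

```python
import math

# Toy learning curve: learned(t) = 1 - exp(-t / tau).
# We calibrate tau from the figure quoted above: about 65% of the
# learning effect is reached after 3 months. (The exponential form
# is our assumption, not necessarily the paper's exact model.)

def tau_from_fraction(t_months: float, fraction: float) -> float:
    # Solve 1 - exp(-t/tau) = fraction for the time constant tau.
    return -t_months / math.log(1.0 - fraction)

def learned_fraction(t_months: float, tau: float) -> float:
    return 1.0 - math.exp(-t_months / tau)

tau = tau_from_fraction(3.0, 0.65)       # ≈ 2.86 months
six_months = learned_fraction(6.0, tau)  # 1 - 0.35**2 ≈ 0.88
```

Interestingly, under this particular curve the six-month figure comes out closer to 88% than 95%, which shows how sensitive these extrapolations are to the assumed model.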
It takes months to change behavior in response to a change in what we're exposed to. And that's interesting, and it raises all sorts of other questions, like: what are the impacts of a change in the algorithm that recommends fewer conspiracy videos to some users? Probably, with A/B testing, you would not see that much?

Yeah, I don't know what you would see. You would see that if you increase how many conspiracy videos are in the feed, then people click more on them. But what we could observe over the long term is that if you sustain this increased amount of conspiracy videos, then maybe users get more critical, learn to detect this kind of video, and click less on them. Or, what could also happen, which would be a lot worse, is that users come to believe more and more in these videos. This has also been studied: we know that if you just repeat something to someone a large number of times, they start to believe it's true, even though they have no reason to. So maybe over the long term, increasing the number of conspiracy videos would make people believe in them more and engage with them more and more.

Yeah, it's hard to study. One thing that was interesting in the paper is that they proposed new methodologies to study these long-term effects. And maybe before getting into this, it's worth saying that it's hard: to the best of my knowledge, we don't have a lot of studies in psychology that look at these long-term impacts of repeated exposure, even though this paper shows that something quite impactful is going on after weeks and weeks, and even months and months. The methodology wasn't groundbreaking, I'd say, but it's still interesting.

So the way they did it... The problem with the basic approach is this: you have a user, you give them a treatment, like showing them more ads for a long period, and after that you need to compare them with the control group, on the same kind of exposure. So the idea is that after the treatment period, you revert back to the old algorithm, the old ad load, and you compare, on this same small ad load, whether a user who has gone through weeks of treatment clicks on more ads than someone in the control group. I guess this is the classical scientific method, the statistical test applied to this setting, but it's actually very inefficient: you have to choose the length of the study period ahead of time, you don't know how long it should be, and if you want to test more, you need to redo the experiment with other people.

So the other method they proposed is to compare someone who has had the changed algorithm for a long period with someone randomly selected from the control group who is exposed to the increased ad load on a given day only. This comparison gives you a curve as a function of time, which is nice. The way they do it is: on each day of the experiment, they take people who have not been in the experimental condition, put them in the experimental condition just for that day, and compare them with people who have been in the experimental condition since the beginning of the experiment. This way, the short-term effect is the same for the group used as control and the group used as experimental, because both are affected by the experimental condition today; what remains is how much the people in the long-running experimental condition have learned since the beginning of the experiment.

Yeah. Do you think there's a better way to do this?
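The measurement design just described can be sketched as a toy simulation. All the numbers here (the base click-through rate, the size of the learned "ads blindness" effect, the 60-day time constant, the exponential form) are made up for illustration; only the structure mirrors the idea in the paper.

```python
import math

# Toy simulation of the measurement design (assumed numbers throughout):
# on each day d, compare users treated since day 0 with fresh control
# users exposed to the heavy ad load for that one day only. Both groups
# see today's short-term effect, so the gap isolates the learned effect.

BASE_CTR = 0.05         # click-through rate under the normal ad load
SHORT_TERM_LIFT = 0.02  # immediate CTR lift from showing more ads
TAU = 60.0              # assumed learning time constant, in days

def blindness(days_treated: float) -> float:
    # Learned CTR reduction, saturating at 0.03 (assumed).
    return 0.03 * (1.0 - math.exp(-days_treated / TAU))

def daily_ctr(days_treated: float) -> float:
    return BASE_CTR + SHORT_TERM_LIFT - blindness(days_treated)

def measured_learning(day: int) -> float:
    # Fresh one-day controls have days_treated = 0; the long-term group
    # has days_treated = day. The short-term lift cancels in the gap.
    return daily_ctr(0) - daily_ctr(day)
```

Plotting `measured_learning` over the days of the experiment traces out exactly the learning curve discussed above.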
I've tried to think about it a bit, but I haven't found a better way; it sounds like a reasonable methodology.

Yeah, I think so, because the problem, as you said, is that either you wait for the end of the experiment to be able to make a measurement, which is very cumbersome, or, during the experiment, if you only compare the experimental group and the control group directly, the comparison is meaningless: we are comparing two different things, so the measure we get out of it is not interesting.

Yeah. One reason they said it was better to run this one-day experiment every day, rather than just doing it on the first day and using that as the baseline (always comparing against how much people clicked on the first day they were exposed to more ads), is that by doing it repeatedly they can control for confounding variables. Maybe, because of some other site people started using, they learned on that other website to click less on ads; repeating the comparison every day is the way to control for that.

Yeah, exactly. Any real-life event outside the platform can change a lot, for a few specific days, how much people engage with ads. I guess during a political campaign people would be more likely to click on ads, because people are talking about it at the moment. So this is why it was important for them to compare, every single day, with another group on that same day.

Yeah, though I guess it does not completely remove the confounding variables. Maybe, as you said, you get into a political campaign, the ads are more engaging, people click more on ads, and this increases the variance of how people click, and maybe the difference between the control group and the experimental group gets amplified because of this.

Yeah, they agree the measurements are very noisy, but they are able to run them over a large number of people.

Yeah, and the impacts are quite large. Much larger than in the paper we talked about last week.
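The point about comparing on the same day can be made concrete with a tiny numerical demo. Here a shared external "day effect" (news, campaigns, seasonality) shifts everyone's click-through rate on a given day; all numbers and the additive model are assumptions for illustration. The day effect cancels exactly in a same-day difference between groups, but contaminates a comparison against a fixed day-1 baseline.

```python
import random

# Toy demo (assumed numbers): CTR = base + shared day shock - learned effect.
# The shock hits both groups on the same day, so it cancels in a same-day
# comparison, but not when comparing against a baseline from another day.

random.seed(0)

TRUE_LEARNED_EFFECT = 0.02  # CTR reduction the long-term group has learned

def day_shock() -> float:
    return random.gauss(0.0, 0.01)  # external event affecting everyone today

def ctr(learned: float, shock: float) -> float:
    return 0.05 + shock - learned

shock_day1 = day_shock()
baseline_day1 = ctr(0.0, shock_day1)       # fresh users, day 1

shock_today = day_shock()
ctr_fresh = ctr(0.0, shock_today)                   # one-day-exposed controls
ctr_long_term = ctr(TRUE_LEARNED_EFFECT, shock_today)

same_day_estimate = ctr_fresh - ctr_long_term    # shock cancels exactly
vs_day1_estimate = baseline_day1 - ctr_long_term  # biased by the two shocks
```

The same-day estimate recovers the learned effect exactly, while the day-1 baseline estimate is off by the difference between the two days' shocks.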
It was, I don't remember the exact figures, but it was double digits, something like 18%. So yeah, 18% for Google in terms of ad clicks: it's a lot of money when you think about it, billions of dollars. A huge stake. And if you want to do good, yeah, I think it raises a lot of questions. It also really shows, again, the impact of algorithms: here you have really large effects on how people behave on a daily basis, like how much they click on ads, which is perhaps not what we care most about on this podcast. But it's interesting to see that very simple changes, just adding more or fewer ads, change people's behavior quite a lot. So yeah, algorithms matter; they have a strong impact.

Yeah. As we discussed yesterday, the topic of ads is not in itself extremely important or interesting. But what was really interesting in that paper is that it shows that users learn from the platforms, and that this effect can accumulate over a long time and change users in very significant ways. So certainly what we are exposed to every day on these platforms, such as polarization or angry videos, has an impact on us, makes us learn, and changes our behavior. This is, I think, quite worrisome, and more studies of this kind would be extremely beneficial.

Another striking thing about the paper is that the experiments started in, I think, 2007 or so, and it was published around 2015. So the paper is the result of, maybe not a decade, but several years, say at least five years, of studies. This shows that it takes a lot of time to gather this kind of data, which I think is really interesting. So if you want to build robustly beneficial algorithms, I think it's critical to understand this interaction between algorithms and humans, and it takes a lot of time. So we should start now.
We don't want to wait for the next Cambridge Analytica++, because I think there are a lot of things going on on social media that we really don't understand, and it takes years to understand them. It also takes a lot of resources: you had to be Google to do this, essentially.

Yeah, exactly. This kind of study is unlikely to come out of academic research, because academic research is usually run by PhD students, who do their PhD in at most four, five, six years, and you don't run a five-year study in that time.

Yeah. Don't start your PhD by saying "I'm going to run a five-year experiment"; you're not going to graduate.

So beyond this kind of experiment that can be run, could simply analyzing data from the past, even though there were no experimental conditions, give us as much information as this kind of experiment?

Yes, I think there's a term for this: I think it's called a natural experiment, or something like that. You have a population, and for some reason one part of the population was exposed to this kind of content, and it was somewhat arbitrary, it could have been the other part. In such a case you have a natural setting to do a sort of comparison between a treatment group and a control group. But this only occurs rarely, and usually there are confounding variables: the reason why this group of people was more exposed to this kind of treatment was some other factor, and maybe that other factor is the reason why the treatment seemed to work, or something like that.

Yeah. So do we believe that the past data from these platforms is nearly useless for answering deep research questions?

No, I don't think it's useless at all. I just think it's much harder to analyze, but it's much more plentiful.
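The confounding problem with natural experiments can be shown in a small simulation. Everything here is invented for illustration: a hidden "interest" variable drives both exposure to some content and the outcome, so a naive exposed-vs-unexposed comparison overstates the effect, while stratifying by the confounder recovers it.

```python
import random

# Toy natural-experiment data (assumed numbers): interest raises both the
# probability of exposure and the baseline outcome, confounding the naive
# exposed-vs-unexposed comparison.

random.seed(1)
TRUE_EFFECT = 0.05  # exposure genuinely adds this much to the outcome

users = []
for _ in range(100_000):
    interested = random.random() < 0.5
    # Interested users are far more likely to be exposed...
    exposed = random.random() < (0.8 if interested else 0.2)
    # ...and have a higher baseline outcome regardless of exposure.
    outcome = (0.30 if interested else 0.10) + (TRUE_EFFECT if exposed else 0.0)
    users.append((interested, exposed, outcome))

def mean_outcome(group):
    group = list(group)
    return sum(o for _, _, o in group) / len(group)

naive = (mean_outcome(u for u in users if u[1])
         - mean_outcome(u for u in users if not u[1]))

# Compare exposed vs unexposed *within* each interest stratum, then average.
strata = [(mean_outcome(u for u in users if u[0] == s and u[1])
           - mean_outcome(u for u in users if u[0] == s and not u[1]))
          for s in (True, False)]
adjusted = sum(strata) / len(strata)
```

The naive difference lands around 0.17 even though the true effect is 0.05; the stratified estimate recovers 0.05. In real data the confounders are rarely observed this cleanly, which is exactly the difficulty discussed above.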
There is much more data like this, so maybe you can leverage that volume. But if you're going to do it, you need to do it well, and the classical hypothesis-testing approach seems quite obsolete to me for this kind of thing, for studying processes that go on for years, with plenty of confounding variables. Given that there is a lot of data and you want to explore it to understand what's going on, I would say it would be worthwhile to have more embracing, more global models that include different aspects, like the fact that some president got elected at some moment, and things like that. It's going to be tricky, it's going to be hard, and it's not going to be fully reliable, but maybe you can still leverage these huge amounts of data to build a model that makes predictions that are somewhat informative. But yeah, I think it's very hard.

Yeah, I think it can still have a lot of value if you study populations that are somehow homogeneous, where some members got exposed, somehow randomly, to some type of content, and you see how their behavior changed. I don't know, if you start watching a first video about football, there's a chance you will start to like football a lot and, little by little, watch many more football videos in the future. This data, even though it would be subject to confounding variables, could still give us a lot of insight.

Yeah, I think it's extremely challenging. I would encourage this kind of research; I think there's not enough of it, not enough analysis of the past data of social networks, for instance. But the study has to be done extremely well, which is very challenging, and you need very capable people to do it. Also, you need a venue to publish your paper, and I'm not sure it's going to be that easy to publish this kind of long-term quantitative study built on some very complex model. I don't know if it's a field of research that exists today; I don't think so.

Yeah, it'd be hard for me to advise a PhD student to work on this; I feel like it would be career suicide.

Yeah, okay. If you could do an experiment on, let's say, YouTube, so you could decide to change something in the recommender system for half of the world and observe the effect over time, what kind of thing would you do?

So I've become more and more interested in the concept of intellectual honesty; Julia Galef has a talk about this on YouTube. It's basically the idea that you should try not to lie to yourself. And if you think about it, we lie to ourselves a lot, arguably all the time: we want to believe something, so we discard the inconvenient thing and tell ourselves "oh no, it's fine", and because of this we have confirmation bias going on, and all sorts of things like that. The example I gave once is that when I was doing a lot of math popularization (well, I'm still doing it a lot), I convinced myself that mathematics was extremely important, that mathematics research was underfunded. Now I'm not even sure about this anymore; I don't want to make mathematicians angry if they watch this video, and I think there are good concerns, like AI ethics, but back then I think I was really lying to myself. Because I was doing math outreach on YouTube and on blogs, I felt I had a duty to defend mathematics, and I was kind of lying to myself: "mathematics is extremely important, that's why I'm doing this, and that's why you should also read me, watch my videos, and fund me if you can". I think intellectual honesty is really, really important, but I would like a better understanding of how different videos impact people's ability to be intellectually honest. And I fear that a lot of science popularization videos, including mine (and that's very annoying for me, but I'm trying to be intellectually honest), may be counterproductive in this regard. It's like comforting
people's belief that science is important, that it is the way to go, and that what science says is true, and it increases a kind of overconfidence. So I would be really curious to see, to try to randomize, the impact of different videos on people's intellectual honesty, because I think if you can make progress on this, it becomes much easier afterwards to make more robustly beneficial proposals, for algorithms or for societies. But this is very difficult. We talked yesterday about the fact that you're doing your PhD in education, and it's hard to study long-term effects in your field as well, even though it's critical. A lot of experiments in psychology these days are one-shot: you get a participant for two hours, and you can do things (not everything you want, but things) with them for two hours, and then it's over; you cannot follow up with them. As a result, you only study short-term effects, and then you write books and make proposals based on these studies of short-term effects, while the long-term effects may be neglected.

One aspect of the paper, nevertheless, is that they build a model to anticipate Google's revenue: the number of users, times how many queries each user makes, times how many ads we show per query, times how often users click on the ads, times how much money we get from each ad click. What was interesting is that they had all these factors, more than just "how many ad clicks do we get". A short-term study would simply change the experimental condition and measure how many ad clicks we get; this one measured these several aspects over time. And what's interesting is what changed: the click-through rate, how much users click on ads, changed over time. Right at the beginning of the experimental study it increased, because more ads were shown, but then it decreased over time. They see this as a more complex system. Something else that could happen is that increasing the ads would decrease the number of users, because users would leave. In the paper they talk about how they try to anticipate the long-term effect based on the short-term effect, and this can be valuable; apparently it has been valuable for how they run experiments inside Google, to better (not perfectly, but better) maximize their revenue over time. By now they have built a kind of indicator that you can evaluate over a short period of time and that helps predict the expected revenue over a longer period.

Yeah. So they had this basic model where, as you get exposed to something, you change your behavior, and then it asymptotes, and the typical timescale of the variation is about two or three months. And this means you can apply this, for instance, to last week's paper. Last week's paper said that if you change the algorithm of the Facebook news feed, you can change how happy people's posts are afterwards, but the impact was very small, on the order of 1% or 2%. I did some rough calculations: if you assume that the way people change their mood is similar to the way they change their clicking patterns on ads, then you should multiply the short-term effect roughly by 12, something like that. It's also very rough, but by 12, maybe more, and this makes the impact very important: if you increase happiness by 12 or 20% on billions of users, that's huge.

That's huge, and you can also have other side effects. There are studies showing that when people feel more lonely, they also become more aggressive. So if people start to post happier posts, there can be a feedback loop: more people get used to seeing even more positive content, and maybe there are feedback loops that are not even captured by this toy model.

Yeah, but then there are also limitations to these predictions of the long term given short-term data. Maybe there are features of human cognition that aren't monotonic: maybe at some point you get too exposed to this kind of thing and the effect turns negative; maybe there are more complex behaviors. But we lack data to really understand what's going on.

There are even some things that would be extremely difficult to measure within an experiment. For example, in the case of Facebook, if we do the same experiment: you talked about these feedback loops where the more people post positive things, the more they will see positive things. But to really activate this feedback loop, you need the whole Facebook user base, or at least one huge network of friends who are all chatting together and seeing each other's messages, to be connected. As long as you keep it an experiment with just 10% of Facebook users, you will not see this kind of thing.

I was interviewed by someone yesterday, and he asked me: if I got to choose, for the YouTube algorithm, if I could change it so that it would instantly promote a lot more ecology videos, would I do it? Would I click the button to do this? And I answered: it seems beneficial, but I would need more data to be really sure. The typical thing I am afraid of is that if you just boost ecology videos in general, there are videos that say we should protect the environment and so on, but that are not really scientific, and they may propagate fake news, or ideas that are actually counterproductive to protecting the environment. I think it's actually a very complicated topic, and there are lots of things... Well, we gave the example of nuclear energy: a lot of ecology videos, like Greenpeace videos or that kind of thing, will say
that nuclear energy has a negative impact on the environment, for instance, and some people understand this as meaning that it has carbon emissions, which is wrong. So it's possible that just by clicking on this button you'd be promoting the wrong kind of videos, and it's also possible that you'd create the wrong kind of reactions. There's this other paper, which we'll probably discuss at some point on the podcast, that shows what happens if you increase people's exposure to alternative views. The setting was exactly this: on Twitter, people were paid to follow one Twitter account that retweets tweets from the opposite side. If you're a Democrat, it shows you tweets from Trump and the like; if you're a Republican, it shows you Clinton or whoever. And they actually showed that this increased polarization.

Well, they expected it to decrease, and that's striking. I've actually heard a lot of people saying that we need to increase diversity in the recommendations of Twitter and YouTube and so on, but if you just do this, it's probably not going to be robustly beneficial; these studies show that, at least for some people, it's going to increase polarization. So you really need to understand these kinds of social media, and the psychology of people, better, if you want to make robustly beneficial interventions. And that's why these kinds of papers, I think, are really important and really critical. We need more of these kinds of papers, I'd say.

Showing one ecology video might teach something about ecology, but we should look beyond that: how will users change their behavior, and what will they do afterwards?

Yeah, exactly. And it's also possible, if you want to do something that's really robustly beneficial, that what you recommend really needs to depend on the user. If you want to get people excited about mathematics: I love 3Blue1Brown videos, and I would want the YouTube recommender system to just suggest 3Blue1Brown videos to me, but maybe it's not the right kind of video for a 12-year-old kid; maybe it would scare him off, because it's a bit technical. Same thing for ecology: maybe there are ideas in ecology that are scientifically correct and are the right way to go, but the user is not there yet, and needs to process other information before getting to this kind of content. So personalization, as is often the case, is critical if you want to make robustly beneficial recommendations.

Yeah, and unfortunately all of this is very hard to study and to analyze. So what can be done? What kind of research could help us better understand these effects?

Yeah, so the classical approach for studying long-term effects in healthcare is epidemiology, where you just track people and ask them questions through surveys, about what they eat and things like that. I guess you could do a kind of epidemiology of how people think; I don't know how effective or how reliable it would be, trying to understand why people change their minds about this or that. It's worth trying to think of research ideas that could be relevant to making robustly beneficial algorithms. Clearly it's not easy; maybe some psychologists who watch these videos will have better ideas than we do.

So I hope you've enjoyed this podcast. I think it's a lot of food for thought, because it's quite different from the way people usually think about algorithm testing: we really think of algorithm testing as "you run this test and it's done", but this kind of paper shows that tests are limited, they have their flaws as well, and I think we need a lot more research to understand this. I hope you found this interesting. Next time we'll discuss the introduction of a PhD thesis written here at EPFL by Lucas Maystre, called Efficient Learning from Comparisons, which is also a very important thing in terms of AI ethics, because if you know what people really desire, well, it's easier to know what's robustly beneficial to do. See you next time!