So, when the first generation of cloud companies started, we ran a competition to try to encourage use of cloud infrastructure, and we offered a prize of a certain amount of infrastructure to be used by the winner for 12 months. Our next speaker is that winner, Devin Gaffney, a research assistant and Masters candidate at the Oxford Internet Institute, and Devin is going to be talking about the work he's been doing under that prize: "Ditching the dowsing rod: why prescribed analytics fail to deliver."

Well, I was going to do the introduction, but I guess that's all done with. The best way I can explain this whole work I've been doing is with a situation that transpired a few months ago. This is my friend Hugh, and on February 15th he wrote a tweet that said, "I got you this giant teddy bear wearing a vampire costume." A few days later, the data scrubbers over at Klout had deemed that he was in fact influential in vampires. I think we can use common sense, given the context of this tweet and this picture of this man, to say that he is probably not influential in vampires. So I started thinking about this situation, and about what sort of factors lead to something so hilariously wrong being put out by a company that is so hilariously rich.

To answer that question, you have to go back to the original Web 2.0 model. As you all know, the basic idea is that users create lots of data online and put up all this information, and this all came about as a result of technological, economic, and infrastructural conditions being right in the 2000s.
There's nothing special about the 2000s internet; it's just that that's when it happened. The model basically says: let users create a lot of stuff, let users check out all the stuff other people created, show advertisements on the margins of all that content, and make your money that way. But then we have this derivative model, which realizes that advertising is really bad at what it does online, and building content for people is very bad at what it does online. So we start asking: what have these people been doing over the last decade? Can we use that to train our systems to do better advertising, or better content targeting, or whatever? And what you get are these top-down, easy models like Webtrends or PeopleBrowsr or Klout or SocialFlow, which basically say: I have a model that describes activity on the internet and can help you target all of your content better than any other model that exists. I can't tell you how it works, but I can give you a two-digit number that will tell you how you're doing compared to other people.

That's not how research is done. You don't do research like that: you have a question, you consult the literature, you look for a methodological practice that fits the question you have, you get your data, and then you ask your own question of that data. In fact, if you look at the research that social science and internet researchers do, all the big papers usually apply some completely new analysis to a novel data set, rather than reiterating the same model over and over with slightly different variations. But if you're not in the research community, you're basically just left with those tools, and if you have a question that doesn't quite match Klout's system, you don't have an answer.
I've been dealing with this problem, specifically on Twitter, for about three years now, and what I've come to is that there are a few basic problems here. When we were building our original system, we were taking the Klout route: building a few analytics, letting people put data into our system, and then delivering data and analytics as a result. But then Twitter came down on us and said that you can't give raw data to people, and due to Twitter's API limitations you can't collect data retrospectively. We can't go back and ask for data that occurred a week ago or something; it's practically impossible. So at that point we had two constraints: we can't look at data in the past, and we can't share raw data. We had to come up with a system that covers most of the content coming through Twitter that's interesting to the internet research community, and that also allows us to ask questions in a very flexible way.

What we've come up with is a system where lots of researchers join in on a single site. They come from many different practices and many different backgrounds, and they have social science research questions around Twitter; there are actually a lot of people who do this. The system makes it very easy and very flexible to add new analytical questions, and it keeps all the data that's been collected over time, so that if in a year there's a new analytical question we can ask of an existing data set, we can go back and ask it. This is the view for one of our pages, the one that creates new analytics.
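Those two constraints, no retrospective collection and no raw-data sharing, force a keep-everything design: archive every tweet as it arrives, and let analytics that are defined later replay the archive. Here is a minimal sketch of that idea; all names are illustrative assumptions, not the project's actual code:

```python
import json
import time
from pathlib import Path

# Hypothetical append-only on-disk archive, one JSON-lines file per day.
ARCHIVE = Path("tweet_archive")
ARCHIVE.mkdir(exist_ok=True)

def store(tweet: dict) -> None:
    """Append an incoming tweet to the archive. Because the API cannot
    be queried retrospectively, everything must be kept from the moment
    of collection onward."""
    day_file = ARCHIVE / f"{time.strftime('%Y-%m-%d')}.jsonl"
    with day_file.open("a") as f:
        f.write(json.dumps(tweet) + "\n")

def replay(analytic) -> dict:
    """Run a newly defined analytic over the full historical archive,
    so a question invented a year from now can still be asked of
    data collected today."""
    results: dict = {}
    for day_file in sorted(ARCHIVE.glob("*.jsonl")):
        with day_file.open() as f:
            for line in f:
                analytic(json.loads(line), results)
    return results
```

The important property is that `replay` takes the analytic as an argument: the data outlives any particular question asked of it.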
You basically type in this information: the function name, the language it's supposed to run in, and then some metadata that instructs users as to what it does. At that point you can add variables dynamically, and you can add dependencies, so that in order for this analytic to run, another analytic must run before it, which lets you chain together multiple processes. Once that's all done, the user is faced with a page that basically lets them attach the analytic to a data set. Any researcher can request any analytic to be run on a data set, regardless of whether that researcher is the owner of the data, so the researcher who collected the data can walk away if they don't find anything interesting in it, and other people can come and pick it up. Then we have a basic view page, completely abstracted from the system we run internally, that lets the developer of the analytic define how the results are shown. This is again just the basic user-statistics one, with some data that was collected the other week on the French election. So you have a very transparent system that shows you exactly what the numbers are, and the code is all open source and available, so that if you have any questions about the system you don't have to trust a secret-sauce algorithm somewhere on some other website. You can actually just look at it and run a simulation yourself, and all the code is completely usable by someone on their own local machine. This is all in the effort to deal with the problem of the secret-sauce approach.
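The chaining behavior described here, one analytic declaring that another must run first, can be sketched as a small registry that resolves run order topologically. This is a hedged illustration under assumed names, not the platform's actual implementation:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

class AnalyticRegistry:
    """Minimal sketch: register analytics with metadata and
    dependencies, then run them in dependency order."""

    def __init__(self):
        self.analytics = {}  # name -> (function, description)
        self.deps = {}       # name -> set of prerequisite analytic names

    def register(self, name, func, description="", depends_on=()):
        self.analytics[name] = (func, description)
        self.deps[name] = set(depends_on)

    def run_all(self, dataset):
        """Run every registered analytic; prerequisites run first,
        and each analytic can read earlier results from `context`."""
        context = {}
        for name in TopologicalSorter(self.deps).static_order():
            func, _ = self.analytics[name]
            context[name] = func(dataset, context)
        return context
```

With this shape, a hypothetical `avg_per_user` analytic could declare `depends_on=["tweet_count"]` and read the earlier result out of `context` instead of recomputing it.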
And since we're running on this cloud system now, we're moving (it's kind of difficult, because I'm not an amazing programmer) towards a very scalable system, so that when there isn't a lot of activity we don't use any resources. Twitter is a very temporally constrained system: sometimes you have an immense amount of data coming through, and most of the time there isn't really anything interesting going on.

The most interesting thing to me about all this, though, is that the basic idea is generalizable: two different types of users on a site, researchers who have questions and developers who can answer those questions, coming together on a platform that doesn't do analytics itself but facilitates the process of analyzing data from the web. You don't have to do this for Twitter; you can do it for any of the other sites that social science researchers are looking at these days, or for the web in general. You can start building systems that just crawl the web, with those same two groups of people: one that wants answers to questions and one that can answer them. And so we start stepping away from the short-term benefit of a satisfying two-digit number, which makes you feel better but doesn't tell you anything about what's going on and has no knowledge of the context you're operating in. We can start moving towards generating automatic reports about the things we're interested in, in a way where the researcher actually has agency in the process, instead of passively accepting whatever data any system gives them, which is what we currently have. So, thank you.
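The scale-to-zero idea, using no resources during quiet periods and many during bursts, can be illustrated with a toy autoscaling rule. The throughput figure and worker limits below are made-up assumptions for illustration, not measurements from the actual system:

```python
import math

def workers_needed(queue_depth: int,
                   tweets_per_worker_per_min: int = 6000,
                   min_workers: int = 0,
                   max_workers: int = 20) -> int:
    """Toy autoscaling rule for bursty Twitter traffic: scale the
    worker count with the backlog, capped at max_workers, and idle
    down to zero workers when nothing is happening."""
    if queue_depth == 0:
        return min_workers
    return min(max_workers,
               max(1, math.ceil(queue_depth / tweets_per_worker_per_min)))
```

A real deployment would hang a rule like this off a queue-depth metric and a provisioning API; the point is only that capacity tracks the bursty input instead of being provisioned for the peak at all times.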