This element of the program is framed as a keynote dialogue, and that's why I want to ask you, as the first question, whether you have questions regarding each other's talks or any remarks you want to make.
One of the criticisms made of various kinds of big data is that there are problems if you rely on just Google users, because obviously many people don't use Google. What value do you think there is in just using, in your case, Google, rather than taking all the different search engines and mashing them together? Is that an issue we need to think about?
Interesting question. I think at this point we haven't really thought about combining data sets from different search engines or providers. At least from my perspective, because I can still speak from my previous role at Twitter, the data sets that Twitter provides and that Google provides are a bit different, because they are based on different user behaviors. Twitter users express their opinions and their emotions on Twitter. Users who go to Google to search are expressing their thinking in a much more honest way, because I would never log into my Twitter account and ask, say, "Who's Donald Trump?" I would never do that, because I would embarrass myself in front of my followers, or my friends on Facebook, but I do that on Google. So the data sets that any company provides, or can provide, for journalists and academics are definitely different. It would be interesting to actually look at those different data sets and combine them, and I think there is an opportunity, especially around elections, to double-check whether there is a similarity or correlation.
You have a question as well?
Yes, I do have a question. You mentioned that there is a shift, not only in sociology (it's a difficult word) but in academia more generally, towards data-driven research. How difficult do you find it to find the right data sets, and what would you wish for?
I think that is one of the key questions for social scientists who want to use these kinds of new data sources, and the answer is not straightforward. I'll give you an example of a project I was very keen to become involved with. My interest is inequality, but also how inequality affects consumption, cultural activity and lifestyles, and I've done various surveys on this. Of course, surveys only cover a few thousand people, even if they give you a lot of detail. We do know that in the British context one of the most interesting data sets you can have on this is the one assembled by our supermarket Tesco with their loyalty card, which records every single purchase. It's a very interesting story: Tesco became the dominant supermarket in Britain because they were the first to do this and develop this database, which they then contracted to a market research company that they actually owned. This allowed them to have real-time analysis of what was selling in their shops. This was 20 years ago; they were the first supermarket able to look at trends immediately, so it helped them to plan. Academics would love to have that data set and to be able to ask: can we use this to look at social classes, at whether certain classes like certain kinds of products? Getting access to that data has proved extremely difficult. Tesco even funded a research center at the University of Manchester, and there was talk of "yes, we might give you a bit of access," but it proved really, really hard. I think there was an issue about private sector companies, and I also think there are issues of ethics, because most people who have a Tesco card don't really realize this. They obviously tick the box when they get the card, but they don't really realize the data is being used. In a way, I think we should make a public claim that this material should be more in the public domain, though clearly not in a commercially sensitive way.
So there might be a time bar on it, but I think it is at one level shocking that we are seeing huge amounts of data generated which could actually be used for all sorts of really important questions. Google is obviously a bit different, because you're not operating like a supermarket, but certainly in the case of Tesco it is, at one level, a great shame. This data is just lying on the shelf.
If I can ask another question: let me pursue that example of the Tesco card and what you could do with it, because there is a criticism made of that kind of work. We never got access to the database, but if we think through the principles and some examples of something similar, some critics of big data would say that what you can do with that kind of data is infer certain social characteristics, but you don't really know much about the individuals. The Tesco card has your address, and they may have your age, but that's all. They don't know anything directly about your job. So, to go back to your Google example, you can't tell that doctors do these kinds of searches and lorry drivers do other kinds of searches. One of the arguments would be that this limits your analytical power.
Of course, the data that you can access on the Google Trends site, for example, is anonymous data. And I think that's exactly why it would be interesting to combine data sets, if other companies, like Tesco in your example, would provide their data in an anonymous way. Combining that data would definitely give a much broader, bigger and more accurate picture. My question, again, would be: do you think that, in order to change the perception of big data from a scientific, academic point of view, there have to be more examples of how impactful research based on big data can be?
Yes, I do think so. I think that's how social science moves on: researchers use a new method or a new kind of data source to say something fundamentally new. As I was saying, Piketty is, I think, now an exemplar. But he is using taxation data, which is a particular kind of public, if you like, big data. So I don't think we have really seen an example from the more commercial world, because of the access issues we've been talking about, and that would be interesting. I'm very struck by the fantastic visuals we can see of people carrying their mobile phones around Manhattan, people moving around the city. It is beautiful to watch, but you can't help thinking: what do we actually learn? That people move around the city and have these pathways from their home to their workplace? I think we still struggle a bit to find a good analytical way of using that for good social scientific purposes. I'm very keen to do that, but I think we still need to do more work.
Okay. One comment, maybe. It's good to see that it's running on its own. We don't want to exclude you. No, just ask, and then I have a question for Mike as well.
I just wanted to make one comment on your point. This was just launched today: a really interesting project from the Berliner Morgenpost, which got access to a data set (I don't know exactly which one, I think it was from the department of city development or urban development) and created really beautiful maps of how Berlin has changed since 1990. You don't have to be a Berliner or know Berlin well; what you could see, based on data that was made available, is how the city changed and how different suburbs changed tremendously over time.
And of course, if you look at the visualization without reading the text above or below it, you might think: beautiful map, that's nice, but I don't really understand the context. So it was done in a really nice way, with journalists and data specialists working together. And that is the kind of data that would also be interesting for academics.
Great. I see that both of you see a lot of potential in doing analysis on big data, but I must confess I still have some question marks. One comes back to your presentation, Mike. At the beginning I had the impression, and I think you used the term, that it is a tool. But later on there was a bullet point saying that it is more important to focus on the sense that lies in the data itself than on theoretical framing, philosophical narratives and things like that. Isn't that a contradiction? Don't we still need the theoretical approach to come up with the question, the question that we address to the data set? Is there a contradiction there?
I'm not sure whether it's a contradiction or a difference of emphasis. I think you're absolutely right that data doesn't speak for itself; that's the crucial point. And the more data there is, the less it speaks for itself, because its complexity becomes so overwhelming. So you absolutely do need theory, but the question is how you make your theoretical points and how you establish your theoretical argument. I think the way Piketty does it, or Wilkinson and Pickett do it, is by assembling these visuals. So your theory isn't proceeding by reference back to philosophical a prioris; it's being done in a more inductive way, in dialogue with the data.
Isn't there a risk there, because maybe the data set is now dictating what academics find interesting?
When I apply this kind of analysis to this data set, out comes a beautiful U-curve, and that's why I draft this article, not because I see a phenomenon that I don't understand or something like that.
You're absolutely right, I think there is that danger, and it becomes a new empiricism. And you might say that, of the three studies I talked about, the most criticised is Wilkinson and Pickett's. Many epidemiologists feel they have misrepresented data sets: they have a very strong argument they want to make, and they are deliberately trying to manipulate some of the evidence to support it. And I think the thing about Wilkinson and Pickett is that, of the three I talked about, they are the least critical of data itself. Piketty, by contrast, obviously depends on his data sources, and even income tax data is clearly not perfect, because many people don't declare all their income. But the interesting thing about him (I talked to him recently when he visited London) is that he's very aware that data is constructed. So at one level he keeps a step back from the data and is always asking: why do these things get measured in this way and not that way? He is using data to make his arguments, but he's using it while also standing back from it, and that allows him, I think, to recognise some of the absences that might be there.
And I think, to that point, it works both ways. You could have a really compelling data set, dive deeper into it, and find a really interesting story that you can then elaborate on and write about. And sometimes you have a fantastic idea and a fantastic story in your mind that you want to base on data, and you find data sets that don't work well with that story.
So it's not always the case that you start only from the data set; it works both ways.
Okay, I see. Coming to the data sets you are interested in and the News Lab idea: we all know that, at least in Germany, there is not only friendship between publishers, journalists and Google, to put it mildly. What is your experience of cooperation with journalists and publishers? Do they really want this tool? Do they make use of it? Is there a real dialogue, or is it more of a selling process? What is it like?
Well, I've been with Google for five weeks, so let me start with that. That said, long before I joined there was a strong relationship and, of course, an open dialogue between Google and publishers. In April we launched the Digital News Initiative, in which eight publishers from around Europe and Google work together to really drive innovation and think about, for example, products that can support the entire media landscape in Europe. So the Digital News Initiative is exactly that step: to deepen the relationships and the talks with publishers. This group of publishers, together with Google, has already launched a product. It's called Accelerated Mobile Pages. It's an open-source product aiming to increase the loading speed of web pages on mobile. So it's not only about Google tools or Google products; this is really open to everyone. Secondly, there's the innovation fund, through which Google, together with publishers, wants to drive innovation in the start-up ecosystem. And thirdly there's the News Lab, which is where we come in: we want to collaborate with journalists and give them access to the tools and to Google Trends data. The reactions so far, in the five weeks I've been here, have been very positive, and people are interested in learning more about tools, trends and so on.
Are you open to collecting ideas and putting them into new products and analytic tools, or do you just give access to tools that you have designed and that are already available?
I can speak for the News Lab at least: the News Lab started with a real listening tour, spending a lot of time in newsrooms and asking journalists what they want and what they need to do their day-to-day job. So it's not about collecting ideas and then nothing happens; it's about really working together and collaborating. Tomorrow, for example, we have around 200 publishers from around Europe in Berlin to discuss the challenges facing media and the solutions we can work on together.
I come back to this narrative of big data analysis being a tool. A tool, of course, is a marvelous thing: you can do things with tools that you can't do with your bare hands. But tools come with affordances. If you only have a hammer, you're inclined to treat everything as a nail, to take that example. And doesn't that mean (we are talking about power shifts here) that those who own the data and create the analytic frames gain a bit of power over science, in your case, and over public communication, in your example? Is that something that is discussed at Google and by others, and in the academic sphere?
You're absolutely right. I think there's an inherent power bias towards those who are most engaged with these kinds of sources, which means there's a danger that certain kinds of people are simply outside the frame of reference. In a sense, the more powerful you are, the more likely you are to be part of this world of big data. I think you see that very much in the world of Twitter: people who tweet a lot are more visible. I was actually involved in something (I didn't talk about it today), and just this week in Britain we published a book.
We were involved in a project with the BBC called the Great British Class Survey, which was a big web survey of people's attitudes towards social class. There was a lot of public interest and lots and lots of discussion. Interestingly, it was a form of big data: a digital web survey with a very large number of responses. There wouldn't have been as much interest if it had been a normal survey, but this was a very large BBC survey and it got a lot of attention. But this is the kind of difficult issue: if you were in the lower classes, if you were in what we call the precariat, people in insecure jobs not earning much, you just did not do it. You would not go to the BBC website to do this quiz, because you didn't think it was for someone like you. On the other hand, if you were earning lots and lots of money and had been to an elite university, you were very likely to do it. So you had a hugely skewed sample: basically, the more money you earned and the more highly educated you were, the more likely you were to take part. We knew that from the beginning, so we deliberately didn't use it as a representative survey, but it does pose that challenge. I think the way to address that challenge, going back to what we were saying earlier, is to mash up different data sources. In our case, we also did some ethnographic work with the people who didn't do the survey, to look at why they weren't doing it and what the issues were. So I think you can get around it, but you absolutely need to be very careful in looking at the absences and biases in the survey and the data.
Thanks. Is there something you want to add? Okay, that's not the case, so I will open up for questions now that we have enough time to do that. Please be kind enough to use the microphone, maybe stand up so that people can see you, and just give your name.
Hi. Thanks very much. I'm always a bit nervous, so the mic might shake.
My name is Sevda Aslan; thank you for your talks. I'm from Mannheim University, and I wanted to stress a point that Mike mentioned before, about ethics. When I do research, I ask people whether they want to participate, and usually I only include them if they say yes. Normally they get money for it, or chocolate, or something else. And I was wondering about the data you suggested we should use: big data from administrative sources, which is basically state surveillance, or from Google, which is a company that exploits its users to make a profit. Do we really want to use this data just because it's there? And if so, where do we stop? If the NSA comes along and says, "Hey, we have so much data, don't you want to find out more about society?", are we going to say, "Yes, so interesting, we should"? And they probably have nice visualizations too. So where do we stop? Do we really want this kind of data, or is there something like a code of conduct for data? Are there some ideas on that; is there an ethical discussion going on?
Absolutely there is; you're absolutely right. The ethical issues are huge and there's no easy solution to them. Take the issue of informed consent. Most conventional social science interviews and surveys use informed consent as the gold standard: people not only have to give consent, but you need to explain things very carefully, and they need to understand what's involved. That does not happen with most forms of digital data. People don't realize, when they're using Google search, that the data is going to be analyzed. But of course, these days you need to use the Internet to do routine things, so just saying "well, I don't want to tick the box" is hardly an option: if you do that, you are denying yourself lots of possible ways of accessing services.
So in a way we have no choice, really, but to do these sorts of things, but I think there are really serious issues around that. I also think, though, that there's no such thing as any kind of research which doesn't pose ethical issues. From where I'm sitting, I'm really interested in how we get at the perpetuation of privilege and power, and people in power often don't want to be interviewed; they wouldn't do these informed-consent interviews. And this is Piketty's contribution: he has put the spotlight on the very wealthy and the very rich by looking at data sources like income tax data. So I think at times, if we want to ask challenging questions which address really important social scientific issues, we do need to think about extending our ethical range beyond the situation you describe, where people don't want to talk to us and we just have to leave it there. Otherwise, I think, it's going to limit what we can really do.
Yeah. An extremely interesting question, and we'll have to see whether we take it up in this series, because the ethical issues you mentioned are not only about classical data protection but also about the way data is gathered and whether it's okay to make use of it. And when it comes to data protection, if I may speak as a lawyer, there is the special problem that you can anonymize data in a state-of-the-art way, but you can't be sure that there aren't analytic models that re-identify these data sets. So the traditional instruments of data protection are really challenged in that regard. There is another question. Okay.
Sorry for that. I'm Stephan Russ-Mohl from the University of Lugano in Switzerland.
I think all the researchers and journalists in the room know that you can play with data and manipulate it, and I think all of us also know that large and powerful companies, beginning with Volkswagen in Germany, or Exxon in the United States with data about climate change, or the cigarette industry, have used data very selectively and manipulated it in their own interest. Now my question is: if Google and Facebook and other players are becoming similarly or even more powerful with all the data they have, isn't there a danger that they will also manipulate all of us by presenting the data they have very selectively? You will present to us the data that helps your business interests, and you will possibly withhold the data that doesn't. And now you are even co-opting some of the most powerful media companies, who should be your watchdogs: they become part of the Digital News Initiative and cooperate with you. Isn't that something which should frighten not only young researchers from London but maybe all of us, here and elsewhere? Will you comment on that?
I can only comment on that as someone who joined Google just five weeks ago, and I stress that again. So I can only speak from a News Lab perspective. The data that the Google News Lab team provides and gives access to is, of course, open to everyone. And we don't select based on business interest but on editorial interest. If you go to the Trends team's GitHub page, for example, you see a varied collection of data sets, ranging from football to politics to science to lifestyle issues, and there is certainly no limitation or filter on that. So from a News Lab perspective, I think it's important to stress again that we provide the data, but it's the journalist or the academic researcher who puts that data into context.
Thanks. My name is Lena Ulbricht.
I'm from the Social Science Research Center, and I have two questions, one for each of you. The first is for Isabel. Let's say there is some mistrust in the data you provide; but once we agree that you provide the full data you have, I'm still interested in how academics and journalists can use it. Traditional data sets were provided in a somewhat transparent way: I had a codebook where I could read the operationalization of the variables, and I could read about the whole process of how the data was collected, and so on. Now I'm interested in the data you provide, for example about Google search. Do you also provide research on how the data is generated? Do you do research on the conditions under which people use Google? When do they use Google? What does it mean when they ask Google questions? For example, you gave the Donald Trump example. When I search for Donald Trump, does it mean that I support Donald Trump? Does it just mean that I'm somehow fascinated? Does it mean that I have no clue who he is? Does it mean I want to see pictures and mock his hair? So how much do you know about how the data is generated? Because that matters a lot for how we can interpret it afterwards. And if you do research on the context in which data is generated, do you make it public? My question to Mike goes back a little to the question you asked when you published your paper in 2007. You were interested in what big data means for the development of social science, and now you've discovered that social science uses big data. Some people say that the big data hype is also an instrument researchers use to gain funding: they claim they do big data because they know it's just the way you get funding nowadays. So I'd really like to know what you think: at whose expense is big data expanding? Traditional statistics in sociology? Classical surveys, or rather qualitative research? Thanks very much.
A challenging question, so let's start. I'm going to start.
Good question, and a really interesting one. I'll go back to the time, again prior to my joining the team, when there was no dedicated Google Trends site where you can actually see trending stories or explore topics based on search, and there was also no team working with journalists and academics. Now we work with journalists and academics in two ways: first, we provide raw data and, of course, give context on what the raw data means; and second, we collaborate with journalists and academics. You saw examples of this, for instance the Mashable example: Mashable approached the team and asked whether Google had data visualization capacities to help them drive their story on the Nepal earthquake. But it's a good point, and I will of course double-check with the team whether we plan to give more context when we collaborate with journalists and academics. I think the goal (Simon Rogers is leading that team) is of course that we provide the context and give more information on what it means if you search for Donald Trump: whether you like his hair, or whether you're just interested in whether he's attending the GOP debate. So I'm definitely taking up your point.
Most likely you are interested in his latest rant on some show, something like that. Please.
Yes, I mean, it's a very interesting question. I think we're in a very, very fascinating time for methods in social science, and there are different tendencies at work, some linked to big data, some not really that directly linked to it. In terms of the challenge to existing methods, my view (it may be different in Germany) is that in the UK the sample survey, which is very widely used, including by market research companies, faces real challenges.
We saw it in the recent election in Britain, when the pollsters all got the results completely wrong. The issue is partly that if you're doing phone interviews, people don't use landline phones anymore; if you use mobile phones, that's also skewed; people aren't at home; response rates are falling. That doesn't mean the sample survey will decline entirely. What is happening in the UK is that there's funding for the really big, high-prestige surveys, particularly the panel surveys, where people are in a sense invested in them and carry on doing them year by year because they feel they have to, or feel they should. But it's more difficult to get funding for a bespoke, one-off survey these days. So I think the big major surveys will carry on, but the one-off surveys are going to fade away. I do think the in-depth interview, which has been a major method in Britain, is looking really quite problematic, because we are so saturated by interviews and people are so used to them. There is actually a really interesting piece by the American sociologist Shamus Khan about how what you get in interviews is accounts, and people are very skilled at giving accounts. Obviously it depends on your background and your cultural capital, but nonetheless you get accounts, and how they map onto practice is problematic. So I think those two are in difficulty. On the other hand, I think ethnography, the old-style intensive immersion in an organization, is actually very strong, possibly getting stronger, because it's an extremely good complement to looking at big data: you can actually see how data is being deployed and interpreted in particular contexts. So my sense is that, among qualitative methods, one is probably getting a bit of a resurgence, but the in-depth interview by itself is declining.
And I do think the use of administrative data, such as taxation data, is going to be very interesting to watch. We had a big debate in the UK about whether we need to carry on with a census, which we've had every 10 years for the last 200 years. One argument said we don't need to do it anymore, because we can just link together all the government departments' records, which is much easier and much cheaper. In fact they are keeping it going until 2021, but that debate is going to be had again.
Thanks very much. I've seen three hands up, and we have time for one question, I guess, and the final remarks. I'm very sorry; I saw that you raised your hand, but you'll have the opportunity afterwards to talk with the panelists. So please, this question, then a brief concluding remark from each of you, and then we have to finish.
Thank you. My name is Ricky. I'm from the University of Zurich, also in Switzerland, and I thank you both for your very interesting presentations. I totally get the point about telling stories and the power of visualizations. However, I have to take up a point that our American colleague Rod Benson stressed at the ICA in Seattle two years ago, which he called the new descriptivism. If we all go in this direction, if we all take those data sets and make nice visualizations, then I wonder what happens to explanation and, continuing on from that, what happens to critical research. So maybe you could say something about that. Thank you.
Thanks for the question; actually, I was in Zurich yesterday and really enjoyed it. Great question. I hope I made the point, in showing beautiful visualizations, whether the Nepal example, the climate change example or any other, that the data and the visualization are not enough, because we might all be experts to the extent that we can understand bits and pieces of those visualizations, but definitely not the entire picture.
And then imagine that a much bigger audience sees those visualizations embedded in a news article, and of course they have no idea what they're seeing. So, as you say, I think it's extremely important to still have that context and to analyze those data sets and visualizations in that context. And I can't believe that any journalist or data-focused journalist who is using data to drive innovation in storytelling is only focusing on the data set. They will still do their investigative job and their research job, which I think is absolutely crucial. And I think you made the point yourself: I haven't seen data stories that lacked description and analysis.
I'm very interested in the debate about description and explanation, and I think there are also ways of addressing it by thinking about different models of causality and different descriptive strategies. But on the point you specifically made about being critical: we can discuss what we might mean by being critical, but I think there is one very important way in which being descriptive allows you to be critical. I think it comes out very well in the work of Wilkinson and Pickett and of Piketty. If what you're doing descriptively is to array a series of comparisons, say of the different nations of the world, rating their health and their income, you can use that to pull out the contingency of particular cases and ask why it is that the US, the wealthiest country in the world, has such poor health outcomes, and things like that. So description done well actually allows you to be very critical, I think, and the same is true of Piketty: by arraying trends in income inequality over time, you can ask how it is that in 2015 we are returning to the aristocratic order of Europe before the First World War.
It's a very descriptive point and, I think, a very critical point. And I actually find (this is perhaps an argument, or a discussion, we can have later) that when people say we must not lose sight of explanation, that sounds fine, but it seems to me there aren't actually that many really good explanations of things in social science. They tend to be either very reductive, "it's all to do with capitalism", for instance, or fairly tautologically obvious. Perhaps you've got some, but I think explanation is often wheeled out because it sounds good, and in practice I'm not sure it is such a powerful, valuable coin.
Thanks very much. Only a very few concluding and technical remarks at the end, but very important ones. The first one: a big thank you to our speakers, who were really fascinating, and I think these fascinating remarks are worth a round of applause. Thanks very much indeed. Thank you to the British Embassy, which has been an excellent host and let us be in these marvelous rooms, and to the Vodafone Institute, which makes this possible and is an excellent partner in doing this. Thanks very much. And, of course, thank you to our organizational team, Lena Ulbricht, Christian Petzold and Larissa Wunderlich, for making everything run extremely smoothly. I'm also thankful to the two chairs of the divisions on the sociology of media communication and on computer-mediated communication of the German Association for Communication, because they were so kind as to accept not having a slot here for a welcome address, leaving us more room for discussion; that's very, very kind, thanks very much. This conference is not only the starting point of a lecture series by the Vodafone Institute and our institute, the Humboldt Institute for Internet and Society, but also of a conference on media and complexity that takes place tomorrow and the day after. I hope this conference will be extremely fruitful, and welcome to all guests who are part of this community.
I think that's basically it. I've already talked about the series, which means we hope to welcome you to another event. If you have subscribed to our newsletter, we will inform you when that takes place. The next events are in January and March, but the exact dates are still to be announced. And I think that's the final remark, and the last one is the one I like most: I invite you to drinks I don't have to pay for. So please enjoy the hospitality of the embassy and the conversations with the panelists, our team and each other. Thanks so much.