It's my pleasure to welcome you to the first debate in our data debate series. So it's an interesting one for us. It's inspired by the new partnership that we have with the Alan Turing Institute, which is the new UK national centre for data science. So welcome. There are a couple of other important partners for us that are here tonight that I'll mention in my introduction: the Arts and Humanities Research Council and the Economic and Social Research Council, who are really important long-term supporters of research in these areas and who really inspire us to have this sort of debate, one that is not only technological, about algorithms, but that also reflects on human behaviour and on society. On big society, as well as, maybe, big data. So that's the intention, and thank you for coming to our first attempt to do this. So I'm just going to say a few words about how tonight is going to work. We have our excellent panel here, and they are all going to tell us a little bit about their thinking.
I would particularly like to thank Jefferson Bailey, who has stepped in at very short notice this afternoon. So he has had little time to prepare, but we know that he has extensive knowledge in this area, so I'm sure we'll have great input. And also, thank you very much to Timandra, David and Helen. So I'm going to hand over to Timandra in a minute, Timandra Harkness, who is our chair. Just before that, I'm going to say that we're very lucky to have Timandra. She wears so many hats it's impossible to choose: Radio 4 presenter, newspaper columnist, science speaker, comedian. And for all of you, you can have lots more of Timandra, because there's an excellent and fantastic book called Big Data: Does Size Matter? So I recommend that to you, and I'll hand over to Timandra.

Thank you, Maya. Yes, it's my job to steer the good ship of the debate through the next hour and a half or so. What we're going to do is this: each of the speakers has been asked to keep their introduction short, not more than 10 minutes. And then we'll probably have a little discussion just between the four of us to draw out any ideas that they've raised. But then there will be lots of time for you all to join in.
So I do hope that as they make their short introductions and as we have a little initial discussion, you'll be making a mental or even a physical note of anything that you want to throw in. Don't feel you have to limit yourself to questions. Feel free to ask questions, because we've got three extremely knowledgeable people, but we're also interested in hearing your ideas and your thoughts. So there will be plenty of time for that. I think we probably have a roving microphone, is that correct? Yes, yes, good. But that's a little way off yet, so don't worry about that yet. Because we're filming, obviously, if you're not meant to be here, if you said you were going to be somewhere else and you want to ask a question, put on a big hat or disguise your voice or something. Get some privacy tips from David. So we have three speakers and they're going to speak for about ten minutes each. If they start overrunning, I'm going to subtly wave a piece of white paper under their noses, so hopefully they will know then that it's time to start speeding up and say all the same stuff but twice as fast. And I'm going to introduce them all and then let them loose on you. So first of all, on my immediate left we have Helen Margetts, Professor of Society and the Internet at Oxford University. She's a Professor of Political Science and she's the author of a book, Political Turbulence: How Social Media Shape Collective Action, so you can see exactly why I wanted to invite her. She's the director of the Oxford Internet Institute and a faculty fellow at the Alan Turing Institute, based right here. Then we have David Vincent, also an author of two books, Privacy: A Short History and I Hope I Don't Intrude: Privacy and its Dilemmas in Nineteenth-Century Britain.
He's at the Open University, he's a historian of privacy and secrecy, and he's also a visiting fellow at an institute at Cambridge called CRASSH, with two Ss, which I suspect is less dramatic than it sounds and more intellectually dense, but it's still a really good title. And then on my right, as Maya said, stepping in heroically at very short notice, replacing Facebook. Replacing the whole of Facebook. When you get home tonight you'll find Facebook is not there. It's just basically a picture of Jefferson's face. And if you remember how to spell his name then you get access to the whole Internet Archive, which apparently is in a church in San Francisco. So no, seriously, Jefferson Bailey is the director of web archiving at the Internet Archive in San Francisco, which is apparently based in an old church. So basically the whole of the internet is preserved in an old church in San Francisco. Not all of it, but a lot of it. What? Okay. But the internet is archived in an old church in San Francisco, and Jefferson has worked in digital preservation not only there but also in public libraries, museums, the Library of Congress and the US National Archives. So his work is all about web archiving and then giving access to researchers. So in many ways I think we have a much better guest than Facebook, because we all know they inevitably would have only given us what they were allowed to say. And then instead of asking questions you'd probably have just put your thumb up to say whether you liked it or not. So Helen, take ten minutes. Tell us about social media and politics.

I will do. Thank you. Yes, I'm going to try and answer this question with respect to politics and political behaviour. But I hope at the end I might widen it out a bit, and that you might think what I'm going to say applies to other sorts of behaviour as well. So the first thing I want to say is that politics increasingly takes place on social media.
If you think of any of the things that we normally construe to be politics (collective action, political participation, maybe engagement with some governmental organisation, political discussion, deliberation, elections, political campaigning), it all takes place, or at least plays out, on social media. That is, even if you were to go and watch a televised debate during an election campaign, most of the discussion about that and the reaction to that will be visible on social media in some way. The point, therefore, is that we're not really asking 'what's the use?', because it's not some sort of nice extra information about what's going on. It's the main source of information about what's going on; that's what I want to suggest. Just as that applies to all the activities of politics, it also applies to the dark side of politics. It applies to terrorism, radicalisation, the far right, hate speech, racism, riots: all those things are coordinated and communicated to a very large and increasing extent on social media. So all of those things are visible somewhere in social media data, and therefore we can't ignore social media data. So that's the first point: of course we need social media data. In some ways you might argue it's the most important sort of data if we want to understand a political system. There's another key point here about social media and political change. As Timandra said, I and some of my colleagues at the Oxford Internet Institute, and UCL as well, wrote a book called Political Turbulence, looking at the relationship between collective action and social media. In that book we observed and argued that a key driver of social-media-related change is that it makes possible very small acts of political participation, tiny acts of political participation, that in an earlier era weren't really possible: liking something, following something, signing a petition, following somebody, viewing something.
There's evidence now that suggests that people share news items more than they read them; they share them without reading them, in fact. And you might construe that as a political act: you're drawing somebody's attention to something. Or sharing a photo: think of the influence of photographs like the one of the drowned Syrian boy, or the German football stadium with a sign saying 'Refugees welcome', during the refugee crisis of last summer, this summer, sadly every summer. These are very small acts of political participation, and they're kind of new, because in an earlier era the transaction costs would have been too great. Politics has become less lumpy than it used to be. You don't necessarily have to join a political party. As Oscar Wilde said, the problem with socialism is that it does cut so dreadfully into the evenings. You don't have to have your evenings cut into; you can do a little bit of politics, and that is drawing a lot of people into politics, including people that we used to think don't participate in politics, particularly young people. So you can have the situation that I had a few months ago, when my 14-year-old son said at breakfast, 'I'm friends with Jeremy Corbyn on Snapchat.' And the point about those tiny acts of politics is that of course they are reflected in social media data, but they are only reflected in social media data; they aren't anywhere else. So that's another reason why you need it. Now, I won't go into the details of that book, but we did argue that the way those tiny acts scale up into important mobilisations and waves of support, and possibly demonstration and protest, is actually very unpredictable and unstable, and it's making politics more unpredictable. It's making unpredictable things happen, and that's why unpredictable things keep happening.
So in another sense that data is a kind of clue to that unpredictability. As well as getting unpredictability, we ought to get some capacity to predict what's going on using that kind of data. And as a social scientist, and I'm sure there are social scientists here in the audience, it's actually incredibly exciting to have this sort of data, because it's real-time transactional data about what people are really doing in politics. It's not a survey about what people think they did or think they might do, which is something people are quite often unlikely to remember. It's real data about what real people really did. So in that sense it's exciting, and it's a way to cope with the unpredictability of modern politics. But to move on to the problems, because obviously that's painting a very utopian vision of how we understand politics in the age of social media: actually, social media data is very difficult to get hold of. And that may seem counterintuitive, because we all hear or read articles about how we're awash with data, the data deluge; it's like we're swimming around in data. Actually, it's easy to get hold of Twitter data. Twitter has an open application programming interface, an API, which means that you can get hold of data in an open way. That's great, and it means that there are hundreds and hundreds of articles about how people behave on Twitter. The trouble is that most people don't use Twitter. Actually, more people use Instagram than use Twitter. Instagram belongs to Facebook, and it's very difficult to get hold of Facebook data. Facebook doesn't have an open API, and most Facebook data is not available for the purposes of understanding what's going on in politics. And there are lots of ways in which we don't actually have most social media data.
The average young person is active on five social media accounts, and any of you with children will know you've probably heard of two of them and you haven't heard of the other three, something like that. So to get a handle on any kind of politics they're doing on those platforms is actually very difficult indeed. Snapchat content, which would have reflected my son's enthusiasm for Jeremy Corbyn, is deleted, or is supposed to be deleted; we were just talking about that. It's supposed to be deleted as soon as it's been sent. WhatsApp is encrypted. I could go on and on about all the different ways. So some data is available. You can get Twitter data; Wikipedia data is freely available, so you can see what people have been doing on Wikipedia. But a lot of it isn't, and if you want to buy it, it's actually extremely expensive, out of reach of most research organisations. So that means that this new necessity and capability to understand politics is largely in the hands of internet corporations. They're using that data all the time to make their services better. I say 'better' in scare quotes because of course it means better but also more profitable, so that they form an ever greater part of our lives. So they are using that data, but they're not publishing it very much or actually analysing the effect it's having on political behaviour and political systems, or at least they're not sharing any analysis of that kind that they do. On the occasions when they do, they're inclined to get completely lambasted. Many of you will have heard of the experiment where Facebook was manipulating our emotions by feeding us negative stories and seeing how that affected our mood. Now of course they do that. Any company that operates an internet-based platform does that kind of experiment all the time, to see the effect of different modes of design. But the fact is they're doing that behind closed doors, and we don't know much about what's going on.
Of course that's important not just because we're missing a trick as far as understanding goes, but also because when they make changes to their platform and observe the effect, they're also shaping our political behaviour. Now, Facebook wasn't invented for politics. There's lots of politics going on there, but most of the time people are doing other things. Facebook is a platform which was established initially with the idea of helping shy young men find girls, basically. I mean, that was the sort of idea of it. I've nearly finished. But a lot of politics is going on there, and when changes happen, for example when a social media platform introduces trending information saying which are the most popular tweets, or the most popular anything, we know from decades of social science research that that affects the way you behave. It affects how much you like something. We tend to like more popular things more and less popular things less. So any kind of platform change is actually likely to change political behaviour, and that's quite important. Facebook, for example, a couple of years ago ran a very large experiment with 61 million subjects, as it were, where they tested the effect on a person's likelihood of voting of telling them how many of their friends had turned out to vote. It didn't have a very big effect, but it did have an effect. It's supposed to have encouraged 66,000 people to go and vote. Okay, you can say that's great. That's Facebook doing good, not evil, and of course that is good. But it does put a lot of potential in the hands of that corporation, because of course we know that turnout is differential according to party: supporters of some parties are more likely to turn out than others. So there's a possibility there. I'm not saying in any sense that they're doing it, but there's a possibility there for them to have quite a profound effect on an electoral outcome.
So finally, I'll end there. Of course we must use social media data to understand politics. We have to; we won't understand it without it. But there are some ramifications in terms of who owns that data and what they can do with it that other people can't.

Thank you, Helen. I will come back to you. You can shoehorn more points in later. So David, I think you're going to take us back a little in time for a longer perspective.

Yeah, I'm going to start with Charles Dickens, whose name I see is over a room in this building. In the 1830s he was setting out as a writer at the same time as the modern statistical movement took off, fuelled by the Great Reform Act. Amateur enthusiasts and government officials discovered the power of counting. It was the beginning of big data avant la lettre. It was seen to be possible to express the complexity of social conditions and behaviour in summary tables of figures, and thereby measure progress and the need for reform. Dickens responded to the new statistical movement with a terrific satirical essay purporting to be a Full Report of the First Meeting of the Mudfog Association for the Advancement of Everything. He mocked the use of modern communications, parodied counting for the sake of counting, and attacked the amoral measurement of human suffering. A central figure in his meeting was Mr Slug, so celebrated for his statistical researches, who in the course of the meeting found himself in personal jeopardy: 'Intelligence has just been brought to me that an elderly female in a state of inebriety has declared in the open street her intention to do for Mr Slug. Some statistical returns compiled by that gentleman relative to the consumption of raw spirituous liquors in this place are supposed to be the cause of the wretch's animosity.' As satire, this was funny, entertaining and completely wrong.
The entire point of the paper-based statistical movement, as it developed in Britain through the 19th and well into the 20th century, was that it did everything possible to distance information from identifiable persons, whether inebriated or not. In every context, statisticians sought to create the homme moyen, the average man, devoid of identity and difference. The summit of their achievement, the decennial census, recorded names, but only to check that the enumerators weren't inventing their returns. The published tables were fully anonymous, and the overworked General Register Office resisted every suggestion to link official databases. For this reason, the censuses actually provoked very little hostility. The modern concern about privacy began with the growing use of mainframe computers in the 1960s and the immediate recognition that they would permit data linkage on a scale which had been impossible before. Privacy had only been defined as a human right in the UN and European declarations of human rights in 1948-1950. But within less than two decades it had acquired a zombie status, forever being killed by digital communications only to rise up with a knife in its head to be killed all over again. Can I have my other slide, please? Yep. That's my one slide tonight. The threats to privacy are real and they're growing in their scale and complexity, but if we're to understand the direction of change, we need to find answers to two questions which have been on the table since the inaugural meeting of the Mudfog Society. What information is being collected, and how is it being communicated? The answer to the first question is that we are a good way down the road to identifying the inebriated elderly female. If she's got a loyalty card, then the local Tesco knows what she's doing, as of course would the local shopkeeper in her time. If she's been treated by a doctor for her alcoholism, there's a danger of her address and her medical profile being linked.
If she's discussing her outrage with the Mudfog Society, or its successor, with children or friends on Facebook, well, we might have had another debate about that tonight. The association of the breach of privacy with the exposure of shameful behaviour has now become a convention in the popular media. Yet the scale and reach of digital surveillance still has difficulty penetrating far beyond overt social or anti-social behaviour. For all the influence of modern consumer culture, identity cannot be fully summarised in terms of online shopping preferences. The function of privacy is to patrol the space within which individuals can know themselves and manage their most intimate relationships. Even an elderly consumer of spirituous liquors has the capacity to choose what is known about their innermost feelings and communications. In our own times, as we have just been discussing, it's not the old but the young whose self-exposure through web-based channels appears to exemplify the transparency of the digital society. But close examination of their messaging reveals conscious and sophisticated constructions of preferred versions of themselves, some experimental, some partial, others deliberately misleading. The second question, the how rather than the what, is also a matter of carefully calibrating change. Privacy is not just a matter of being let alone, the cry of adolescents down the ages. Rather, it's a condition within which the most valued personal relations can flourish. Such exchanges, in Dickens's world and still to a large extent in our own, embrace every available mode of communication. Individuals know that they know each other because they require so little effort for the effective exchange of emotion or information. Back in the late 16th century, the first modern essayist, Montaigne, marvelled at the fecundity of face-to-face, body-to-body language.
After all, he wrote, lovers quarrel, make it up again, beg favours, give thanks, arrange secret meetings and say everything with their eyes. What about our hands? With them we request, promise, summon, dismiss; there's a long list there. And what else? With a variety and multiplicity rivalling the tongue. Now Dickens was a product of, and did much to shape, a new popular literary culture. We get the penny post and the rapid enlargement of the realm of virtual privacy. Nonetheless, face-to-face communication remains central to the conduct of intimate relations at all levels of society. Over time it's been supplemented by the post, the telephone and now the digital media. These innovations have complicated, but they haven't invented, the essential risk register of the technology of privacy. You go back to the 15th century and you find correspondents having to balance the gain in maintaining intimacy over distance against the danger of a letter getting lost or falling into the wrong hands. It can be argued that the driving desire to maintain networks of friends and lovers in the digital era has overwhelmed any individual's capacity to control the danger of surveillance. Facebook's multimedia intimacy is more complex and more vulnerable than the letter or the telephone conversation, and, particularly after the last couple of weeks, Mark Zuckerberg is never going to attain the same level of trust that's long been invested in the family postman. Yet within social media, and across the range of messaging that still occurs on a face-to-face basis, strategies of concealment, ways of embodying meaning in verbal and physical devices that are fully meaningful only to the auditors, remain powerful protectors of privacy, which at least in part explains why the use of social media has been little affected by the continual exposure of its dangers. We've still got difficulty teaching algorithms to comprehend irony. Progress is being made with devices that can read facial expression, although less than is often claimed.
Down the road, machines may be able to read tone of voice, gesture of hands or the meaning of touch. Never say never; just be cautious where predictions are linked to anticipated share price. Now, at the end of history lies the world lately described by Yuval Noah Harari in Homo Deus, where personal privacy is not so much destroyed as rendered redundant. Where the data religion has replaced religions of a missing God, identity only has meaning in connectivity. But if you read his book closely, you find that, like most prophets of doom, Harari has given himself a get-out clause. The Dataist revolution, he writes in the final chapter, will probably take a few decades, if not a century or two, and in the speeded-up history of the digital world that's sometime never. It's not over yet. Thank you so much.

Thank you, David. Tempered words of doom. So, Jefferson. Okay. You had hours, possibly hours, to prepare this. Give us your best off-the-cuff remarks.

Sure. Well, first I would like to thank Facebook for cancelling and giving me this opportunity. So I am director of web archiving at Internet Archive. For those that aren't familiar with it, Internet Archive is a non-profit digital library in San Francisco; it's at archive.org. It is the largest cultural heritage web archive. We don't collect the entire internet, which would be quite difficult, but the web archive was started in 1996, when the organization was founded. It is about 15 petabytes currently, and we collect about a billion URLs per week, to give the scale.
Internet Archive also has lots of other activities as a digital library: ebooks, digitizing movies and all sorts of digital cultural heritage. So the mission is universal access to all knowledge, and we do try to live up to that at scale, both in acquisition and preservation and especially in open access. So I figured this would be a fun talk, to talk both about the web and about social media, which is of the web. But also, I think, as Helen suggested, we are at an interesting moment in that social media is fracturing a bit in its publication platforms and its accessibility via internet protocols and things like that. So I will talk about the place of the web in the archive, and I will talk from the archival, long-term perspective, which doesn't necessarily prognosticate about intended use in the future but focuses more on making that content accessible for any use in the future. So the web is 27 years old or so, depending on when you argue that Tim Berners-Lee launched his idea for hyperlinks, and the web itself is obviously built on technologies that are a bit older than that. The web in the archive is interesting in that its emergence was technological. It was obviously scholarship oriented, and in the US it emerged from military activities: the Defense Department funded a lot of the early research. So it was about sharing knowledge, but not necessarily about the creation of personal records or artifacts or broader social communication. So its place is as a historical record. Many institutions, like IA and the BL as well, collected the web pretty early in its emergence, but many libraries, archives and preservation institutions took longer to do that. But what has certainly happened, as its exponential growth has gone on for a number of decades now, is that it has consumed all media forms of record or artifact creation: correspondence, personal papers, obviously photography, moving image. They are all mostly web published. So it is a publication platform that has sort of taken over all of what used to
be very siloed formats. So the web, the archived web, is not just a social record; it's now a record of technology and change and communication methods. So, social media: I think when we talk about social media data we do talk about Facebook and Twitter and all the current ones, but of course social media platforms are themselves historical artifacts. For those who don't remember, GeoCities was certainly one of the biggest and most popular. It was acquired by Yahoo in the early 2000s and then abandoned by Yahoo in 2009. So it was shut down, and many preservation efforts did try to archive as much of GeoCities as possible at that time, so there are good snapshots of it before it was shut down, but certainly not in its entirety. And so Internet Archive does not collect all of the web. It is a big data set, but there is too much web to get. Among others, Friendster was a very popular social network in the late 90s and early 2000s, and it was also abandoned and is now, I think, a video game site. As for other social media in the archive, the Library of Congress had an agreement with Twitter in the late 2000s to get a feed of the Twitter firehose for archival purposes, and that is an exciting initiative that has not figured out access. So I will talk a little bit about access and research methodologies. In talking about social media data it is interesting to look at the historical uses, the research challenges, and the preservation and capture challenges. And just to note, with GeoCities and Friendster: social media is very ephemeral, like all the web. The average lifetime of a website is 96 days, if you look at the studies, and you see that with commercial entities like GeoCities and Friendster. Certainly, I don't mean stop paying attention to Facebook, but of course Facebook will be gone someday, YouTube will be gone someday. So when you are thinking at the archival scale of preservation and access, the lifetime of the commercial entities that social media is tied
to is actually quite short. So what are the current preservation efforts around social media? I come from the cultural heritage, library and archives world, and the scale and the challenge of collecting social media and web content is quite large, so it is collaborative, it's multi-institutional, and there are shared technologies and shared expertise, all from a cultural, non-profit preservation perspective. The challenges that are specific to social media are the ones we've discussed: the platforms are commercial, gated, walled gardens, and they are often very challenging for traditional methods of web capture. Crawlers often don't operate well on very dynamic or JavaScript-driven web portals, of which social media is of course a very fancy one. With Snapchat and others emerging, especially mobile-only devices and social media channels, there are no great scalable ways to preserve what will be important historical records. So preservation efforts are underway, but the pace of change of the web, of technology and of commercial industry is of course much faster than the funding and mandate of libraries and archives. But IA and other preservation institutions do capture a lot of social media, and it's interesting to facilitate research uses of it that go beyond the Wayback Machine replay method and are data driven. So it is very easy to get, as Helen mentioned, Twitter data via the APIs, and it is easy for us to extract and make accessible data from other social media platforms that don't have APIs but that we can make archival copies of via crawling or even donated data. But what we found is that there are not just infrastructure challenges for historical use but especially methodological challenges in using social media data. Certainly the humanities, and much of social and political science analysis, are based on monographs and text and item-level analysis, and not really on the sort of big data mining activities that are more common in the hard sciences and computer science. So data mining is
emerging as far as using social media data goes, but it still hasn't necessarily caught up with the sort of analytical approaches that a lot of historians bring to it. The other challenge, of course, is how we are capturing social media data, and that is very hard to document because it is very technology-dependent. Methods of capture, how we decide how much to crawl of which social media platform, what data gets acquired by the crawler and what doesn't, how it discovers new content algorithmically, and the configurations are all very hard to document. So the record that we're giving to people is large and quite scalable and suitable for global, big analysis in social and cultural research, but the method of acquisition and the method of preservation are very challenging to document in a way that is easy for a researcher to use and understand. But it does capture all sorts of content. And it's interesting, I think: social media research and analysis has been very platform-oriented. You study Twitter, you study Instagram. But of course these methods of publication and communication are parts of other archival and historical records that people are creating: emails and personal paper collections and software and VMs and spreadsheets and other forms of data that people create either in their daily lives or as part of the research process. Those are both involved with social media and associated with it, but also separate. So I think social media and its use has an interesting possibility for a sort of confluence across record types, and that should be interesting to see as the research methodologies evolve. And then the final point is the larger archival imperative around social media capture and preservation, and these are points that are also applicable to the web itself. It is obviously the defining publication platform of our generation. It's one that changes and disappears quite quickly, far beyond the ability of preservation
institutions to capture big parts of it. But one of the other, I think, amazing aspects of our moment in time, as far as preservation and use of social media and the web are concerned, is the plurality of representation that comes with these platforms. The archive and the historical view going back centuries has often been of the rich, of the famous, of the powerful. Representation of societies and cultures over time has often been of a very small stratum of society and of a very limited view of daily activities and people's lives and ideas and cultures. So social media and the web have given us a very amazing, but probably very brief, moment in time to drastically expand the archival record, so that what we can present to the future is a collection that represents our time. So I think when we get to "what's the use?", the librarian and archivist tend not to know what the use will be. Just as a crazy example: Kmart is this very big shopping chain, like Walmart before Walmart, and they used to have bland music that played throughout the stores, which they actually created themselves, so it was not piped in or satellite-fed. Someone, a store manager, just recorded hundreds of hours of it and uploaded it to the Internet Archive, and it was the most popular thing in the Internet Archive for like a month: people listening to music from Kmart. That it would have such cultural meaning to people! Of course we have all these other amazing collections of personal papers, and people just wanted to hear the Kmart music and look at the GeoCities webpages. So it's very hard to predict use, both for scholarship purposes and for its meaning to individual citizens. So social media, in its format-agnostic mode, its scalability, its potential to be used at all levels of society, creates a great corpus that will be very interesting for future use and is very meaningful to preserve. Thank you very much. Well, lots of very different things
from very different angles being thrown up there, and I hope you have lots of thoughts going on, but I am going to exercise my prerogative to jump in first with some thoughts and questions of my own. Across what the three of you have said, there's a picture here of being able to use social media as quite a unique resource for contemporary social science and other study, but also presumably for the historian. And I thought it was quite interesting, David, that you seemed to talk about it rather in terms of people's experience of having their social media collected and of being studied, rather than with your historian's hat on, as something that you might want to dive into. Is it because you have a kind of disquiet about whether studying people's social media is a kind of modern version of the statistical movement, in which everybody is reduced to their statistics? Because something, I mean, correct me if I'm wrong, you two, but something I'm getting from you is that being able to capture people's tweets and Facebook posts and off-the-cuff interactions is a unique way of capturing people speaking for themselves that has not really been possible previously in history. So I'd be interested to hear from all three of you your thoughts on that, and whether that's something on which you're disagreeing. Who'd like to come back on me first?
Partly it is to do with practice. As an historian, and a historian of the 19th century, in my basic trade I read books and words. I am unskilled and intimidated by the prospect, and probably too old to retrain myself now, to use digital data in its raw form as a basis for my own research. I can just plead; I mean, my children mock me when I say "it's not my period" when anything comes up that I don't know about in the past, and I would want to say the same thing about the modern digital age. But I am interested in privacy over the long run, and I have looked, not directly, but at least at a wide range of studies of the contemporary use of digital media. And the one point I'd make, the one open area for me at least, is moving across the piece. We do need a body of researchers, which won't include me, who are capable of looking at how given individuals and groups, age groups, work across all the media: how they use digital media, but also how they use older forms of media, correspondence, telephones, how they talk to each other, how often they walk about in complete silence, and that kind of breadth of work. There is a huge amount of work focusing directly on the use of digital media, but the spread of skills and practices is still rather outside the current research frame. So, Helen, is this then a case of looking under the lamppost, that you study the social media content because that's what is there? Well, I guess the argument that I was making was that it's what's there, so you need the lamp, you need the lamppost. I mean, because that is politics; that was the point I was trying to make. I think you can't not look at it. So it's not just that that's where the light is, but actually that where the light is is where what you want to study is happening? Yes, because actually there isn't a lamppost. Well, there's a lamppost over Twitter, as we both said, because that's relatively easy to look at. But really, looking at political behaviour is
actually in many ways much more difficult now. Looking at digital political behaviour is in many ways actually more difficult than looking at other sorts of behaviour, just because of the problems I mentioned with getting hold of data. But also the point that David made about skills is really important, because I think that is also changing our political landscape. And I'm not just talking about researchers here; I'm talking about the capacity of political institutions themselves. So you could argue that there's all sorts of new data available for policy making, available for government to make better services, to make policies that more accurately reflect people's needs and preferences. But they are in many ways terrified of doing that, because of not just the logistical problems of getting hold of data but the technical expertise that is required to do these kinds of things. And that applies also to changing the playing field of electioneering, for example. In the last election the Conservatives spent nine times more on Facebook advertising, targeted Facebook advertising, than the Labour Party. We'll never know; Facebook certainly think that made a big difference, but we'll never really know, because that's a sort of private world where people encounter an advertisement. We don't know who saw the advertisement. That's a very, very difficult thing to study. And in fact, you see this in the US and to some extent the UK, it's the big parties with the big data operations that are going to win. It's almost like a re-centralisation in some ways, a kind of counter to the decentralisation that I was talking about at the beginning. You've got that kind of counter-trend as well. So it's changing the playing field, this need for expertise. So I just don't think there's a very good lamppost. So far from studying it because it's readily available, you're more like a 21st-century Charles Booth, trying to go out and find the hidden conversations? I think it's something
we've got to do. I was just talking to a US policy maker just before I came here, and we were talking about the US census. The US census now costs 17 billion, apparently, and it's every 10 years. Does that really encapsulate social, political and economic change at the moment, do you think? I think we've no option, but it's kind of broken. And they are talking very much about abolishing our decennial census here and combining other kinds of information, and they've been talking about it for many years. But it's not an easy thing to do, that's the point, and I don't think anybody who uses this stuff thinks that it's easy. In the 19th century they didn't understand sampling, and that's why the census was set up in the form that it was. Now they do, and there are other alternatives, but it is going to be a very difficult transition to manage, partly for those very reasons that you're raising of trust and privacy. People tend to be quite honest with the census because it's seen as separate from other kinds of information. Whereas in the era of the poll tax, which most of you are probably too young to remember, there were a lot of people who did not fill in the census because they were afraid it would be used to track them down and make them pay the controversial poll tax. So anyway, I'm in danger of letting my own interests outrun the discussion. So, as an archivist, then, are you in a position to be able to help researchers to get access to people's voices and conversations as a more broad-based resource? I mean, obviously you have your church in San Francisco, but in this very building the Alan Turing Institute and the British Library are engaged in very similar archiving practices. So what kind of things can you do to give researchers more access? It's a good question, especially because there are technical, research and cost hurdles to big data analysis with social media data. So I think part of
it is educational and community building. We've certainly seen the digital humanities, and the Alan Turing Institute and others like it in many disciplines, emerge as people become more fluent in digital tools. But those tools don't necessarily align with archival or library practice, and with the practices and theories and approaches that were used to collect the data or to describe it or to give context to it that would be useful for research. So I think some of it is traditional support and community building and reference-liaison kind of work. But also, people are collecting a lot of social media data as part of their research activities, and those data don't always end up with preservation institutions. So having a more cooperative acquisition relationship, or donation, however you want to define it, where people that are using social media data are making sure that it is preserved along with all the context that they've given it in their research, can help facilitate its use by other researchers. So it is the archival job to collect and make available and describe and make useful, and social media data falls right into that bucket but introduces a lot of challenges around the technology and the training. So it won't be a case of all the librarians being made redundant; it will just be a case of the librarians being the ones who can call up the data and interpret it statistically, and historians become redundant? I was required to have one provocative comment, so I had to get it out of the way. We may be. But just a plug for the Internet Archive: it's doing the really valuable work that Jefferson is talking about. Its stock of texts, old-fashioned printed texts, is remarkable and wonderfully accessible, and my practical job as an historian has changed radically over the last five years, and it may one day empty out the British Library. No, they're one of our best partners. Fewer people; the footfall is declining, I think, and certainly because so much now is available online, and thanks to the work
of Jefferson and his colleagues. Can I cut in here before we have infighting with the British Library? I think the British Library are also doing a lot of digital archiving, I should say, so it's not only in the church in San Francisco; there's probably a big cellar next door that's also full of ones and zeros. That's how it works, right? Helen? Could I just add to that point. Social media have had a lot of bad press for being unable to forget, and in fact my colleague Viktor Mayer-Schönberger wrote quite a well-known book called Delete about how we should have compulsory sell-by dates on things, and how social media doesn't let us forget. But there is another sense, I think, in which social media makes it all too easy for political institutions in particular to forget things they don't want to remember. So I would like to point to one particularly brilliant use of the Wayback Machine. After the Brexit vote, the Leave campaign actually tried to delete their website, including all images of that bus with the 350 million on the side, and with the help of some avid Twitter users and the Wayback Machine it was restored, and it's there now if you want to take a look. I think the bus now says 150 million; somehow they managed to actually change the amount on the image of the side of the bus. But it is there. There's an important point there about how archives can provide a kind of institutional memory, if you like, and we've got to think quite carefully about how to do that, because, as you pointed out, it's impossible. There's 1.7 billion people on Facebook, okay; it's just impossible to think about archiving it all, apart from all the legal and ethical and logistical problems. But in the sense of providing a kind of political and social memory of things, it's up to the people who study those things. This is data in the wild, and we're not used to data in the wild. The census data is carefully collected, tamed, groomed data, tidy and organised, and we're talking
about wild, savage data, totally free, running across the spreadsheets, free, you know, uncollated, untamed, if you like. We've got to think of the right level of taming into the archive. Well, that was actually my other question: is there a danger that we have the illusion of completeness, which is often the case with big data, that you go, well, n equals all, we have everything, we have the whole of the internet (although apparently you don't have everything), and that you somehow think that you therefore have all the thoughts of everybody, and that doing a sentiment analysis on Twitter will tell you what the world is feeling about something at a given time? But as you say, there's actually not that many people on Twitter; it's probably us in this room and some journalists. But do we not all slightly subscribe to that idea that, because it's so big, because there's so many people, we think, well, that represents everybody? Is there a danger of that, and how do we avoid it? We have to think what the point is. So I could have shown an image here: some very, very clever university engineers have devised something called the Twitter mood, and there's a face that sucks in the whole of the Twitter firehose, and it moves, you know, and it's the mood of the nation. And it's really, really clever, but is there any point in it? I mean, really, it's everything, but until you think about what the interesting questions are to ask about that, it may not be worth that much. On a positive note, arguments are always circumscribed by the source data that they work with. Before the digital era, people made arguments and wrote books and did all sorts of speculation about what societies were like based on very limited amounts of information that often represented very, very narrow segments of society. So I think that case has always existed; it just evokes more anxiety now because you're more aware
of the limited nature of it, because we're all so awash in data and, oh my god. So I don't think the situation has changed, but I think the self-awareness around it seems to have changed. And the failure of the political science community to predict any of the major events of the last 18 months would suggest that it's not actually that easy to manage this information. I'd just make two points. One is to reiterate what I was saying in my paper: Facebook users are heavily concerned with their emotional lives, and they know that what's on Facebook is a construction of those lives, not the full scope of their reality, both in terms of what's on Facebook and also in terms of all the other communications that they engage in outside of digital media. So that is, I think, a real caution. A second caution is about generation. To an extraordinary extent we are focused now on the young, those between the ages of about 14 and 19. Now, it's not the first time in history, by any means, that what we now call adolescents are seen to embody everything that's problematic in our own society and are seen to predict what's going to happen in the future. But it ain't necessarily so. Most of the people in this audience, I would imagine, have devoted their grown-up lives to trying not to behave as they did when they were 15, particularly in their emotional and sexual lives. That's the point of being grown up. And we don't yet know all that much about how digitally infused 16-year-olds grow up to be 30- or 40- or 50-year-old citizens. We need longitudinal data, which is just about unattainable, I think. But I remember seeing a Pew survey which asked people in long-established relationships what part digital media played in those relationships, and unsurprisingly it just declined off a cliff with age, so by the time you got to people in their 60s, digital media were completely irrelevant to how they were managing their private lives. You wouldn't expect otherwise, really. So we should be cautious about building all of
our knowledge about how the world is going to change on the basis of this very small cohort of the population. Now, I would love to just keep these guys talking for the next 40 minutes, but that's not my job. Thank you for being so patient. I'm now going to come out and hear what you have to say. Am I right, I think we have a roving microphone? Is that it coming down? The roving microphone person. So please wait until the microphone comes to you, because we are recording this for posterity. As I say, if you're not meant to be here, put on a different accent. In order to try and get through as many of you as possible, I will take a couple of points at a time; try and keep them concise, and then I will let our panel respond. Stick your hand up if you would like to say something. Well, let's take you two to start with, and then we'll come over to this side. Hi, thank you very much for this really interesting discussion. My name is Bettina Friedrich. I currently work at UCL, and my background is clinical psychology and media psychology, so I'm very interested in this topic. I want to follow up on a point you just made about how what people say on Facebook doesn't always represent reality but how they want to present reality. Clinical psychology is also interested in personality, so this is particularly interesting for us. Actually, there is the expression "Facebook happy", which means people are not actually happy but they use Facebook to portray a life that is really happy, because we are all supposed to be happy and enjoying life; this is something for the status. And so it's really interesting how people try to represent themselves on Facebook. For psychologists this can be a really great source for research, but I agree that if you are historically interested in what the reality really is or has been, this can be tricky, because it's subjective. Very useful point, thank you. My name is Jonathan Adams. I work for a company called Digital Science, which is based with Macmillan just the
other side of King's Cross from here, and we are a portfolio company, so one of our companies is Altmetric.com, which makes use of social media data in analysing what academics and others are tweeting and blogging and so on. But in the context of this debate there are two questions I wanted to ask. One was: with social media data, where are you drawing the boundary between social media and any other form of data? Because in looking at the use of social media data you're almost certainly going to bring in a wide range of other data, and there's a very grey area between what is social media data and what is another form of data. So the use of bicycles in London is not obviously social media, but it's a social activity, and you can get a huge amount of information about what people in London are doing, down to a very personal level, from analysing those data, and you can tie it in with other things. And the second question is that I felt perhaps the debate has been a little bit on the negative side, about problems with social media data, and hasn't really emphasised utility and given examples of use and thought through all the positive things that can come out of being able to use very big data sets, which then provide a range of information on transport use, environment, other activities. OK, thank you. Well, since this gentleman managed to squeeze in a company plug and two questions, which I don't encourage, I shall let the panel come back now, if you like, and then I'll come out and get some more. I think we have some over that side, if you'd like to ready yourself with a microphone up that side. So, panellists, feel free to come back on any or all of those points. So we have: can we hear some more examples of good uses of social media data; we have the problem of linking social media and other kinds of data and where you draw the line, which we haven't really raised at all yet; and the fact that people don't necessarily give a completely accurate
portrayal of themselves, but that that in itself can be interesting. Would you like to start us off? Yeah, well, I always try not to be too over-excited. I mean, I'm actually very excited about data of all kinds, because I'm a political scientist and the whole discipline of political science has existed in a sort of data desert since it started. So I am very excited about new sources of data and working hard to think of ways to use them. One of the things we've done, for example, is a feasibility study for the Department for Work and Pensions to work out all the ways that social media data might show how benefits change was playing out in society: you know, how are people reacting to the move towards Universal Credit? It was only a small piece of work, a feasibility study, but I am very excited about the possibility of policymakers using social media data to understand how policies are being experienced and how better information might be given about them, and there's all sorts of examples of that. The topic was social media data, and you're right that there's a grey area, and I draw the line quite widely. A lot of the analysis in our book, for example, came from petitions data. I believe that a petitions platform, where you have the opportunity to very easily sign a digital petition, is to all intents and purposes social media: it's user-generated content, you're consciously generating data yourself which you hope will contribute to some kind of public or social good. So I do see that as social media data. I think when you get to transactional data, where people's participation is not explicit at all, it's completely passive, so it's data about people, I'm not convinced that that would necessarily come in. I think there are distinct questions about social media data as opposed to transactional data about riding bicycles, where you're generating it without thinking about it or even knowing perhaps that you're doing it. It's
still interesting, in some ways more interesting; there are more things you can do with it. But to treat it as social media data, I think, is to lose some of the dilemmas there are about how you use social media data, logistically how you use it and ethically how you can use it, which we've only talked about briefly; we haven't touched on all the other questions about social media. So I think you should still maybe try and make a distinction between this kind of active social media and kind of passive data, and maybe you think that's too much of a question. I would, in backing up the positives and the utility, just speak to the ability of social media, be it data or records, there are all sorts of records, to represent marginalized and underrepresented communities that in the historical perspective have not been well represented as far as their lives and activities go. Think of how political revolution would be documented in the historical record from 100 years ago: it would be largely official statements or records or police records. The actual person-on-the-street documents that capture that spirit or that activity or existence are largely missing from the archive of a century ago. So social media really allows us to change what the composition of the archive is, and thus how the historical record and the sociocultural record can be represented and studied in the future. So that's, I think, an amazing positive: just the plurality of representation that's possible at an item level, or even for full movements or communities or societies. So the revolution will not be Instagrammed? No, it will be, and we will archive it. The revolution will be Instagrammed. Has been. David? I mean, the printed word has left a wider record than perhaps has been suggested about how people were thinking. But to go back to counting: one of the virtues of standing where I stand, in the 19th century, is that you can see the beginning of it. I've written a bit about literacy, and when they discovered that
across a society, instead of there being an impossibly wide range of different people who could or couldn't write, in different parts of the country, in different families, in different streets, they could re-examine the marriage register data and express the communication capacity of society on one sheet of paper covering an entire population, it was a stunning moment. They couldn't believe that this was possible. They ran the data through three or four years, and it turned out to be stable and to be moving very slightly in one direction, and they realised that they could do something cognitively, and in terms of understanding the society, which had been completely impossible beforehand. The post: when they introduced the Penny Post in 1840, that created cheap post, but they also at that point began to count letters, which had never been possible before. And you get this extraordinary organisation, the Universal Postal Union, in 1875, which begins to publish tables of the numbers of letters and post offices and letter carriers all the way around the world, and by 1914 you've got the entire world mapped. This was a revolution in understanding how societies worked and talked to each other, which in some ways, set against what had been there before, outstrips what the modern digital media have produced, although that in turn has been really important. So to answer your point, I think we can lose sight of the scale of the change that big data has made, going back to the 19th century, in our capacity to understand the world in which we live and how people behave in it. OK, I'm going to take some more points and questions. There was, OK, yes, and I think there was another just in front of you, and meanwhile we might take this guy as well, and then we'll go one, two, three. There you go. I work in marketing, and my point is going to be biased by that. The data that gets collected by a lot of social media platforms was designed to make more money for them. Is there a challenge in that the metrics that you get for analysis from a
research point of view were put in place from a marketing point of view? And are there data points that you would like to see that aren't currently collected, or are there ones that you feel people pay too much attention to that don't really fit with your research point of view? Because what seems to be the case in social media is that people can find data to support whatever argument they want, and there seems to be a sort of post-big-data world where people disregard things that don't suit them because they can always find a point or a statistic that does. And I just wondered how that mix between marketing and research works for you, because it seems to be quite different from any other kind of data points that you would collect historically. Thank you. So you're collecting data for one purpose and trying to use it for another. The gentleman over here. Hello, I'm a programme worker in banking, and I want to highlight one aspect of social media: it's possible that social media will cause different people to get treated differently in public services. For instance, I was told that on certain continents it's now possible to get a loan based on your social media profile, and on the other hand we were told that, for instance, when you apply for a certain visa into a certain country, you provide your social media link as well. So I just want to highlight this aspect: when your social media presence is going to influence how you are treated by banks, governments or others, what's the impact of it? Is there a chance that less-resourced classes can more easily get finance? But on the other hand, when a certain candidate comes to power, it might get easier for him to implement certain segregation policies. Very good point: unforeseen consequences. And you're right that there are people who get loans because of their social media, because that is their public record, because they don't have a financial history and credit history. And you're also right
that things like algorithms, or even human HR people, may go online and look at your social history and make judgments about you. So yeah, we haven't really covered that, so if you'd all like to answer that as well, if you're just making a note of that. And the gentleman over here. The big surprising political changes of the last couple of years, and there have been a few, have had a lot to do with social media. So Vote Leave, and Momentum within the Labour Party, and Bernie Sanders in the US: the supporters are all organizing using social media, and it's a big ground-up movement. There's a lot of noise there, and it's difficult to see the real popular appeal of these things. Are there things that politicians can do to cut through the noise of social media and find out what's really going on among those people? Thank you. So again we've got three points, so I'll come back to the panel, and you respond to whatever you fancy responding to: the politics question; the problem of trying to use data collected for commercial purposes for research; and the much bigger question of, well, what consequences might your social media postings have that you haven't really thought about? That sounds like it might be one for you, David. I don't know if you fancy tackling that, or is it not your period? Two points. One is actually a historical point: people used to get jobs and positions because of who they were related to, which was a kind of social networking question, but of a different order. And then we assumed that we could dump that and get there by examination, on some kind of meritocracy. I don't fully understand the changes taking place, but as described it sounds like trying to do the same thing all over again, but with data which has been collected for completely different purposes, and I think it is something that we should be worried about. The marketing point, well, that's so good: do we live in a post-big-data world? That is a really good point. In the marketing world you've
been much better, or luckier, or richer, in terms of getting hold of this sort of data, and often the kind of data that's useful for marketing would not be particularly useful for analysing political behaviour. Perhaps that's part of my pessimism about getting hold of data, because it can be incredibly expensive. Obviously there's a payoff if you're using it for marketing, but in research terms it can be extraordinarily expensive. I think the real thing is that there's a big shortage of data that can be used for public good activities; it's all geared towards profitability, and it's much more difficult to get hold of data that can be used for public good purposes. We were talking about this a little before we came in. I mean, you can donate blood, but you can't donate your data. Is there an argument for actually creating a capacity to donate social data? Because there might be all sorts of social problems that we could tackle with that, you know, bullying, teenage depression, obesity, all sorts of problems that might be tackled if you had the right sort of data, and if people were somehow able to donate it in a sort of socio bank or something like that. So I think we've got to get a lot more imaginative about thinking of ways to do that, and also, of course, linking data to other sorts of data. We're really in our infancy in being able to link more traditional sorts of data, like survey data, to transactional data or social media data. We're starting to develop ways, and, if you think about it, part of what the Alan Turing Institute is for is to develop new ways of using data, and using data to carry out public good activities and to derive insight into social and political behaviour.

And I can't remember the last question, but it was about politics. It was about how you cut through the noise of social media and get to the big political movements that are happening there. I suppose in part it's the same question: that the data is
not created for that purpose by the social media organisations, and you get the same sort of gatekeeper problem with the big political parties, and that's having some quite pathological effects. In the last election, for example, two of the big data scientists from the US Democratic campaign, Jim Messina and David Axelrod, came over and worked for the Conservative and Labour parties. The Conservative Party ran a really great big data campaign, and then they were tweeting "bye". What happened? What a waste of all that insight into how people behave, and what they wanted, and what their preferences were. I think you see this a lot: parties campaigning with data, but then you don't see them governing with data afterwards. So Obama really pioneered the big data campaign and the phenomenon of micro-donations, ordinary people giving little bits of money, taking money out of politics. But then when it came to governing, it wasn't translated into the federal administration in ways that it might have been, to make better public policies.

It's that Mario Cuomo quote, isn't it, that politicians campaign in poetry and govern in prose. Now it's the other way around perhaps: they campaign with data and govern on a hunch. But isn't part of the problem there precisely this kind of asymmetry of access to information, that the political parties know a lot about us and are in a position to amass lots of data about us? In the course of researching my book, just to slip another really subtle plug in, I read about this software company which all the political parties used in the last campaign. They offered this service where somebody gives you their email address. I think the Labour Party had a really cunning one where you could give them your email address and find out which other electors have the same name as you. I didn't bother. But once they've got the email address, they can then link up all their interactions
with you, but also all your social media activity. So they basically have a page for you where they can see what you've said, so they know how to interact with you and what you're interested in. So I looked at the website of this company. I just looked at it, I didn't sign up for anything, and then I got tweets from this software company saying, hey, you know, why don't you come and see what we can do for you? That was just from having looked at their website. So there's this massive imbalance between what the political organisations can know about us and what we know about what they're doing, what data they have access to, what information they're going on, and so on. So what can we do about that? And if they're basically using proprietary tools, presumably having agreements with the giants who do own the social media information, how can we redress that balance a bit and know more about the people who know more about us?

Well, yes, I suppose a couple of things. I mean, I think the younger users, sorry to mention this cohort that you're not keen on, they do know. I think they're much smarter, and I think as a society we're all getting smarter about those kinds of things. You know, we're getting better at recognising when something's a bot and when we're being plugged something, and some of the naivety about that will go; we'll get smarter over time. Maybe not completely, you know, we get all these spirals where everybody's playing catch-up, but we will get better at being able to tell those sorts of things and to avoid them. I think young people have much more sophisticated ideas about privacy than we give them credit for, and that's why most of them have two identities on Facebook, by the way: one from when they agreed to be friends with their parents in order to get the Facebook account, and the other one that they actually used. They've all left Facebook now, by the way, so don't bother trying to look for them there. They're on
Snapchat. Well, as I said, five platforms, one of which is probably Snapchat right now. So, you know, I think we're getting smarter about that kind of thing, and that's what's exciting here, because the internet, and particularly social media, is the first technology that citizens have domesticated, and are generating content with and innovating with, faster possibly than mainstream traditional institutions and governments and corporations. That's the first technology of that kind. I mean, with any other information technology, citizens didn't really use it. You know, governments had a monopoly on it, or corporations, or whatever. Personal computers or mainframes didn't make any difference to what citizens were doing in their everyday lives, and this is the first time. So we're at a quite exciting point in that sense, and I think we will get smarter. And also, of course, we shouldn't underestimate the extent to which social media allow us to shed a light on what organisations are doing and allow us to challenge what organisations are doing. I mean, you know, the revolution was tweeted. There's all sorts of things that are problematic about that sort of mobilisation: it's not very stable, and it's unpredictable, and it lacks institutions and leaders, and there's all kinds of things we have to think about there. But we have had major challenges to authoritarian regimes, and some kind of challenge to political institutions of all kinds, in almost every country in the world, so we shouldn't underestimate that.

And just to provide a different perspective on the asymmetry and imbalance: Twitter has told me more than I would ever want to know about Donald Trump, so it can work both ways, hopefully. But it is interesting to think of the tension between social media as a form of expression and as a commercial enterprise, and I do think Helen is right in that people have greater awareness that it is both
being co-opted for purposes that are outside of what they're intending to use it for, but it also does enable them to use it for that thing. I just assume that society and culture will figure out the balance there for how we deal with that in the longer term.

Right, I think we have time to come out for one more round of questions, so I will just see. You always do this, audience, as a representative sample of all audiences: you always wait and then put your hands up at the last minute. Okay, so what I'm going to do is try and take everybody's question, so we're all going to try and hold those in our minds, and then I'm going to give the panel basically their last chance to come back and answer everything, which they won't be able to do, of course. So let us start up there with the person in the red dress, I think, and then zigzag our way down. So put your hands up properly so I can see where you are. Okay, so we're going to come down that side, so why don't you start at the front here and work up, and you work down.

Okay, hello. I'm a 30 year old. I've been a full-time Facebook user for 10 years and a part-time Twitter user for four. My question's about trust and picks up on what's already been mentioned. A couple of months ago I discovered via Facebook that the Labour Party had declined membership to a few people who had previously tweeted support for the Green Party. I was wondering if the panel thinks that's ethically right. I'm also interested in the fact that those people who were trying to become members of the Labour Party called out the Labour Party on this via Facebook, and I'm just interested as to whether you think we as Facebook users should change or be more aware of our behaviour, or whether we should lobby the Labour Party to change their behaviour and their attitudes.

Very good question, capturing both sides of the social media and politics debate. Okay, so come down here, and stick your hands up again at the back so we can see who's next.

Hi. You mentioned that the revolution
will be Instagrammed, the revolution will be tweeted, but social media now has got more restrictive, even though there's a lot of political activism. When there's a big political event, social media is blocked and people can't really access the latest information. Since around mid-2013, when you refresh your Facebook feed you get an older post; the more you refresh it, the older the posts that come up on your news feed. It doesn't give you the latest posts, and that has kind of coincided with the Arab Spring and bigger revolutions and protest movements, and with the fact that all this media is monitored by, you know, not so democratic states. Do you think there's going to be another platform come out that would cater more for political activism and social movements?

Another very good point, and of course in this country people do get arrested and charged for tweeting jokes, so we maybe shouldn't be too sanguine about it. Okay, so I think someone's got the microphone there, and then we'll come to you. Good.

Hello, I'm an employee of the British Library and also a millennial with at least six or seven social media accounts. The British Library Act of 1975 was created to ensure that, for anything published in the UK, a copy is sent to the British Library to be retained. As many hurdles as there would be to do it, do you think there is the justification, for archival and research purposes, to have some sort of similar legal deposit obligation for collecting social media data? Or do you think that private individuals and private companies should retain the right to keep this information, in an age where it's a commercial commodity?

Very good question. So how published is social media, and should it be archived? So let's see more hands at the back so we can get the microphone to the next person. Meanwhile, you speak.

My question is a bit different. I'm speaking from a research funder perspective and wondering what the panel's view is on who should be responsible for the training of the people who are
analysing social media data. Should it be the responsibility of research funders, or institutions, or of the social media companies themselves? Given that a lot of the researchers who do have the skills that you were talking about before move into industry rather than staying in academia, should there be more responsibility on research funders and institutions to try and retain those people for academia?

Wow, good question. Who's responsible for training the researchers to use the new data? Last chance to put your hands up so I can get you on the list. Do go ahead.

So I have two related questions. One: is anonymised social media data an oxymoron? Does it actually exist, and is it useful? And then two, not quite the right to privacy, but what do you think about the right to be forgotten? So some kind of time horizon or time event for when your data may become less personal or less accessible.

So almost two opposite questions: how anonymous should it be, and should there be a time when it disappears altogether? Okay, this is your absolute last chance. By the way, if there are any 15 to 19 year olds in the room and you want to speak, this will go on the internet, so it does officially exist. Nope? Okay. So, panel, you have, I don't know, three or four minutes each, if you'd like to basically combine summing up with addressing any of those points that you'd like to. As always, people have come up with the really big, interesting, enormous questions in the last 20 minutes, which is always the way. But I think the good news is that there is wine outside, so we can do the usual thing of chewing over the really difficult points over a glass of wine, completely informally and not on the public record. So can I take you in reverse order? Is that a mean trick?

Oh, that means I have to speak.

Yes, that means you have to go first.

Those were good questions. On the Labour and Green Party issue, this is all British politics, which is bad for me; first, I don't know what those are. But I think the
agitation and advocacy, I mean, what we've seen in other uses of social media, the exposing of it and advocating against it, is a far more effective measure, and at least far more democratic in spirit, than the sort of punitive approach. So that's my take on that one.

Fight politics with politics, not with technology.

Yes. The legal deposit question was interesting. So legal deposit is the right to acquire stuff in your country, and I'm not sure about that one. We do not have legal deposit in the United States like they have in the UK and in many other European countries. We do have records retention acts and things for the National Archive to collect agency material, but they actually don't collect much social media. A little: for people in Congress, their social media accounts are usually considered official government records, as long as they are created while they're in office and used for their official purposes, and things like that. So I do think legal deposit has been really successful in all the European countries for allowing acquisition of content that might actually, you know, fall out of other collecting activities. It's also, of course, given libraries and archives lots of funding to go out and get that stuff, so that's a good argument too. And I would say that social media is like other web-published materials and should be included in those activities.

Training researchers is a fantastic question, and it's one that we certainly come up against, because people are always very interested in using the web archive and data at scale, and it has all sorts of infrastructure and technical fluency challenges, and the question of where the support lies for those. Funders would be great. I've got plenty of grant proposals if you're accepting them currently. But I think it is driven generally by disciplinary communities and academic networks more than the libraries and archives, which have much more limited support for doing
that sort of training. They're more focused on making it accessible. They try to help, but certainly funding is usually community focused, and that does often come from funders or the government.

Anonymised social media data: so I've been spending a lot of time in the GeoCities web archive collection lately, and I have no idea who any of these people are. So does it become anonymised over time? Sort of, but that doesn't really answer people's anxiety about being collected in a sort of non-permission approach. I think with some Twitter data there are ways to provide access to social media and web-published data in an anonymised fashion, which is links and network graphs and semantic entities and things like that, that don't necessarily link back to the individual tweet or artifact. And yeah, I think, probably in there.

I think on the public record thing, I believe, I mean, you work for the British Library, you probably know more than I do, but I think some social media is probably archived as part of the web archiving. The UK National Archives collects all the Twitter and YouTube of government agencies. Just politicians, so public figures? All right, okay, there you go. So would it be legal to make social media companies provide some subset from what is created within the country domain?

I don't know of any that do, but I think it would be awesome if that happened.

Or would there be privacy questions? Would you have to respect the original privacy settings of the users?

You could have embargoes, and researchers could use it, but not somebody's mum, for example.

Not the Green Party.

Not the Green Party, exactly.

Okay, sorry, I'm chipping in and overrunning. David, would you like to give us your assorted thoughts?

No, it's my fault. Anonymised social media is not an oxymoron. There are a whole range of techniques which have been developed, as we know. We also know that it's dangerous to invest unthinking confidence in them, and the particular point that's come to mind is the NHS
records, where the attempt to sell on anonymised health records to private companies has run into every sort of trouble, and rightly so. On the issue of deleting the past, it is a very complex question. Historians are bound to get very itchy about this if they see societies or individuals being able to correct or change their past, particularly if more and more of their present is embodied in the social media. This is counter to what any historian wants to see happening, unless you're actually changing errors in your past. I think the question isn't whether it should be deleted but how it should be used. And to connect that to the question about the Labour Party disqualifying new members because they've supported the Greens in the past: any political party has got a right to deny, and has always, I think, denied, membership to people who at that same time are members of another party or very actively supporting that other party. That's contentious. But to deny membership because in the past they may have supported some other party, which appears to be what is now happening, is an extraordinarily dangerous and foolish course to take. Winston Churchill, after all, was a Liberal Party Cabinet Minister for a long time before he became the world's greatest Conservative Prime Minister. So people have got an entire right to change their views, and to use some past social media tweet to disqualify them from their present political activity is just fundamentally wrong.

Is that your closing words for the evening?
Excellent. Finishing positive. If you could answer all the other questions and sum up within 6 minutes, that would be perfect.

No problem. There are really good questions, and I can't answer all of them in that time. The training point: well, yes, that is a really good point. At the Oxford Internet Institute we do train; we have a master's degree in social science of the internet, and we do train people to use this data, to gather this data, et cetera. A very large number of our postgraduates then go on to work for Google or Facebook or these companies, and lots of them go into academia too. And at the Alan Turing Institute upstairs there will be 40 PhD students, I think, starting in October, and they will be developing very advanced skills in data science. Some of those students that I'm talking about will have research council funding; some of them won't. Obviously I think that more research council funding is needed for this kind of training. Of course, how could I possibly say anything else? But I do think also that the point you make about social media companies is right. It's no good creating a sort of us and them. Well, there is a bit of an us-and-them environment, but we should be working to break that down, and we should be looking for partnerships in funding, and apprenticeships, and all sorts of things like that. Most university departments that do the kind of thing we do have a relationship with Google, but we need to do more of that, and we need to get them participating in funding that training, because absolutely it is the universities that do it. So it was a really good question.

The point about the revolution, and platforms being blocked, and it being dangerous for people to post: all those things, of course, are absolutely true, and they were true through all the revolutions of the Arab Spring. The people who took the first steps were not undertaking tiny acts of participation; they were large ones at the
beginning, when they were in great danger of being identified. I mean, yes, you are right, of course: in an ideal world there would be this platform that was kind of secure and impenetrable to the authoritarian regime. Not going to happen. I mean, you can't build it. These things have grown out of platforms where people are, where people have social networks and are communicating and coordinating with each other. If you make a place for them to do that, they won't go there, and that's the problem where we are. People will innovate, and they will draw in resources from other countries, and there's a whole culture of sort of hacking platforms to give secure coordination mechanisms to people in authoritarian regimes. So yes, of course it's a challenge; it's a challenge that people will continue to strive to meet. And although there is the Tor underground network, and that is one possibility where people can organise away from scrutiny, I think if you're looking for a platform where you want to get a million people on the street, you've got to go to where there are millions of people. That is part of the challenge.

Yes. Yeah. Oh right, okay, yes. Well, I mean, yes, and that involves lobbying social media networks, obviously, not to do that, and there are lots of people doing that, and we've got to do more of that. I mean, yeah, I couldn't agree more.

The point about the Labour Party: I think that illustrates very well the question of how traditional political institutions are dealing with this kind of politics, which is, i.e.,
very badly. So with both the Labour Party and the US Republican Party you've got a number of parallels. They've both had leaders, or in the US a candidate, elected on waves of support bubbling up out of social media, reflecting the way people do politics now, which is that they have multiple allegiances. The whole idea of membership as it used to be, I think, is kind of dead. People have multiple allegiances for multiple parties, and the traditional political institutions just can't cope. So just because you tweeted that wonderful Green Party advert (didn't we all tweet that Green Party advert with, you know, the four leaders all the same, singing in a boy band? Didn't we all tweet that?), I think that would have been enough to disqualify you from voting. That is ludicrous, and it implies a misunderstanding both of how politics works and of social media, because you don't tweet something just because you belong to it. It's just not how it works. So I think that illustrates very nicely how traditional political organisations have got to come to terms with this as well.

Well, on that slightly incomplete note, which I think is apt for such a massive question, we are going to adjourn and move outside, where there will be drinks and informal conversation on smaller platforms, by which I mean people talking to each other. Thank you all very much for coming and for contributing some really good points and questions. Before we go out, I'd like you to thank not only our three excellent speakers but also Maya and all the other people who organised this event, which as you said is the first of a series. Do you know yet when the next one is?

No, we don't know yet the date of the next one.

So I guess you'll have to follow the British Library on social media to find out. So on that note, please thank our three speakers: David, Helen