I'm going to talk a little bit about ethics and social media research, just to give an overview. A reasonable starting point is that it's really complicated and there are no single right answers. Just as Leslie was describing a spectrum of attitudes to how social media data can be used and how valuable that research is, there's also a huge spectrum of very different views about the ethics of social media research. On one hand you've got people, typically from computer programming and big data analytics backgrounds, who feel that social media data are publicly available and you can pretty much do whatever you like with them. At the other end you've got people, typically from social science and social research backgrounds like mine, who think: hang on, where's the consent for this, where's the anonymity, and so on, and who are quite scared and put off about using social media data at all. The conclusion Leslie was coming to was that there's a middle ground, and we're trying to feel our way towards what that is. So what I'm not going to do is give a load of answers about ethics and social media research. What I'll try to do is raise some questions, because in my personal opinion that's the best way forward: if you're conducting research with social media data, think about your particular project, your particular circumstances, your particular data, and the particular issues related to those. That's essentially the conclusion of what I'm going to say, so now I'll go into a little more detail.
The first thing I'd like to mention, and the reason I'm talking about this, is that I lead something called the New Social Media, New Social Science network, which started out as an NCRM-funded network about four or five years ago. NCRM funded it for a year, and since then NatCen, the National Centre for Social Research, where I work, has taken it on; we now fund and manage it internally as part of our ongoing charitable remit for methodological development. We have a Twitter handle, a hashtag and a blog, and we run various events throughout the year. So if you're interested in social media research, do follow us and do engage with us; it's a really useful network and worthwhile following. So why did we end up looking into ethics? One of the key issues that came out of the network events we ran early on was that regardless of what we were talking about, whether it was tools for social media analysis or how useful the data actually are, people were always raising concerns about ethics, and more specifically expressing concerns about what they saw as a lack of guidance on how to deal with some of the very specific ethical issues raised by this new methodological area. So we surveyed our members, and we found that only around a third felt that the current guidelines they had were up to date and adequate for the kind of research they were trying to do. The existing frameworks, the ones social researchers have been using for years and years, just weren't answering the questions, just weren't giving them the guidance they needed. But clearly many of the issues raised by social media research are actually addressed by those ethical guidelines. The basic concepts are still there. We still want to protect our respondents. We still want to make sure we're making the most of our data.
We still want to maintain anonymity where we can. We still want to do all of those things. But what's different is that the characteristics of social media, in terms of how they mediate the relationship between the researcher and the person being researched, create unique challenges that haven't really been covered before. I quite like this quote: online research doesn't just present new ethical problems, it recasts old ones in new forms and new guises. But those issues are going to vary enormously by project. Start with the subject you're focusing on: research into domestic violence has a really different ethical context from research into riots, or online hate, or how people travel, or tourism, and so on. That varies things hugely. Beyond that, who is your target population? People on Facebook could be aged 11, 12, 13; are you trying to research that group? That's a very different context from researching an adult population. The platform you're using massively changes the context too. We talk about Twitter a huge amount, probably too much, when we talk about social media research, because it's open and relatively easy to access. But if you want to start looking at web forums, or Facebook, which is still quite hard to access, or the multitude of other social media platforms that exist, again that changes the context and the methodologies you're going to use. So for example, observing discussions in an online forum for cancer patients, which is something I've seen people use social media methods for, is really sensitive and has very particular ethical issues.
On the other hand, someone using Facebook just as a snowball recruitment method is in a completely different research context and faces very different questions. A further complexity is that social media are constantly evolving, with new sites and new applications being created, and, even on the platforms that already exist, new features and new terms and conditions appearing, which change the way you're allowed to use the data. We touched earlier on the difference between legality and ethics, but the two intersect and move apart, and that falls into this discussion too. For example, when somebody signed up to Twitter however many years ago, back when it was built around text messaging, I don't know if there even were terms and conditions at the start, but a little further along, agreeing to those terms and conditions meant agreeing to a certain type of data being shared. Since then the platform has added capabilities like sharing video, sharing images, a larger amount of text, links, geolocation and so on. What you agreed to back then is not necessarily the same thing you're using the site for now. All of this variety across platforms, and all of this change over time, makes it particularly difficult to prescribe guidance: one size invariably does not fit all for how we should approach research on social media in an ethical manner. That said, despite the relatively novel and dynamic landscape, headway has been made, and as the social research community becomes more and more engaged with social media data as a potential research tool, these various issues are being better identified, better understood and therefore better addressed.
There are more and more case studies and examples of how research has been done, and even where those examples weren't perfect, at least they say, in a transparent way: we made these mistakes, and if we did this again, this is how we would improve it. In that standing-on-the-shoulders-of-giants kind of way, we're getting better and better at thinking about how we should operate ethically. But while it's not quite the wild west it was a few years ago, there's still a lot of work to be done, and as I said, the evolving nature of the field may mean that the ethical work on social media research will never be done, because what we're thinking about will constantly be changing. It's therefore important that researchers are aware of the possible issues of conducting research using social media, so that they can adapt their methodology in a more reflexive manner which maximises the potential insight of that research. I think it's really important that when we talk about ethics, we're not just thinking in terms of protection and minimising harm and risk, but also about how we can maximise the actual research value of these data within that framework. What I'll do now is run through some of these areas, to give an idea of the types of issues that have been raised by researchers as part of our network over the past few years. This is by no means exhaustive, and not every point will apply to every kind of study, but the idea is to give you a flavour of the kinds of issues we want to address.
One key debate is whether social media platforms count as a public space or a private space. This has legal implications to start with: whether the Data Protection Act applies to the data you're collecting is partly defined by whether those data are public or not, as are the ethics of whether it's okay to collect the data passively. And it's not consistent across contexts: not all social media data are public, and not all social media data are private. Some sites are more public than others, and even within a particular platform you can make different arguments for different types of data. If somebody on Twitter makes their account private, you might interpret that as them saying this is private data that isn't publicly available. Is somebody's open Facebook page similar to a group? Is a public group different from a private group? This is what I mean when I say it gets quite complicated, and you have to be very specific about the piece of work you're doing. One way I like to think about this is in terms of the expectation of being observed. Is it likely that a user would expect their posts, their content, to be viewed outside the members of that group, or their followers, or friends, or those in their local area, or the people who've swiped right on Tinder, whatever it may be? Who are they expecting to have their data viewed by? That adds even further complexity, because when someone sends out a tweet they may really only expect it to be looked at by their followers.
Yet the API, the open data, means it is still publicly available to everyone, and there's a real disconnect between what users might expect, what the terms and conditions say, what other people think, and what researchers might want users to be thinking when they post. Just as the people using these data have a wide range of opinions on what is and is not ethical in this context, users themselves have really varied perspectives on what's okay and appropriate for their data to be used for. About two or three years ago NatCen did some qualitative research with people who use social media, and we found a huge spectrum of views. I was talking to Mike earlier about how people feel about their administrative data being used by government: on the one hand some people are really sensitive and say no, big brother, that's awful, while others are surprised the government isn't already linking all that data together, how inefficient, how terrible. It's the same kind of thing for social media data. Some people, when you ask them, feel really private and defensive and say no, I own this data, it's my intellectual property, other people shouldn't be using it; others say yes, that's fine, I put it out into the public domain, I completely understand, and I'd kind of assumed people were using it already anyway. That makes it much harder for us as researchers, because we can't change our approach for each individual user when we're trying to understand things at an aggregate level, but being aware that this range of opinions exists is important for the decisions we make.
In that context, one of the things to think about is how we approach observation ethically, and how that might impact behaviour. We've talked a lot about pulling in Twitter data automatically, but there are circumstances, in a smaller qualitative study, where a researcher might embed themselves in a forum or a group. In that context, should the researcher lurk, just sit there, observe and take things in, or should they engage with participants? Do they need to declare themselves as present? Do they need to get consent from the other people using that forum, or would it be sufficient to get consent from an administrator? Thinking these things through matters, and again, different contexts will change the answers. Informed consent is one of the basic tenets of social research. Some people suggest that terms and conditions cover informed consent, as they will typically state that the data may be used for research purposes. But do people even read those terms and conditions? I work in social media research and I've never bothered reading the terms and conditions for a good number of the platforms I use. Anyone who signed up to the Wi-Fi in this hotel had to click to accept the terms; did anyone actually read them? So how can we claim that that's informed consent with any sense of authenticity? And even if they did read the terms and conditions, did they have any idea of how their data were going to be used? When Leslie was pulling up all of that Twitter information, he searched on Brexit, and somebody like Jacob Rees-Mogg presumably accepted that when he tweeted about Brexit lots of people were going to see it. But in that data set you would also have had Steve from Dundee slagging off the Tories.
Steve didn't expect that there would be a room of people in Southampton looking at that and thinking, oh, that's what Steve from Dundee thinks about that particular issue. In one sense that's what Steve signed up for, that's what he agreed to in the terms and conditions, but he certainly didn't think that I'd be seeing it and talking about it. (There isn't a Steve from Dundee, by the way; I made him up.) That really problematises this issue. So we need to think about when informed consent is needed, what level of consent is adequate, and how we can be sure it's informed. Is it feasible when we're scraping in millions and millions of tweets? No, it's really not. But maybe we think about seeking it later down the line, if we want to publish something using someone's data, if we want to do something more than analysing it in the abstract. And if you're working with a smaller sample, in a forum or a Facebook group, perhaps there is something more engaged you can do. Then there's the right to be forgotten. If we take the principle that posting in a public forum is an acceptable form of consent, how do we deal with situations where a user deletes their post? Should that be treated as withdrawal of consent? It certainly would be in a traditional research context: if we ran a survey and a participant contacted us at NatCen and said please delete all of my data, we would be obliged to do so. But when you've downloaded hundreds of thousands of tweets, there's no mechanism for me to know that somebody has deleted a post and to change my analysis to reflect that. There are legal requirements for us to do so, but the mechanisms just aren't in place.
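There is no standard mechanism for this, but one workaround researchers sometimes use is a periodic "rehydration" pass: re-check each stored post ID against the platform and drop anything that has since been deleted. A minimal sketch of that idea, where `still_available` is a placeholder for a real lookup (for example, a batched call to a tweet-lookup endpoint in an actual pipeline):

```python
def purge_deleted(dataset, still_available):
    """Keep only records whose source posts still exist upstream.

    `dataset` is a list of dicts with an 'id' field; `still_available`
    is a hypothetical callable that re-checks one post ID against the
    platform. Any record whose post has been deleted is dropped, so
    deletion is treated as withdrawal of consent.
    """
    return [row for row in dataset if still_available(row["id"])]

# Example: post 2 has been deleted since collection, so it is dropped.
collected = [{"id": 1, "text": "still up"}, {"id": 2, "text": "gone"}]
live_ids = {1}
cleaned = purge_deleted(collected, lambda post_id: post_id in live_ids)
# cleaned == [{"id": 1, "text": "still up"}]
```

Even this sketch only works if you re-run it regularly, which is exactly the gap in the mechanisms described above: nothing notifies the researcher when a deletion happens.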
And to what extent is it the researcher's responsibility to identify such cases, given that participants don't even know their data have been collected and are being used for that particular piece of research? Then there's data security and confidentiality: how do you protect the data securely and confidentially? You can apply the usual kinds of protection methods, the de-identification and anonymisation we were talking about earlier, but social media data are inherently personal, and so inherently identifiable. You can strip out a Twitter user name, a Twitter handle, but anyone with Google and access to the raw text can search it and link straight back to the individual. So even when we de-identify, it can all be traced back, and that's a really odd context for an analyst. We aren't used to, and our typical ethical frameworks aren't used to, researchers being able to know who individuals are while conducting analysis. And it's not necessarily just researchers. If you're taking a data set and putting it out to coders, for example people on Mechanical Turk sifting large data sets, what are the data sharing agreements there? If they're coding racist or prejudiced tweets, they'll be able to see that a named individual said those things. What are the ethics of that? And finally there's publication. Twitter's Terms of Service, for example, tell you that when you publish a tweet you have to include the handle and present everything in its original context, which is the complete opposite of what we typically do when quoting qualitative interviews, where we might paraphrase or change the odd word to protect anonymity. The Terms and Conditions from Twitter tell you that you can't do that.
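To make that de-identification point concrete, here is a minimal sketch of stripping handles from tweet text with a regular expression. This is pseudonymisation at best: the surrounding text is left verbatim, so anyone can paste it into a search engine and trace it straight back to the author, and Twitter's terms may in any case require verbatim, attributed quotation, pulling in the opposite direction.

```python
import re

def strip_handles(text: str) -> str:
    """Replace @mentions with a placeholder.

    Removes the obvious identifier, but because the rest of the text
    stays verbatim it can still be found by searching for it, so the
    post remains traceable to its author.
    """
    return re.sub(r"@\w+", "@[user]", text)

strip_handles("Thanks @steve_dundee, great thread on Brexit!")
# -> "Thanks @[user], great thread on Brexit!"
```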
So how do we balance those two elements, the legal side and the ethical side? Then there's data ownership and publication, to complicate things further. Does the context of the publication matter? Does it matter whether it's a journal paper, an internal report, a blog post? What might a user expect about how widely what they put out will be viewed? If we're confident nobody outside a small academic community is going to see it, is that okay? If instead we're putting a big post online, writing a journalistic piece where thousands of people might see that Steve from Dundee didn't care for Brexit, that's a completely different context. There's also a fundamental question about who owns the data. Can a researcher take ownership of data which were produced for non-research purposes? If the data are being treated as published text, which is arguably what allows us to collect them in this manner in the first place, can they be republished without attribution to the original author? Can they be anonymised? Can they be altered? Do we need the author's permission to publish them, let alone to change or edit them? And how would the social media platforms themselves view the intellectual property in that content? Twitter might have one particular set of terms and conditions, but other platforms might have different ideas about who owns the data. To go back a stage, we were talking about Pulsar and other companies that sell data. The concept of ownership there complicates things, because it's not the respondents, the people we're researching, whom we're paying to access the data; it's Twitter and the social media platforms themselves that get that money if we're collecting more than that random 1%. Bringing in a transactional element makes it even more complex. I'll also quickly talk about the blurring of boundaries.
What may be particularly novel for many researchers is that the space you're researching within, or researching about, may be one in which you yourself operate. Your own tweets, your own comments, might be picked up if you are part of a network you are researching. If I take a random set of tweets, well, I'm not that prolific, but there's every chance I could be part of that dataset. What does that mean for objectivity, for data quality, and for how I interpret the data? That's probably relatively unlikely, but it might also be someone you know who is picked up in the dataset, which raises the same issues of objectivity and data quality. That might be okay on something like Facebook or Twitter, because if you know them you may well be following and interacting with them already. But what if you're doing research on a platform like Grindr or Tinder? That's a completely different context, but completely plausible, and people's personal, sensitive information might be picked up by you. That's ethically quite challenging, and there's very little you can do to stop it happening. Also, if you're interacting online yourself, you yourself are searchable. So think about how you present yourself online. If you're in a forum, if you're operating in a Facebook group, do you create an alternative research persona or use your own personal account? What does that mean for the power relationships between you and the people you're researching, and for how they view you? And if your actual name is there, a separate persona doesn't help much, because they can Google you anyway and find out a lot more information about you. That again changes the dynamics and the relationships of how you're researching. So there's a real blurring of the professional and personal identities taking part in this research.
This speaks more, I think, to the quality side of things, but I think that's still an important part of ethical discussions. Does everybody have a fair and equal chance to have their voice heard? We're excluding people without web access and without a social media account, but we're also more likely to pick up people who are really vocal. When you take that random 1%, it's not a random 1% of people, it's a random 1% of tweets, so somebody who tweets 100 times is roughly 100 times more likely to be picked up than somebody who tweets only once. What does that mean for bias in our sample, but also what does it mean ethically? Are we overrepresenting the views of some types of people relative to others? There are also issues around verification, and we've talked a little about bots. Do we know whether it's a person, an organisation or a bot tweeting, how can we differentiate them, and should we analyse them differently? How do we know the people we're researching are part of our target population, that they are in the UK, or Iran, or Iraq, wherever, that these are the people we want to be researching? There's also something here about online and offline identity. Does an online identity or an avatar count as a human subject on which there's an ethical onus? When somebody is in a massively multiplayer online role-playing game, am I researching the character or the individual behind the character? What does that mean for typical ethical ideas of the person, and who do you get consent from, the character or the person behind it? You're also going to be picking up information about people's broader networks.
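The sampling point above can be made precise. If each tweet independently has a 1% chance of landing in the sample, a user who posts k tweets appears at least once with probability 1 − 0.99^k, and contributes exactly 100 times fewer or more tweets in expectation depending on k. A quick sketch of that arithmetic:

```python
def user_inclusion_prob(tweets_posted: int, sample_rate: float = 0.01) -> float:
    """Probability that a user appears at least once in a tweet-level
    random sample, assuming each tweet is sampled independently."""
    return 1.0 - (1.0 - sample_rate) ** tweets_posted

# A one-off tweeter has a 1% chance of appearing at all; someone who
# tweets 100 times appears with probability ~63% and contributes 100x
# as many tweets in expectation, so vocal users dominate the sample.
p_single = user_inclusion_prob(1)      # 0.01
p_prolific = user_inclusion_prob(100)  # ~0.634
```

So a "random 1% of tweets" is very far from a random 1% of people, which is exactly the overrepresentation concern raised above.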
If I pull in a load of Twitter data I don't just get information about an individual; I get information about who retweets them, who they're retweeting, who they're replying to. That spreads the information out beyond the actual research subjects you're looking at. One of the programmes I'm working on is about linking survey and social media data, and I've got consent from the people in the surveys to link the two together, but I haven't got consent from anyone else they're connected to, and I'm still going to pick up information about those people. So consent only partially addresses these problems. Different types of data raise different issues too. Numbers of retweets, numbers of followers, very basic metrics: probably okay. More detailed text data brings certain challenges. But images and videos are even more challenging, firstly because it's very difficult to automatically identify which are problematic and which are not, and if you have things like images of children and family members, what are the ethics of that? I'll also mention derived variables. Raw data, public information: fine. But as soon as you manipulate it at all, as soon as you try to extract something that summarises what's said in that content, that changes it, and it's no longer public data, because you've assumed some knowledge about a person. For example, there's lots of research using Facebook that takes the pages you like, the people you follow and the kind of text you post, and makes judgements about you: whether you voted Republican or Democrat, but also about your sexuality, about all kinds of potentially sensitive elements of your life that those people didn't make available and never said it was okay to assume about them.
And are the algorithms deriving those characteristics any good? That's not a given: either they're rubbish and we shouldn't be using them, or they're good, in which case you've just extracted and made available sensitive information about an individual. There are some other things as well: responsibility for reporting abuse, for instance. If you do find sensitive content online, what ethically is the role of the researcher in reporting it or intervening in that situation? And how do all of these interact with traditional methods, legal issues and terms of service, which I've touched on a little? So that's a really quick whistle-stop tour, in twenty or thirty minutes, of some of the issues. There are a load of resources here that are really good. The industry guidelines are all right. The University of Aberdeen framework, which was put together about six months to a year ago, is a good flow chart and a good guide to what to look at. The Lancaster University Ethics Forum was set up just a couple of weeks ago and is meant to be a forum for discussing ethical issues around social media data; I very much recommend looking at that. The NatCen report on social media users' views is really good, and the Wisdom of the Crowd report also has some interesting discussion around ethics. There's a book coming out relatively soon which actively engages with discussions around ethics and online research, and the handbook of social media research also has a reasonable amount of information. I can send links to any of these; just ask me at the end. Thank you.