Hi, my name is Sarah Mannheimer. I'm an associate professor and data librarian at Montana State University, and I'm happy to be here at the CNI Fall 2020 Virtual Membership Meeting. Today I'll be talking about connecting communities of practice to support big social data stewardship. In today's talk, I will go through the benefits of data sharing. We'll define qualitative data and big social data, then talk through some issues in data use and reuse and how identifying those issues can help us connect communities of practice in order to support big social data stewardship. And then we'll talk through some next steps.
So, this talk begins with the assumption that data sharing is beneficial to science and to society. As Mauthner said in 2012, the case for sharing data rests on three central pillars: a scientific, a moral, and an economic one. The scientific benefits of data sharing include building new knowledge, new hypotheses, and new methodologies, conducting comparative research, and strengthening existing theories. Sharing data also promotes interdisciplinary use of data, and it can provide data for teaching students in the classroom. The moral benefits of sharing data include reducing the burden on research subjects and facilitating more research about rare, hard-to-reach, or inaccessible respondents, and that's a real benefit of building and sharing social media data sets in particular. Then there's transparency and accountability in order to foster trust from the public. And economically, sharing data conserves time and resources and therefore supports a higher return on investment.
So, let's quickly define qualitative data and big social data. My research investigates the similarities between these, so it's important to know what they are. First, qualitative data tends to be on a smaller scale, and I've grouped it into two categories here. There's data solicited for research studies.
So field notes, observational records, interviews, focus group transcripts, etc. Then there's data found or collected with minimal interference by researchers: things like found diaries or correspondence, official documents, home videos, and social interactions as well.
Big social data tends to be on a larger scale, and I've grouped it into four categories. There's self-representation data: stuff like your username and password, your profile picture, your biographical information. Then there's social interaction data: stuff like your timeline posts or online forum posts, the content that you share online, comments you make, or direct messages. Then there's relationships data, so things like follower or following data and number of likes. And then metadata: timestamps, geospatial data for where you were when you were posting, type of operating system, etc.
It's notable that none of these big social data types are solicited by researchers, which is a major difference between qualitative data and big social data, the way that I'm structuring them here. It's possible to use social media with more qualitative methods, for example by conducting online ethnographies or contacting interview subjects directly via social media. But that's not what I'm focusing on here, since those really amount to small-scale data. I'm trying to solve problems related to the large scale of big social data.
So the idea behind connecting these communities of practice is that qualitative data stewardship is well established, whereas big social data stewardship is less well developed. And so my hypothesis, or my hope, is that by connecting these two communities we'll be able to have big social data stewardship practices that are more robust and that benefit from this long-standing tradition of data stewardship for qualitative data.
I want to investigate data stewardship for each of these communities to support ethical and epistemologically sound data use and reuse for big social data, and potentially to support scaling up qualitative studies by combining archived data to investigate through more of a big data lens. The issues that I identified fall into two categories: epistemological issues, and ethical and legal issues. I'll discuss them further in the rest of the presentation. I did an in-depth review to identify these six issues.
The first one is context. Issues of context are similar for both qualitative data reuse and big social data. For both types of data, there's concern that the data may not be properly understood outside of its original context. For qualitative data in particular, concerns center around whether the data can be meaningful at all without the knowledge and expertise of the researchers who conducted the original research project. With qualitative research, researchers are often embedded in communities and have this deep contextual knowledge. For big social data, the posts are kind of removed from context by nature. They are short pieces of text taken out of the larger context of a public or a personal life. And this out-of-context effect is only compounded when you're gathering data at such a large scale. As opposed to qualitative data, where the researcher is so deeply embedded, a big social data researcher may never speak to the person who wrote the post or know much about that person's identities or broader contexts.
So let's look at some data stewardship practices that are used to support context for qualitative data and big social data. For both types of data, metadata can support context to some extent, especially information about the communities and the research participants, and as much information as you can provide about how data were collected, cleaned, and analyzed.
And the data stewards who have supported context metadata for qualitative data should be able to connect those practices to big social data as well. And we'll talk about that.
Next is data quality. For qualitative data, quality issues tend to stem from human error: simple mistakes and inaccuracies that humans make throughout the process. Research subjects, reporters or recorders of field data, the researchers themselves, and data coders can all introduce error. Data quality issues for big social data have other complexities that can introduce different types of errors. The automated nature of data collection means there are fewer opportunities for simple human mistakes, but quality issues can result from the fact that social media users aren't speaking directly to the researcher, but rather to a perceived online community. That can cause a dissonance between what the person means and what the researcher may interpret it to mean. Other quality issues can result from the specific environment of online social platforms: fake accounts and bots can introduce errors, bias, and distortion. And big social data sampling is often biased, because social media APIs may not return complete data and users of social media platforms may not be representative of society as a whole. So the sampling that qualitative researchers would do may not apply here. For both types of data, systematic errors can be introduced as a result of bias. And when researchers reuse qualitative data or combine data sets, as we'll talk about next, these errors can compound.
So for data quality, data stewardship strategies include supporting documentation of the research process when sharing data. Talking through potential errors, potential bias, and any potentially missing data can go a long way toward supporting data quality.
The third issue I identified was data comparability, which is an especially important issue for both qualitative data reuse and big social data research. For both types of data, combining multiple data sets can support larger-scale studies, which is a particular focus for qualitative data but can apply to both. And combining data can also be used as a strategy to better understand context and enhance data quality, which is a particular focus for big social data but can also apply to both. So data stewardship strategies to support data comparability are, again, documentation: address any missing data, outline your research questions and methods in depth, and then support metadata standards. For qualitative data, the Data Documentation Initiative (DDI) is one very established metadata standard. DDI has been used for big social data, but it could be extended to include more information specific to big social data. And for big social data, data stewards and researchers can advocate for more interoperable metadata standards that can be adopted across the social media landscape, maybe building on an existing model like schema.org.
So we'll move on to ethical issues. Informed consent was a major issue I identified, and it's a similar issue for qualitative data and big social data. While researchers increasingly include language in consent agreements regarding data reuse, it's really impossible for research participants to understand the full possibilities for potential reuse of open data. Ethical questions then come up regarding whether consent can ever be fully informed for either type of data. Social media terms of service could include user agreements that cover the use of data for research purposes, but we know that users don't generally read the terms of service, and if they do, the extent of future data reuse is impossible to determine.
And that's really true for these broad consent statements that try to account for future use of data as well. But this is kind of an issue that will just come up. Let's talk through some data stewardship strategies. For qualitative data, data stewards can help draft broad consent language to support data reuse. That's when the consent language says this data may be reused for any number of purposes and I'll be publishing it here. For big social data, data stewards can encourage strategies that were outlined in the most recent Common Rule, in an appendix. Strategies like focus groups or community advisory groups can support a sort of inferred consent, where people from the community give you more information about how that community would feel about the research that you're doing. And then there are also some exploratory automated systems that researchers can use to actually get each individual social media user to consent to the research project. So these are some emerging ideas, and connecting them with data stewardship strategies from qualitative data reuse can support better informed consent.
Next, privacy and confidentiality. These are major issues for both qualitative data and big social data, but some specific elements of these concerns are distinct between the two. For qualitative data reuse, the researchers that I read were concerned that anonymization could compromise the integrity of the data or remove important contextual information. And also, even if you're carefully de-identifying, you may not be guaranteed to prevent deductive disclosure based on the contextual information that you do provide in order to support data reuse.
So for big social data, anonymization is difficult and even potentially impossible because social media platforms are often full-text searchable. It's a standard practice in qualitative research to use exact quotes to highlight ideas, which means that any exact quote you use could disclose a user's identity. The large scale of big social data also just makes it easier to deduce identities, which puts participants at risk. Another unique consideration for big social data is the idea of public versus private. Social media posts may technically be publicly available online, but users may still view their social media profile as somewhat private, you know, intended only to speak specifically to their online community, which they have curated. So it could be considered a breach of privacy to read, collect, and use their posts for a purpose other than the one they were intended for, such as research.
So let's look at some data stewardship strategies to support privacy and confidentiality. We can support the idea that big social data is human subjects data and connect it to qualitative data stewardship in that way. That means supporting de-identification procedures: delete names or replace them with pseudonyms, remove any details about participants' lives, and you could also aggregate data to support de-identification. Then, if data can't be safely de-identified or shared, repositories can support restricted access, and this can be true for both types of data. You can embargo the data for a period of time, or you can fully restrict it. And then data use agreements dictate the conditions required for other researchers to access and use the data.
So the last issue we'll talk about today is intellectual property. Qualitative data are the intellectual property of research participants, and so participants need to either waive their rights or license their responses for use in the research studies.
For big social data, the same is true, but intellectual property is also made more complex by the fact that big social data are often controlled by private, for-profit companies, who can then even further control how the social media posts are used. For example, in the case of Twitter, the terms of service say that only tweet ID numbers may be openly shared and archived. Data stewards have come up with some ideas for how to solve that problem. For instance, Documenting the Now has created a tool they call Hydrator, which uses the Twitter API to take a tweet ID and pull the complete metadata for that tweet. So for data stewardship strategies, data licensing is important for both qualitative and big social data if the data is newly collected. And then there's rights management for existing data, which includes navigating the social media terms of service we discussed before. Data stewards can also encourage inclusion of any tools that are required to support intellectual property protection, like the Twitter Hydrator, as part of the data deposit.
So by investigating issues in qualitative data reuse and big social data research side by side, data stewardship practices can be developed to support sounder practices for both qualitative data and big social data. In the next steps of this research, I'm going to conduct interviews with qualitative researchers, big social data researchers, and data stewards in order to identify strategies for supporting data stewardship in both of these communities and bolstering the data stewardship practices in both.
So thank you. Because this is a recorded video, there won't be a Q&A session, I don't think, so please reach out to me via email or Twitter to ask questions or make comments. I would love to hear from you. Thanks very much.