Hello everyone, thank you for joining us today. This is the Computational Social Science Introductory Workshop, where I'll be teaching you how to become a computational social scientist. My name is Dr. Julia Kazmaier. I work with the UK Data Service and the Cathie Marsh Institute at the University of Manchester. Diving straight into the topic, this is what we're here for. What is up with computational social science? How do I become a computational social scientist? What are the eight steps of computational social science, with discussion and project development? And then right at the end, there'll be time for final thoughts. We'll go through all the questions that people have submitted during the event and we'll have a bit of a chat: maybe you want some more explanation on something, or you're unclear about how it applies to your work. We can discuss things like that. So computational social science is the use of computational and empirical methods to address social science questions. Now that seems really obvious and basic when I read it out like that, but it's important to get a bit more context around it, because this is not trivial. Computational social science requires human-style thinking processes to identify important research questions, and computer-style thinking processes to turn those research questions into computational or empirical methods. And then you've got to translate it back to human thinking to communicate the results to wider audiences. You want to communicate effectively over different timeframes to different people. It's complicated and you need a lot of different kinds of thinking skills, but don't worry, we'll get into it. Importantly, computational social science is not just using computers within a research project that focuses on social science. It's not just using digital versions of purely traditional social science methods, or using digital but purely non-empirical methods. So computational social science is not communicating by email about a social science research project; it's not using online surveys as opposed to paper surveys; it's not keeping a digital diary instead of a paper diary as part of some kind of experiential method. Don't worry, we'll get into it. Let's have some examples of what a computational social science project might look like. So one example that I thought of would be to collect, process and analyze millions of online news articles to show changing political attitudes. So for example, when did the word woke first appear in political editorials in a newspaper? Was it treated as a good thing or a bad thing? How did that change over time? That would be a computational social science project, and I'll sketch what it might look like in code in a moment. Or using real-time weather and traffic data to show how travelers respond in their travel choices. So do you find people take the car more or less on rainy days, or do they take public transport instead of riding a bike? The choices people make are a very interesting human behavioral question, and real-time weather and traffic data can help answer that question. How about combining data from novel wearables or apps to establish correlations between social media activity and heart rate? I mean, I would be interested to know if certain apps make people more stressed or more excited, you know, their heart rate goes up, or if they make people feel calmer or more relaxed and their heart rate goes down. These are some interesting ones.
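As promised, before the last example, here's a minimal sketch in Python of the counting step for that first news-article idea. This is illustrative only: the corpus file, its fields and the term are hypothetical placeholders, and a real project would add text cleaning, disambiguation of the word's different senses, and some coding of tone on top of this.

```python
# Minimal sketch: how often does a term appear in editorials, year by year?
# "editorials.json" and its fields are hypothetical placeholders.
import json
from collections import Counter

def term_share_by_year(articles, term="woke"):
    """Share of articles per year that mention the term (case-insensitive)."""
    mentions, totals = Counter(), Counter()
    for article in articles:
        year = article["date"][:4]  # assumes ISO-style dates, e.g. "2015-06-01"
        totals[year] += 1
        if term in article["text"].lower():
            mentions[year] += 1
    # Normalize by yearly volume so a growing archive doesn't masquerade as a trend
    return {year: mentions[year] / totals[year] for year in sorted(totals)}

with open("editorials.json") as f:
    articles = json.load(f)
print(term_share_by_year(articles))
```

The point of something like this is that the computer does the scanning of a million articles, and the researcher does the interpreting.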
Last example I have here is to import, process and format centuries of parish records to map family names over time. So this is about who's moving in, how big these families are, what space they're taking up in a community over time. Some of the key factors in computational social science in those examples, things that they had in common, were the data volume, the complexity, the speed, the difficulty or the novelty. So it's not important exactly which data sources we have, whether it's weather apps, whether it's parish records, whether it's news articles from online. The important thing that makes it a CSS project is that the volume, complexity, speed or difficulty are too much for people to deal with unaided. No one would ask a researcher to read a million articles looking for the word woke. That's not how people work. That's not how research proceeds. So if it's too big, or it's changing too fast like real-time weather and traffic data, or if it's too difficult like the parish records, because you have to scan the images and then translate the images to text and then process that text, these are all things that make it CSS. And the data must pertain to people: their actions, behaviors, choices and statements. So again, this is about the choices people make, about how they feel when they're using social media, about where they move and how they travel and at what time of day, about where their families reside over time. These are about people. And the exact research question is not important, but it must be a social science question. So it has to be: how do people respond? What do they think? Why do they purchase this instead of that? These kinds of questions. Okay, so now it's time for some interaction. Well, okay, hang on, first a quote. In essence, computational social science is an opportunity to do socially valuable research that would not be possible without computational methods and tools. So this quote really encapsulates how I think computational social science works. It gives us tools to ask new research questions, because we have access to processes and data that we would not otherwise be able to use. So now, I gave it away a bit too early, but now it's time for interaction. So in Mentimeter, and there's the code and the link again, you have to decide: is this a computational social science project or not? If I wanted to scan historic recipes and use AI algorithms to recognize the text and identify the ingredients and measures used over time. So we've got one answer so far saying it's not CSS, it's not enough social science. I would also like to point out to everyone that there are no right or wrong answers here. Ooh, we've got a 'definitely CSS'. They're coming back, they're fighting hard. There is a case to be made. Let me take a step back. I deliberately left these questions a little bit ambiguous as to what the research question was, because that allows it to be CSS or not depending on how you look at it. So in this case, I can see why there's this disagreement. And I like that at least someone has decided to go with the classic science response of 'I need more information to decide'. If the research project is identifying ingredients purely because you want to understand what was available in a place and time, that's probably more of a historical ecology question.
But if we're talking about what people used, like what they thought were valuable or posh ingredients, or what's a fancy meal to serve at Christmas as opposed to an ordinary peasant's meal to have for breakfast, then this could be a social science question. So I understand exactly why you get this disagreement. But don't worry, there's more to come. CSS or not: use gamified smart home displays to understand how people interact with energy-saving technologies. So gamification, in case you're interested, is where you get little points or badges or competitive tables, to be like, oh, I can save more energy than my neighbors, or I've walked so many more steps than everyone else in my age bracket, something like that. This gamification is a motivation or tracking tool. Again, I wonder if this is the same person who has decided they need more information to decide. Okay, this one looks pretty clear. CSS seems to have won the argument here. And I agree that this seems to be about how people interact, so that's clearly a people question. And it's definitely using tools that we couldn't use without computation. So it makes a lot of sense. Advertise for survey participation on social media and store the responses in a database. This one, yeah, I mean, it's got social media in there. Store the responses in a database, so it could be quite a lot of data. Yeah, I mean, this one, I can see why, again, I can see why this is contradictory. In theory, surveys and storing responses are things you would do without computation. So I would think that this probably is using digital versions of non-digital methods and it's not really CSS. But again, it does kind of depend on the volume. If you're trying to get half a million people to respond, then the pure volume makes it CSS, because you otherwise could not do it manually. But yeah, this is interesting. All right, good. I'm glad to see that there's disagreement, because that means you're really thinking critically about it and that my examples aren't super obvious. Okay, next one: read in real-time weather and air pollution data to create complex models of hyper-local air quality. What do you think? Ah, not enough social science. Good first instincts. Need more information to decide. Definitely CSS. Yeah, again, this is one where I think it depends on the real focus of the research question. Are you creating models of hyper-local air quality to show why people of certain income brackets are choosing to live in one area or another? That would be a very social science question. Or are you just modeling air quality in a purely physical, biological, environmental sense? I think the difference here depends on what you're going to do with it. Are you asking about people and their choices? Or are you just asking where the air quality is good or bad? So again, I'm glad to see that there's disagreement. CSS or not: train neural nets on social media data to create a believable chatbot that counteracts online radicalization. So that is a lot of big words, and I can explain them if you need. Neural nets are a kind of machine learning, AI, arguably. Social media, you're probably aware of. Chatbots are little sort of AI characters that respond to input. And online radicalization, I'm sure you're all aware of: it's where people change their behavior in response to online communications. Yeah. Okay, so this one, I think rightly, people are mostly leaning towards definitely CSS. There's one that's not enough social science.
I think you could argue that it's not a research question, that it's more of an activist stance or some kind of policy goal rather than a research question, and I can see why that might influence you to say there's not enough social science. I think if the research question is, can you counteract online radicalization with a chatbot? then that's a pretty good research question. I would call that social science. But a lot of people are saying more information to decide; hopefully that is because it contains a lot of words that you don't know. I might have gone over the top on that one. Okay, so here's a quick opportunity for you to tell me what you have learned so far about CSS. Does it seem familiar? Is it confusing? What's a concept maybe that I've brought up that surprised you? Maybe there's an aspect of it that I'm emphasizing where you thought, oh, I thought that was really obvious, or, oh, I was completely unaware of that. So this is short answer. It will appear as a word cloud if I have programmed it correctly. Yes, I believe so. So you should be able to type in, for example, 'chatbot' if you were surprised that a chatbot could be CSS. Here we go. Computers and social science. None of this is surprising given the topic of the conversation. Combinations of social science. Yeah, okay. So it really is a combination. Broad is a good one. It is absolutely broad. It is kind of a methodological approach rather than a topic. Just like social science, you can focus on loads of topics. Exciting is a good one. Yay. Powerful, complexity, human thinking, big data. Ooh, we've got some great words here. Extensive, modeling, interdisciplinary. Absolutely interdisciplinary. And we'll come on to that a little bit more, about how you don't have to be a specialist in all of the aspects to be a participant in computational social science projects. Great new insight, storing data. Yeah, absolutely. Synergistic. Ooh, you've got some great words. You guys are amazing. Problematic. You're right, it can be problematic, because there are a lot of barriers. Not everyone will understand the processes or the methods, or they will be really resistant to applying new methods to a classic problem, or they'll be unable to understand what you did and therefore really resistant to the conclusions you draw. So there are challenges; it is problematic. Absolutely: volume of data, neural net, it's about people. It absolutely is. It really is about giving you new ways to do the kind of social science work that would not have been possible without computational methods. Okay, this is great. I'm really pleased with some of these responses. Thank you very much, everyone. Social scientists think like people. They study people and their interactions and behaviors, and their thinking skills include things like abstraction and inference and fuzzy categories and background knowledge. So social scientists are really well suited to thinking about weird things, like the fact that people can belong to different groups at the same time, and that's not a problem for social scientists. They understand you can be part of a family and also a worker and also a union member, or something like that. And they understand that people within a certain society are going to have certain background assumptions that are reasonable: that football fans are associated with certain kinds of behavior or genders or classes, and things like that. So they would be able to apply that background knowledge quite naturally.
Data skills that social scientists have tend to be around response categorization and coding, quality evaluation and pattern detection, and also basic statistical analysis. So there are lots of data skills that social scientists already have, and they can leverage those data skills and expand on them or apply them in new ways. So please do not feel that if you're a social scientist, you don't have any relevant skills; you absolutely do. And social scientists use computers, but are not, at least currently, often formally trained in how to use computers to write computer code. That may be changing. Lots of people coming through education now have probably gone through more Python or R courses than people did five years ago, and who knows if that will extend further in the future, or whether the field will subdivide. In contrast, computer scientists really think like computers. They solve information and processing problems. These are not problems about people, about understanding how people do the things they do. They're things like: how can I make this outcome happen with less computing power, or happen faster, or something like that? How can I sort these things into the boxes faster by writing more efficient code? This is the kind of research question that computer scientists might expect. Their thinking skills are more aligned with how computers work. So they're more about concrete definitions and absolutes. They're more used to strict hierarchies and categories, and clearly defined and scoped variables and rules. So while it's normal for people to think about an individual as belonging to more than one group, it's not necessarily straightforward to get a computer to recognize that kind of fuzzy category. Data skills: they're used to collecting, analyzing and manipulating data through programming scripts, computational methods and technological tools. So on that part of it, computer scientists tend to get much more training than social scientists. But they're not usually taught to identify and motivate research projects with societal impact or value. So they're not taught, for example, how to prioritize questions about solving human problems. They're given questions about improving machine vision or making processing more efficient, things like that. Doing CSS involves combining human thinking, computer thinking, open-mindedness and mixed problems. Now, we haven't talked about open-mindedness and mixed problems yet, but these are things that apply both to social scientists and computer scientists. So everyone, regardless of which background you come from, whether you were trained as a computer scientist and you're now trying to learn CSS, or you were trained as a social scientist and you're now trying to learn CSS, everyone needs to be open-minded and able to identify a mixed problem. So the human thinking skills that we talked about: identifying important problems and knowledge gaps, considering possible solutions, connecting problems to relevant theories and perspectives, and collecting relevant information and research to frame the approach. Human thinking is typically easy for social scientists, because they're trained in abstraction and communication and context and things like that. It's harder for computer or pure data scientists, who are not trained in dealing with ill-defined, overlapping, context-dependent concepts, or in using assumptions of background knowledge. Computers have no assumptions or background knowledge, so computer scientists are used to working without them.
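To make that fuzzy-category point concrete, here's a toy sketch; the names and groups are invented. The point is just that overlapping membership, which feels obvious to a social scientist, has to be represented explicitly for a computer, for example as sets rather than a single one-person-one-category field.

```python
# Toy illustration: one person can belong to several groups at once.
# A single "category" column can't express this; a set per person can.
memberships = {
    "Asha":  {"family member", "worker", "union member"},
    "Bram":  {"worker", "football fan"},
    "Carol": {"family member", "football fan", "union member"},
}

# Who belongs to both of two overlapping groups?
workers = {name for name, groups in memberships.items() if "worker" in groups}
union_members = {name for name, groups in memberships.items() if "union member" in groups}
print(workers & union_members)  # {'Asha'}
```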
In contrast, computer thinking skills include things like accessing, organizing and processing fast or complex data, writing collaborative code and documenting workflows, all very valuable. And I think we all need to learn more. These are easy-ish for computer or data scientists, because they're trained in computational methods and strict rules and exclusive definitions and formal instruction processes. They're also trained in programming, which makes it much easier to document workflows, especially if you're used to things like version control. Computer thinking is typically harder for social scientists, because they've almost been taught not to think like a computer. But they can build on their skills and training. So if you're used to categorizing and coding survey responses, that's kind of computational thinking. If you're used to formatting surveys, or drawing statistical analysis from complex data, or recognizing patterns, these are also the kinds of skills you can leverage to get better at computational thinking. But both groups need to be open-minded and eager to learn. Absolutely no one, and I mean no one, starts out with all the skills they need. No one knows all the skills they might need in the future that they'll need to acquire. Everyone needs to approach this with an open mind, curiosity and a willingness to learn. Otherwise it's really difficult. I mean, don't get me started. Some skills will be easier for some people to pick up or use than for others, especially if you have people from different backgrounds. Some people will find, oh, I'll go teach myself that in a day, no worries, whereas other people in the group might try to teach themselves the same thing and it would take a week or two weeks. That's fine; not everyone will be as good at everything. But that's important, because you can't do it all by yourself. You have to be prepared to collaborate. And this is where the interdisciplinarity that we talked about earlier comes in. So we need to be able to understand the other approaches that people in our team are using. If I'm a trained social scientist, I need to know enough about computational thinking to talk to a computer scientist, so that we can work together to do something that I couldn't do alone and might not be able to teach myself in any reasonable amount of time. Likewise, the computer scientist needs to understand thinking like a human enough to see why I'm asking these questions. Like, why is that a research question that's worth answering? There's code to optimize! Why should I care how people make decisions about travel? So everyone needs to be open-minded and willing to come together. And you also need mixed problems. These are problems that require both human thinking and computer thinking. They're not purely human problems, about, say, the psychology of individuals, and they're not purely computer problems, about, say, optimizing algorithms. Importantly, mixed problems will become more important because resources are being digitized; interactions, objects, processes, everything's becoming smarter, it's becoming networked, it's becoming next-gen or whatever. Large volumes of data are made available and are updated faster than at any point in history. So we can now get really fast-updating, accurate data in ways that we didn't used to.
And right now we might not have problems that we think that data is relevant to, but it could be very relevant if we start thinking: what might we be able to do with it? What is this data showing us? What could it be good for? And of course, the future, who knows what's gonna happen? Are we all gonna live in the metaverse? I hope not, but if we are, then we ought to at least think about how that's gonna affect people and their behaviors. There we go, and I'll move on. So the approach to computational social science comes down to an eight-step process that I have. Now, this is, pardon me, not absolutely specific to computational social science; you could use these same eight steps for other kinds of research projects, but some of them are written in ways that really make sense for computational social science. So let's go through. The first is: identify the problem. And this is about being as clear and specific as possible about the pattern, the problem, the lack of insight, and also who is involved, where it is, and so on. So this is starting to frame your research question. Essentially, it's brainstorming with reflection and editing. And it's best if you do this with others, because they will help you reflect on why you think this problem is the problem, and what you think the missing insight is. Maybe they'll identify somebody that you didn't consider, and you're like, oh, actually, yes, of course those people are quite involved in this problem. So you want to do this with others. It's a brainstorming session about what the research question might be, but write it down. Please track all the changes you make to it. This is quite difficult if you're not used to it, if you're not used to writing down this kind of thing because you think, oh, it's just me noodling away on my own, that's fine. Write it down. Make it part of your research journal. And then write down what other people say about it, and what changes you make to it, and why you made those changes. It will seem tedious and boring and really uncomfortable if you're not used to it. Do it anyway. The next is to explore the problem. So you've identified the problem; now you're going to explore it. You're going to gather information and perspectives about that problem in multiple ways. That might be surveys, either as primary research, where you design a survey, or reading through secondary data that someone else has collected on the same topic. Might be observations. Might be analyzing huge sets of secondary data that maybe don't relate exactly to the problem, but close to it. Might be creating an app so that you can get people to log data, share their walking data or something like that, if you're trying to see how people actually move around. Might be web scraping to get a bunch of research papers. Might be using APIs, collecting tweets from the web or something like that, as in the sketch below. Might be expert interviews. There are loads of methods you could use here. Some are computationally intensive, some are not. Expert interviews tend not to be very computationally intensive.
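Since I mentioned APIs, here's a minimal sketch of what polite API-based collection can look like. The endpoint, parameters and response fields are all invented for illustration; any real service has its own API, terms of use and rate limits that you must check first.

```python
# Minimal sketch of paging through a (hypothetical) JSON API to collect posts.
import time
import requests

BASE_URL = "https://api.example.org/posts"  # invented endpoint for illustration

def collect_posts(query, pages=3, delay=1.0):
    """Fetch a few pages of results for a query, pausing between requests."""
    posts = []
    for page in range(1, pages + 1):
        response = requests.get(
            BASE_URL, params={"q": query, "page": page}, timeout=30
        )
        response.raise_for_status()
        posts.extend(response.json().get("results", []))
        time.sleep(delay)  # be kind to the server
    return posts

data = collect_posts("commuting in the rain")
print(f"Collected {len(data)} posts")
```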
But all of this is to help you reflect on the problem you identified. You have to spell out the sub-problems and the processes and the relationships involved. You have to identify what simplifications you're going to make, what assumptions you have already made, and maybe related issues. This is, again, why it helps to have other people involved, because they will help you notice when you're simplifying. That's not a problem if you are, but it needs to be clear that you are, how you're simplifying it, and why you're simplifying it. Other people will help you identify your assumptions, because they'll be like, well, why do you say that? That doesn't sound right. And you think, oh, why doesn't it sound right to them? What am I thinking? How am I understanding this in a way that isn't universally shared? Again, write it down. I cannot emphasize this enough. Write down where you got the data that you looked through, what you read that changed your mind about who was involved in the problem, what expert interviews you did and why you chose those people, all of these things. It will seem really uncomfortable and tedious and boring, and you will regret it so much later if you don't do it. Just do it. All right, enough of me shouting about identifying problems and exploring problems. Now is another interaction. Okay, so it's the same Mentimeter link and code as before. But now is an opportunity to tell me about your step one and step two. This could be for a project you did a long time ago, a project that you're currently doing, or a project that you might imagine you could do in the future. Tell us maybe just something that's interesting about a problem you identified, and how you went about exploring that problem and solidifying your research question. If you have never done this before, it's okay. Imagine yourself doing it in the future and tell us maybe your step one and step two. So how did you identify the problem? How did you explore it? I'll give you an example. I was interested in how greenhouse growers came to choose to purchase the equipment they did, because these are really big purchases, like a combined heat and power unit, or an automated system for watering plants, or expensive full-spectrum lighting systems. Those are big purchases, and greenhouse growers in the Netherlands, where I studied, didn't have loads of spare cash. So they had to make really good business decisions. My research question was about how they came to make those business decisions. So I did a lot of interviews with greenhouse growers. Turns out they all just believe their neighbor. But, arguably, their neighbor who had already purchased the thing. So, literature review, this is a classic one. Yes, absolutely. So step one, you kind of identify a problem and you think, what's going on? Why do people, you know, choose to do this thing? Literature review, absolutely. Spotting the gap via a literature review. So this one identifies the problem via a literature review. Problematization by engaged scholarship and own career experience. Engaged scholarship, I don't know what that means, but it sounds interesting. Feel free to just say in the chat or the questions if you wanna tell me more about engaged scholarship. Met with an academic advisor, just built a better understanding of the topic at hand. Great, this is the kind of collaborative bouncing ideas around with people that I mentioned. Built upon previous ideas and information available. Great. Again, that's scanning the landscape, finding, oh, nobody's asked this question before; I'll take that. Scribbled a few ideas down, spoke to a retired professor in the area, they recommended a couple of books, started to read them. Great. So yeah, it can be a really slow process. It's not immediately obvious to everyone exactly which problems need addressing.
Sometimes you have to really talk to someone who's like, gosh, I wish somebody would address that problem. Oh, review all existing research on area of interest. Did step one without realizing it would become research: you are not alone in this. Just chatting with my friend about racism and football and digital platforms, and through this came the thought: this could be important research. Absolutely, you get to chatting with people about a problem that maybe you kind of knew subconsciously was out there, but then someone says it out loud in words, and you're like, that is amazing, that is exactly the problem. That is exactly the problem I can address. Great. Scanning policy documents, speaking to policy makers. More are coming in as I scroll, okay. Interested in changing Labour vote shares between 2017 and 2019. Literature review, identified gaps, factors already linked to changing vote share. Yeah, so it might be interesting to see all the previous research; if it had all the answers, we would absolutely understand how vote share changes. There's probably something missing, and you might be just the person to approach it with new ideas, with new concepts. Yeah, it's great, it's great stuff here. Interested in the relationship between people's awareness of a range of cultural objects and their taste. Okay, very cool. Explored data initially to see how to discuss it with a group. So again, discussing with a group, excellent. Exploring data initially, yeah, this is an initial exploration of the topic. So you might then design something specifically to capture a new group of people and get a really in-depth exploration. Great stuff. Policy documents, speaking to policy makers. All right, super. All right, so after steps one and two, you're gonna formalize the concepts. What you want to do is make the concepts and the processes explicit, and you're gonna approach something that is both human- and computer-readable. This is often known as pseudocode. English, or any natural language really, has formal structure: punctuation and paragraph structure and good writing conventions. Computer code is full of parentheses and indents and colons and things like that. Pseudocode is somewhere in between. So it might look like bullet points in complete sentences. The example here: if you're working on a project that involves the concept of trust, and you want to use that in a computer model, or even in the results of a survey, you're gonna have to define it. Is it a number between zero and 100? Or is it a number between zero and one, or negative one and one? How are you defining it? How does it change over time? Do people increase trust in certain circumstances, maybe following a mutually beneficial interaction? Does it decrease after someone is found to be lying, even if they're not lying to you? You have to start writing out these rules about how you think it works. So this is like formalizing your hypotheses, and also defining all the things that you're going to measure or analyze or look at or model in relation to your hypotheses. Again, do this with other people, because everyone's brain thinks differently. And if you're all checking each other's work, it helps to identify places where you're being a bit goofy and treating something as really obvious, and someone else is like, I don't know what you're saying there. It helps.
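Here's a minimal sketch of what that trust example might look like once the pseudocode hardens into code. Every choice in it, the zero-to-one scale, the neutral starting value, the step sizes, is an assumption you would need to write down and justify in your research journal.

```python
# Sketch: trust as a number between 0 and 1, with explicit update rules.
class Agent:
    def __init__(self, name):
        self.name = name
        self.trust = {}  # trust in named others, each between 0.0 and 1.0

    def trust_in(self, other):
        return self.trust.get(other.name, 0.5)  # assumption: neutral by default

    def after_beneficial_interaction(self, other, step=0.1):
        # Rule: trust rises after a mutually beneficial interaction
        self.trust[other.name] = min(1.0, self.trust_in(other) + step)

    def after_observed_lie(self, other, step=0.2):
        # Rule: trust falls when the other is caught lying, even to someone else
        self.trust[other.name] = max(0.0, self.trust_in(other) - step)

alice, bob = Agent("Alice"), Agent("Bob")
alice.after_beneficial_interaction(bob)
print(alice.trust_in(bob))  # roughly 0.6
alice.after_observed_lie(bob)
print(alice.trust_in(bob))  # roughly 0.4
```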
Okay, and then you get into the heart of the work, which is collecting the data, implementing the software, and verifying that your data and software, or whatever it is you're working on, make sense with your hypothesis. So you're going to select one or more methods. That could be surveys, or expert interviews again. It could be that you design an app and ask people to interact with it. It could be that you create a simulation that models how people move in an emergency situation. Whatever it is that you're looking at, because the choice of method is really dependent on the topic. And you could have one or more methods. If your topic is complex, you might need to approach it from two or more directions, so you might need two or more methods, and you'll try to use them all to complement each other, or to show where one method fails to pick up a dynamic that another method has found. Then the verifying part is really checking that your method has been implemented correctly. This is not the main work; it's just asking: do our surveys ask about the thing we think they're asking about? Are our interviews structured in a way that lets us believe people are being honest with us? If we're asking people for really private information, don't hold the surveys or interviews in a really public space. Yeah, things like that. So this is sense checking. Did we do the thing right? Did we ask the right questions? Did we get the right information? Things like that. Okay, so I don't expect you to have as much to say about steps three and four, because steps one and two are really universal to most research. Three and four might be more specific to computational social science, but I do wanna give you a chance to say anything about steps three and four. For example, maybe you could tell me if you've ever used pseudocode, if you've ever written out a flowchart on a whiteboard about your process, how you think vote share has changed or something like that, or how you think racism appears on digital platforms, where maybe you did a diagram of what features you think matter. These kinds of things. Or step four could be, you know, minimal testing of your survey questions to see how people respond, and then you changed the survey questions before you rolled it out really big. So tell us anything that you recognize about steps three or four in your own work, or, if you don't recognize steps three or four at all, what you might have done differently. Okay, so: mind maps, flowcharts, piloting surveys and interview questions. Absolutely, mind maps are a great way of doing step three, for example. It doesn't have to be a bullet point list. It doesn't have to look like code language. It could be a mind map or a flowchart or a diagram. It could be, I mean, if you're really out there, it could be a Lego model of how you think people move through a space, something like that. There are some really good creative ways to do the formalization or the sense checking. Are we asking the right thing? Pilot studies for steps three and four: see if the operational definition of a variable is valid. Yeah, absolutely. Software often has, you know, beta testing, I suppose; in the research context that would be a pilot study. So does the software do the thing? Does it do what you're expecting? Do the results you're starting to get make sense based on what you were expecting? Or maybe is something going wrong that's causing people to misunderstand your questions or the interactions?
Pencil and paper. So you're not far into that current project. That's fine. I mean, there's no right place to be at any given time, is there? You are where you are. But it's useful, if you're not that far into the project, to think about how you might do these steps. So do you need flowcharts? Do you need bullet lists where everyone agrees this happens first and that leads to this other thing? It can be useful to explore tools that you may not have used so far, to see if they help you clarify your ideas. Choosing, reordering and manipulating sets of variables. Yeah, okay, okay. A mini project in a master's module, a very brief exploration of content on Twitter at a specific moment, the 2020 Euros. Yeah, so you can think of these as mini projects or pilot studies, little bits to say, can I get an answer to this question the way I think I can? Yeah, absolutely. Rendering data as a network graph to see how certain entities or concepts interact. Great. So yeah, you could say, all right, we have access to this data; if I find certain patterns in it, that gives me reason to think those patterns are probably real, and I might be able to get more data that explores those patterns specifically. Qualitative and quantitative data to gain deeper insight into the problems. Yeah. Step three, formalizing, is really about clarifying your research question and making sure that you're gaining the right kind of insights, that you understand the problem, the question. And step four is making sure that you can address that question formally and properly with the tools you're using. Great work, everyone, I'm really pleased. So number five is: experiment and analyze data. You run the experiments, you do the surveys or the interviews, or you build the models and the simulations, you analyze the data, you run the eye-tracking studies, whatever it is that you're doing, whatever methods you chose in step four and tested to make sure they're sensible. You do the thing full scale in step five. Then you'll have to identify and explain the results within the context of the experiment or the model or the method. So what did you get? And does it make sense given what you did and what you're trying to do? So this is a little bit like step four, but at the big scale, the big thing. And number six is the discussion and the conclusions. So you're going beyond the experiment, model or method to draw some conclusions about what it means. Do your conclusions support a policy recommendation? Identify whom these conclusions might matter most to. Should something change about how we're doing this, or do these conclusions support the current way things are working? Who benefits from any proposed change, and also, who suffers from any proposed change? That's often not specified. We say things like, this tax policy will make these people so much better off. Okay, but who's worse off? And do we think those are the people who should bear that cost, or not? It helps to be explicit. You might not be very popular, but at least in your own paperwork you need to write it down explicitly. Okay, so: five, basically do the research thing full scale; six, discuss and conclude. So again, I don't necessarily expect as much here as we had for steps one and two, because not everybody will have done a step five or six, or even be in a position to think about five or six. So I'll leave this up for a couple of minutes to see what people put in, but there might not be a lot here, and that's fine.
I can tell you about some of the experiments that I ran, if you're interested. So, my step one: I noticed that machine learning algorithms are praised for being really good at finding insights even from very, very messy data. However, I also noticed that some machine learning algorithms designed to deal with data that's very time-ordered were not good at dealing with messy data. And I thought, hmm, a lot of the messy data we normally give machine learning algorithms is also time-ordered, but we don't tell the machine learning algorithm about that. Is that a problem? So, steps three and four: I created different kinds of machine learning algorithms, gave them messy data without telling them it was time-ordered, and saw how they did. And yeah, it turns out it does matter. It matters quite a lot, to some algorithms more than others. And so my conclusion was that more data is not always better, which was controversial, I guess.
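This is not my actual study code, but here's a minimal sketch, on synthetic data, of the kind of comparison I mean: score the same simple model with a shuffled train/test split, which quietly lets the future leak into training, versus a chronological split, which respects the time ordering.

```python
# Sketch: shuffled vs chronological evaluation on drifting, time-ordered data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
t = np.arange(1000)
X = t.reshape(-1, 1).astype(float)
y = 0.05 * t + 10 * np.sin(t / 50) + rng.normal(0, 5, size=1000)  # messy, drifting

model = LinearRegression()

# Shuffled split: the training set contains observations from the "future"
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
print("shuffled R^2:     ", model.fit(X_tr, y_tr).score(X_te, y_te))

# Chronological split: train strictly on the past, test on the future
cut = int(0.8 * len(t))
print("chronological R^2:", model.fit(X[:cut], y[:cut]).score(X[cut:], y[cut:]))
```

If the shuffled score looks healthier than the chronological one, the time ordering mattered and the shuffled evaluation was flattering the model.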
Anyway, back to your responses. Okay, we've got: NVivo to organize and synthesize primary qualitative data, reflexive thematic analysis to analyze. No machine learning, that's fine. I mean, you're young yet. Longitudinal studies to make conclusions about a problem. Okay, good. Main conclusion from a mini pilot study was that more exhaustive research needed to be done surrounding different types of content shared on Twitter during football tournaments. Yeah, I mean, often the conclusion is that the pattern we thought we saw, we explored it, we conclude the pattern really is there, and we need more exploration. That is not an uncommon conclusion, that we need to do more research. And that's the way you keep yourself in business. All right, probably not gonna get a whole lot more for five or six. So you might think, oh, discussion and conclusions, how can there be two more steps after this? There are. Seven: communicate, publish and present. So this is back into the really human-thinking portion of computational social science, or in fact most research. All of the previous steps must be communicated to multiple audiences in multiple ways. So there's short-term and long-term engagement. There are public, academic, political and student audiences, all the different people that you want to read your research, and you have to tailor it differently to each audience. So you have to think about conferences, which will be either a poster or a spoken presentation; journals, which will be much more formal and structured; blogs, which are written but not very formal, and can even be interactive; white papers, which are aimed at policy-making decisions; academic societies, workshops, university classes, all of these different things. You have to think carefully about what you're going to say to each of them and how you want them to take it. And this has to include the short and long term. Immediately after publishing, you're probably going to say different things than, like, ten years after publishing. Think about it a little bit. And finally, eight: share, document and validate. Now this is why it helps to have written down the things. At this step, you will be thanking yourself for having written everything down well earlier. This is about making sure that the right thing was done, and that's by allowing your work to be studied, reproduced and modified as needed, through openly available workflows, code and data as much as possible. Not everybody works with data that is safe to publish directly, so you might need to create synthetic versions, or at least synthetic sample versions, so that people know what shape your data has. But your code can still be available, and your workflows, so all the justifications about why you identified this problem, who was involved, who noted that this factor or that assumption should be stated clearly. You're really going to want to make sure as much of that as possible is out there. Again, it will feel uncomfortable. It will feel threatening, because there are some grumpy people who will tell you everything you did was wrong. Tell them to jump in a lake, it's fine. But you have to get used to being seen and observed in a scientific setting, because scientists like to observe. And if we don't make our research available and reproducible, people can't trust it. And if you're doing important work, like this policy needs to be enacted, or education should work differently, or health processes need to change, you need to be able to justify that in a way that people can trust. Another important point here is that these steps are not linear. They're presented as one through eight, but they're really not. They're probably a big looping thing, you know? So you'll do one and then two and then back to one and then maybe a bit more two and then three. You might go to four and then back to two, three, four, three, five, four, three, five. You'll kind of do these kinds of things. The exception is that documentation, step eight, needs to be done all the time, the whole way through. Document. You cannot document too much, unless you're procrastinating on doing the work because you're just spending time writing journals about what shoes you're wearing today and how that made you feel about your research. So this is an important point to note: document, and don't expect linearity. So tell us a bit about your seven and your eight. How do you like to publish or present your work? Do you like poster presentations or spoken presentations? Do you give conferences a miss altogether and go straight to journals? Does most of the research you do end up being taught in the classroom? Do you write white papers and publish them to try to influence government policy? How about step eight? What documentation methods are you using? Version control? GitHub repositories? Is everything open and available? Tell me the things, mostly because I'm nosy, but also because I'm quite interested in reflecting on the things that are difficult or easy for you, or things that you think you need to do better, you know, how we can help each other reflect on using these steps in the most effective way. So: the Open Science Framework, absolutely. There are also things like the FAIR principles; FAIR stands for findable, accessible, interoperable, reusable. So there absolutely are some open science initiatives that are really working to make research reproducible and trustworthy, because I think there's some lack of trust that has come from some bad decision making. Reports and the synthesis of work undertaken, not personally active research, but reporting on the research of others. Okay, so that's interesting. But if you're reporting on the research of others, you still need to make it clear how you found their research, maybe why you chose their research instead of some other research, why it was put in the basket of 'this is relevant'. Yeah, absolutely. All very good. Documents change control by version and date.
Now, if you're doing that on your laptop, if you're just saving things like 'full paper February', 'full paper March', there are better ways to do it. We can talk about that if you like; GitHub and version control software are a good option. ReShare, okay. GitHub again here. Relevant studies and interpretation to study participants. That's an audience I hadn't explicitly called out, but yeah, the people in your study also need to know, especially if you're working in health and medicine or social care or something. If they've participated, they have a right to know what was going on there. That's really valuable; I hadn't considered that. Good. Sharing your code or even just ideas can come at a cost if you're an ethnic minority, female or immigrant researcher working at a less prestigious university; people with more authority sometimes borrow your code or ideas without any attribution. That is absolutely a good reason to document everything you're doing as well, because you can show that you attended this meeting and shared this idea, and then this person started using your idea without attribution. If you have documented everything really well, you can use it in complaints to your human resources department, for example, or in requests to get someone's work taken down, or something like that. You're absolutely right: sharing your ideas can come at a cost. Documenting is the way to mitigate that cost. This one likes poster presentations, did a couple as an undergrad, likes the visual resources. Also writing a blog post right now, which is actually relevant to current events, the World Cup in Qatar, so good timing. Absolutely, it's great. I mean, blog posts are wonderful things for making you write things down in a way that other people might see. You're much more likely to think critically about what you're writing, how you're phrasing it, how you're presenting the conclusions. I think a lot of people could do a lot with blog posts. Conferences are useful for novice researchers like me: helps me condense my paper into a concise presentation and gain insights from academics I otherwise would not have access to. Absolutely, saying something out loud makes you think about and understand it differently. So even if your paper doesn't get into a conference, consider presenting it at a lunch seminar with your research group, or even just line up some teddy bears on the sofa and give them your presentation. You will think about and understand your work differently if you say it out loud. Also consider doing workshops with us, because we'll livestream them on YouTube. Website, blogs, contributing to academic papers, GitHub, social media, posters, research group discussions, a WhatsApp group. Yeah, I mean, there are some great ways to publish now with social media, and a lot of different ways to promote the ideas and foment discussions about how it's going and who's benefiting. Wonderful work. I'm really appreciating these responses, so thank you, everyone. So we're winding down now, at least with the formal part; we'll carry on with some discussion. So what are your CSS takeaways? What are you walking away from this with? A question that you're gonna go look up, or a method that you're interested to learn more about, or a furious complaint that you'd like to make? Document everything, absolutely. Yeah, web scraping, good one.
We do have some resources on that, if you want links to our GitHub repo, with some code to work through and some videos about how to work through it. Read more. Yeah, it really feels... there's a lot; we all wish we had more time to read and do the work, and I'm considering doing another edition of my reproducibility workshop, in which I argue that you're allowed to just tell people to jump in a lake if they're putting too many demands on you. Again, that works better in some contexts than others, but make time for yourself. If you want to read more, schedule time in your diaries; tell people you're busy. Talk more with others. Yeah, it can be a bit uncomfortable, it can feel quite vulnerable to say your ideas out loud to other people, but it's good for you. Sunshine is the best disinfectant and all that metaphorical stuff. Clustering methods. Yeah, those are some interesting machine learning methods that can really wade through huge amounts of data to find interesting patterns. Machine learning came up here as well. Non-linear steps. Yeah, nothing is as linear as we have to write it. Writing is fundamentally a linear exercise, so it makes the ideas seem very linear, but they are all kinds of topsy-turvy. Listen carefully, take it in. I mean, that's good for life, but definitely, yeah, listen carefully to other people, because they will tell you when what you've said doesn't make sense to them, or when something you've said really resonates with them, inspires them. Great: listen, be flexible. Ah, voice ideas, yeah, speak out. The worst that happens is that people think, oh, that person speaks a lot, and that's fine. I mean, that's not a bad thing. Arguably it can be seen as a much more negative thing in some populations than others, but you're trying to be the best you. So if you think it's a good thing to be a person who speaks about your ideas, then please be the best you. Normal research. Yeah, it really is; you don't always have to be cutting edge, breaking open a new plane of existence. A lot of research is just regular: identifying problems, approaching them logically, getting it done in a reliable way, publishing the results. And computational social science is absolutely a way of doing that. It's just using tools that people have not historically used on certain kinds of questions, and they are very important questions. So yeah, great stuff. Let's move on. I have some references here if you're interested. The Turing Way is an online book about reproducibility. Programming with Python for Social Science is about understanding computational thinking, but it's really geared towards social scientists, so that's a great resource. Dynamics of computerization in a social science research team, again, is about infrastructures and strategies and skills, and how computational social science is gonna be big. Installing computational social science, and the challenges of new information and communication technologies. Yeah. And Speaking sociologically with big data: symphonic social science and the future of big data research. So there's a lot out there. People are starting to realize that this is gonna be a big thing, and that social science students and departments are not often set up to make it easy or straightforward or comfortable for people to do computational social science, but you might be the one that makes the difference in your institution. And here are some contact details for me.
I'm on Mastodon in case the bird app goes completely fail whale. I'm also still on the bird app, so it's fine. And the UK Data Service, of course, is on Twitter.