Today we have an online workshop, which means there are some interactive elements, and the topic is why and how to document your workflow. Let me just check on the chat real quick to make sure that people can see. Yes. That is a point I will address; let me just move through this. So, today our table of contents includes concepts such as what a workflow is and why workflows need to be documented, my particular contribution, the Great British Bake Off, then what and how to document, and when to stop documenting. We'll be going through all of these.
What is a workflow? Broadly, a workflow is an ordered series of actions that produce an outcome. Now, you can do that workflow and not document it at all, and what order those actions happened in, and exactly what the actions were, will be lost to time. Most of our workflows are lost to time. We don't actually write down the steps we took in a day: did we drop something in the post first and then go to the shop and pick up a pint of milk, or did we do it the other way round? It doesn't really matter, because most of the time we don't need to reproduce that specific outcome.
But in the scientific or research context, a workflow is all the details and ordered steps that allow another researcher to produce your same outcome. It's really unlikely that someone else will need to reproduce your random Wednesday, so they don't care whether you dropped something in the post or picked up the milk first. But if you're trying to get someone else to reproduce your scientific results, it does matter that they are able to reproduce your workflow. So, broadly, a couple of concepts there. This will probably sound quite intuitive, and you may think, of course, why are you telling me this? Because allowing other researchers to produce your same outcome is known as reproducibility, or sometimes replicability, and it is a key part of the scientific method.
Despite this, there has been a huge problem, in social sciences and medicine particularly, and it has become known as the crisis of reproducibility, because huge swathes of people, for decades now, have not been documenting their workflows well enough. People find that they can't get the same answers when they do a study, or an experiment, or an intervention, or some kind of observational series. They're getting completely different outcomes, and everyone's saying, well, if you got A and I got B, what does that mean we should do? Nobody knows. It's all very panicked. Everyone's having a flap. So here we are today, trying to fix the world. Yeah, so Pusheen is sad about this crisis of reproducibility. There you go, it is proof that I like cats, because I put a cartoon one in my presentation.
Now, specifically, why should scientists document their workflows? It allows other scientists to reproduce, validate or extend the work. That's quite important for the scientific method. It also helps others to understand the work and its results. They may have absolutely no interest in reproducing your work, but it helps them understand it if they can really get the steps that you took to get your results. And it contributes to trust in the work and its results. If everyone understands, and feels like they could reproduce your results if they wanted to, they're more likely to trust you, to trust your outcomes, to trust the conclusions that you produce. And it turns out trusting people is important to working together, to, I don't know, society. And particularly from my perspective, it promotes a culture of open science, in which people are honest with each other about how they did the thing, what results they got, why it matters, why they made their choices. It means we're not all sneaking around trying to get one over on the other guy and being really duplicitous and underhanded.
I mean, you can do that if you want, but it's not great for science as a whole, I think.
But why should I document my workflows? Obviously, I trust myself. Well, well-documented workflows make for a really easy method section. Most papers or chapters or dissertations or theses, or anything that you go to write as a researcher, require a method section. And if you wait right up until just before you're submitting the thing to write your method section, you're going to get panicked and flappy and not know what to put in there. Actually, what did I do first? And was that set of data from the one website or from the other? Oh God. So if you want an easy way to write your method section, document your workflow.
Also, documenting your workflow helps you remember and understand your own work better, which is very important if someone corners you at a conference and asks you a specific question. It is deeply undermining of your own confidence in your research if someone asks you something and you think, oh crap, I have no idea what I did, actually. I didn't write it down. I haven't spent any time thinking to myself about what I did. I'm going to create a distraction and run away. It's not great for your career or networking, or even just your own confidence.
And this one is my particular focus. If you have documented your workflows very, very well, it is a fast and easy way to identify and correct errors, because you can look back and say, oh wait, I did things differently in these two parts. I'll just fix that when we run it. You can incorporate new or different data. So let's say you have a very good process for how you worked with the data from that website. If someone has given you a new data set, you think, oh, I'll just do the exact same thing in the exact same order to the new data. Great, really good for parity.
You can rerun tests or recreate images. Maybe you want to create a colorblind-friendly image because your first image was really hard to see. If you have all the steps that you took to get your original image, it's very easy to do all those steps again and get a new image. And of course you can apply your work to a new topic area, or a new set of data, a new concept, a new collaborative effort. It's much easier to do if you have it all written down. You can share it with your partners, and you can see how it slots in with the new topic. Generally it's a good idea to know what you've done if you ever want to do it again. And there are probably a lot more reasons why every individual person should document their own workflows.
But realistically, a workflow comes from keeping very good records of what you did: all the data that you used, including the source and the version and when you got it, all the processes you employed, the decisions you made, the analysis you ran, the tools you used, the visualizations you chose, and much more. That might seem like a phenomenal amount of headache. You think, oh God, can't people just do whatever and believe me when I say a thing? Well, ideally we would like that, but it's not the way the world works, right?
So I'm going to talk to you next about how the Great British Bake Off illustrates reproducibility quite well, specifically the technical challenge round. I don't know if you are fans of the Great British Bake Off. I'm going to assume you are, because lots of people are. Even if you're not actually a fan and you don't watch the show, you're probably at least aware of it, because it's kind of a big cultural phenomenon. The technical challenge is where contestants all have to produce a perfect example of the same thing, an ideal classic recipe.
Often it's a recipe that they have never made before, maybe never even heard of before. It really throws a spanner in the works for people who are confident in their baking skills to suddenly be asked to produce a perfect thing that they've never even heard of. And importantly, they have to produce this perfect thing with vague instructions. They do have identical ingredients, but maybe not everything that they're given will be needed. They do have equivalent equipment, but again, maybe they're given choices: do I use the round cookie cutter or the person-shaped cookie cutters? Things like that. They don't know the right way. And of course there's a fixed time limit.
Now, for the Great British Bake Off, this produces quite a lot of comedy, as people produce things that are sliding sideways off a plate when they should be standing up in a particular shape, or that are the wrong color or the wrong texture or the wrong who knows what else. And the reason it's funny is because they're being asked to reproduce something that they don't know how to do, with a really bad set of instructions. Now, of course, the Bake Off would be much less entertaining if the technical round had perfect instructions that took out all the guesswork. But in contrast, science in an ideal world would seek to remove all that guesswork through well-documented workflows. So this is just briefly a sort of comedy interpretation of why a well-documented workflow might be important. Certainly Bake Off contestants would probably prefer a less vague recipe. So you can kind of think of your documented workflows as making the best recipe that you can, so that other people can reproduce your scientific cake. I don't know, the metaphor is a bit strange there.
Let's start with the interaction. So now we're going to move into Mentimeter for the next bits.
And that means I will ask you to enter answers to questions, or suggestions, or short answers, into your second device if you have a second device, or on this device, the same one that you're using Zoom on; you can switch between screens. So let's get interactive. What are some of the things you do in your work that need to be documented? I believe you should be able to enter short answers for this, up to three. So just tell me a couple of things. Do you interview participants, or do you create online surveys, or do you maybe use sensor data? Maybe you go and do some interesting observations.
Yeah, classic: data analysis. All of us do data analysis. And while everyone knows in theory how to do data analysis, the specific analysis you did still needs to be written down. And then there's a question: what is the Mentimeter login again? So if you want to pop the details back in, it's M-E-N-T-I dot com, and then there's an eight-digit code that I think social will be able to share again.
Yeah, good one: literature search. What are the keywords that you used when you did your literature search? How did you get this set of literature and decide it was worth searching? That's a subtle one. People don't think about literature search as something that you need to document, but you do. Data exclusions, that's a good one. Why do you keep this person but exclude that one from your data set? Transcribing interviews, that's one that's very clear. Do you use software to do it? Did you pay a particular service? How did you record the interviews? What kind of files are they in? All of this stuff is potentially quite important. Okay, shadowing, I'm not sure. Oh, you mean following someone around while they do things, so that's an observational process.
That's really hard for someone else to replicate, because they'll never be able to go back in time and shadow the same person on the same day. But be as clear as you can about why you selected the person to shadow and what the context of the shadowing was. You have to write all this down so that they can at least try to approximate something.
Okay, a lot of good stuff here. From the set of answers that are coming through, we're clearly social scientists for the most part. I think no one here is talking about, say, digitally simulating the movement of ions or something. But this is all the more important, because it is social science that had some of the biggest crises of reproducibility. People were producing an analysis, or maybe survey data, and nobody else was able to get anything remotely close to the same analysis to come out of surveys that they thought were doing the same thing. So people got really confused and asked, are you making your data up? Are you excluding people and not telling me why? All kinds of things. Focus groups, that's a good one as well. How do you select people, what's the context in which they're meeting, and what kind of prompts are they given? You have to be very clear about all of that.
Very good stuff here. And it's clear that we don't all have the same documentation needs. Documenting a literature search is going to be very different from documenting how you shadow someone. They both need to be documented, but exactly what you need to document, and how detailed you need to be, is going to vary. So don't worry about it too much if it's not immediately obvious how you do all the different things.
Okay, Hannah is still not seeing the code. I believe 3009 was the first part. Let me see if I can back up. It is 30098964. Hopefully, there we go. Yes. I got some good answers there. Thank you. So let's move back in.
So clearly we know there are these things that we should be documenting. It's not always obvious how best to document them, and that's why you're here. So let's talk for a moment about how you would document these things, your first instincts. Later on we'll talk a little bit about whether those first instincts scale up or apply to different contexts, or whether there's maybe a particular software tool that will help you do that better. Please do not worry: no one can tell who answers what in this context, so you can answer absolutely anything you want. You can say, I document my literature search by taking screenshots of my Google search. Okay, that's one way. Maybe you document your surveys by just saving a link to the online survey in some kind of online survey data set.
Maybe you document how you exclude people. "Make a table in Word." Classic, classic, and you are not alone in doing that, because a lot of us are told that tables are a good way to present information in a ready form, and that's true. And in Word, because lots of people like to use Word. We've got some Excel spreadsheets, methods in a Word document, Stata do-files. Okay, that's a good one. Written notes, statistical code, PDF of surveys, keeping a diary. Okay, so we've got some good variety coming in, this is great. Audio recordings, super. Bullet points, transcriptions. Okay, yeah, we've got some great options coming in.
And you'll see that these are more or less appropriate to different kinds of things. It makes a lot of sense to keep the audio files if you recorded a semi-structured interview with someone. Obviously, you might also want to keep the transcription that you produce from that audio recording. You can keep them both. Those are good. The bullet points, that one's a good one.
I like that because it reminds me of bullet journals. On any given day that you're doing research, you can list out the things you think you're going to do that day. And then at the end of the day you review them and say, did I do this thing? If not, why not? If I did the thing, how did it go, and where did I store the files? So bullet points, certainly, like a bullet journal, or keeping a diary, which is also written there. These are really good ways for you as a researcher, for your own wellbeing, to document your work, your workflow.
On the other hand, the Stata do-files, and there's another one about coding, saving your statistical code. This is great, because these are things you can really share as they are. Someone else can use your same code on your same data, and inarguably they will get the same results. That is a very useful, very clear, very trustworthy way of making sure that someone else can look under the hood of your research and say, oh, what's going on, what did they do? Great. I love that.
"Write methods as you go along." That is a great one, because there's what you think you're doing, and what you actually do, and what you decide three weeks later you're going to do instead. You want to keep track of all this. If you make a change in three weeks, you want to document why you made that change, why you changed your mind, what decisions you're making, and whether you've changed those decisions later. This is great.
We also see a lot of people mentioning tables and spreadsheets, so you're thinking of not only how you're going to record these things but how you're going to let other people see what you've recorded. You might be blending the two points, but they're not totally separable. So another one we've got on here is cloud.
And I assume that means cloud sharing your research. You can put your data sets and your statistical code and maybe your research meeting notes in a Dropbox folder and share it with someone who joins your project later on. So we're kind of mixing both how you document things and how you make your documentation available. They're not totally separable, and it is important to keep that in mind. But they're not totally the same either. You might be really confident that you're doing a great job of documenting things because you're putting all your notes in a Dropbox folder. But how are you actually writing those notes? How are you making them clear? How are you dating the files? Things like that. So there's lots more to cover, and we'll carry on.
I just want to take a moment to move through the questions. So let's clear out some of these that have been answered. Great. "Any tips for keeping up momentum on documenting workflow? I always start out with great intentions but suffer from attenuation of effort over time." You and me both. Making your documentation available to other people is a good way to motivate yourself not to have terrible documentation. It's a little bit like how you maybe get more work done if you're working in a cafe instead of in your own home, because in a cafe other people are looking at you, and you're thinking, oh, I've got to look properly serious here as I'm typing up things. You're not randomly having a coffee while folding your laundry and catching up on your Netflix queue. The idea that other people are looking can be very good motivation for doing something well. We will cover some other topics about advice for each other, but I think that's a great one.
Let's move on to the next slide. Again, this is part of the interactive elements, so please do keep your comments coming.
Okay, this one's a little bit more about decisions, and I've covered this a little bit, but what are some of the decisions you make that you need to document? This might be something that you will have thought about because you've been reading someone else's research and you're trying to replicate what they're doing, and you realize you've come to a decision point and you have no idea what they did, and you don't know how to move forward. So think of any time you have had some of those. Excluding participants, that's a decision. Eliminating outliers from your statistical analysis, that is a decision, and you need to document how many outliers you removed and why. What cut-off point did you use? What effect did it have on your analysis? Things like that. So go ahead and share some of the decisions that you are aware of in your research that you might need to document, or that you wish other people would document.
Ethical considerations, that's great. That's a really good contribution. Yeah, whether or not you ask these people or those people, how you phrase the questions, what setting you meet in so that the participants feel safe or not overheard. These are some very good decisions that you have to make around different kinds of things. Also, of course, if you have to do an ethical approval for your research, that's a good way to document your ethical considerations, and one that is quite formal. Unfortunately, I think actually getting your ethical decisions documented formally in an approval is a bit of a back-and-forth process, so you might want to think about how you document the changes.
But yeah, good choice: which database accessed. Okay, data to include or exclude, choice of analysis method, that is brilliant, software to use, data cleaning, Excel criteria. Yeah, these are all good.
These are all really good, and they show that you are aware that these decisions are out there, and that if you were to try to replicate someone else's research, you would come to these questions and think, how do I do this? Because it's not clear how they did what they did.
Some of these decisions don't necessarily have a right or wrong answer, like geographical reasons, which has just been included. If your research is about comparing London and Tokyo on some health metric, maybe, then it's clear why you're comparing London and Tokyo, but you have to say, actually, how do I define London? Is it these boroughs in central London? Is it everywhere that is within five miles of a tube stop? Am I using someone else's definition of what counts as London and not London, and just adopting theirs? However you make that choice, and there's no real right or wrong way to make it, you need to document why you made it the way you did. And if you're borrowing someone else's decision, if you're using the, I don't know, ONS definition of what counts as London, then you link to that. Good.
Ethnicity, that's a good one. That's a very hairy one, because maybe everyone has different ideas of what counts as ethnicity. Is it self-reported identity? And then is it really free text, or are you trying to put people in boxes? That's a tricky one. Very good to document properly. Right.
Okay, data cleaning. That's very much a process decision. Am I going to recode these variables into yes or no, or am I going to recode this free text field into maybe five discrete factors, something like that? Those are really good decisions, and they can feel very practical: I have to recode these variables because otherwise I can't use this free text field in my analysis. Okay, great. It's a practical choice. Highlight that it's a practical reason.
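The recoding decision just described could be written down as a small, self-documenting mapping. This is only a sketch under assumptions: the commuting categories, the names `RECODE_MAP`, `FALLBACK` and `RATIONALE`, and the helper functions are all invented for illustration and do not come from the workshop.

```python
from collections import Counter

# Hypothetical mapping from free-text survey answers to five discrete factors.
# Keeping the mapping in one place means the decision itself is documented.
RECODE_MAP = {
    "bus": "public transport",
    "tube": "public transport",
    "train": "public transport",
    "bike": "active travel",
    "walk": "active travel",
    "car": "private vehicle",
    "taxi": "private vehicle",
    "work from home": "no commute",
}
FALLBACK = "other"  # fifth factor: anything the mapping does not cover

# Hypothetical rationale, stored next to the mapping it explains.
RATIONALE = (
    "Free text cannot be used directly in the analysis (practical reason). "
    "Five factors were chosen for illustration only."
)


def recode(answer: str) -> str:
    """Return the factor for one free-text answer, ignoring case and spaces."""
    return RECODE_MAP.get(answer.strip().lower(), FALLBACK)


def recode_all(answers):
    """Recode every answer and report what fell into the fallback factor."""
    recoded = [recode(a) for a in answers]
    counts = Counter(recoded)
    unmapped = sorted({a.strip() for a in answers if recode(a) == FALLBACK})
    return recoded, counts, unmapped
```

Listing the unmapped answers is a deliberate choice here: it shows which responses were pushed into "other", so a reader can question the grouping rather than take it on trust.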
But then you have to decide if I recode this variable into five factors or 10 factors or however it is. What made me choose the factors that I chose? Power dynamics. That's a great one. Yeah. I mean, that is a minefield for documentation, but it is, I mean, that's one of the reasons why documenting decisions is so important. Okay, so some great answers here. How frequently do you think you should document the decisions you make? And I've given some answers here and this is really just gut instinct. Should you be doing this daily, weekly, monthly, immediately after making a decision? It depends on the context because frankly, what in life does not depend on the context? Yeah. Daily. Ooh. Ooh, we've got some keen documenters. It depends. Yeah, I'm with you on the weekly. Okay. Okay. Okay, again, there is no right answer to this and it probably depends on your specific research. Immediately after making them feels like a right answer. But in fact, that will be very hard to hold yourself to in a consistent way, because we're not always aware that we have made a decision after we've made a decision. So it's probably somewhere between, I would say immediately after making them and then maybe daily or weekly or monthly depending on the pace of your research that you kind of check in with someone else on your team. These are all the decisions I think I've made. Can you spot any that I've missed? You know, so there's kind of a back and forth. There's not a purely like always do it this way. That is not how life goes. Okay, so yeah, monthly. I like monthly. Monthly seems like very doable. And if it's research that you've done a lot, like you've developed a particular process and you've applied it to one group and it worked out well and you made a couple of little changes, applied it to another group, worked even better. No further changes applied it to a third group. 
You're probably reducing the number of decisions that you're making regularly, at which point you can probably space out the frequency with which you sit down consciously to document your decisions. So yeah, no right or wrong answers here, but I'm glad to see that everyone is so keen. This will be rewarded.
Okay, and here's another one that you're probably aware of, and it's a bit of a shame that the slide has gone a bit squiffy like this. How do you document the ideas that influence your work? Because we all have brilliant ideas, maybe just before falling asleep, and make a note to ourselves: do the thing better. But what influenced the idea? What caused us to have these new ideas? Checklists. You might read pop science magazines, or maybe listen to podcasts, or a crazy idea crosses your mind. If you follow the Times Higher Education blog series or The Conversation or something like that, there are lots of ideas in there that probably influence your work. But I'll bet most of us do not ever cite podcasts and blogs and the kind of random conversations we have. They don't go into our reference system.
Yeah, exactly. Whoever's answer is highlighted in red here says that they don't have a good system for this but they would love to have one. Absolutely true, because this is one of the challenges: every moment of our lives contributes to the way we move through our life in the future. Whether you go this way or that way when you're walking your dog in the morning may lead you to see someone doing something that then gives you an idea that you use in your research. And it's really strange to put in your citations, "I had this idea while walking my dog." It doesn't seem very professional. It's realistic, but it is not professional.
So we kind of need a way of keeping track of all of our ideas and sorting them out or reframing them, so that people don't think we're crazy for citing our dog as a motivation. That said, put your dog in the acknowledgments of your work if you like, it's fine.
So this is an interesting one. I personally have a lot of notes that I capture. I use tools like Evernote, or I guess Keep is the Google equivalent, and then there's Samsung Notes or something. They let you save websites and take pictures and include those as well, or audio recordings, and they kind of just chuck it all into a big pot. And then on a regular basis I go through that pot of things that I've saved, like notes to self, and I sort them into categories, or I throw them out if there's nothing there. So you want to develop a way to keep all the things that cross your mind quickly and easily. And then you need a frequency with which you look at those things and either throw them away or take action on them. This is known as the GTD method, the Getting Things Done method, in which you give yourself an easy way to just brain dump, but then you set yourself a pattern with which you sort through the brain dump, mining for good ideas.
And yeah, it could be checklists, it could be Evernote or Keep notes or whatever, if you're into that. It could be a document on your computer. It could be a set of audio recordings on your mobile. Screenshots and web links, that's a good one, because there's so much good structure built into a web link or a screenshot that lets you quickly save that information. And while it's easy enough to do something like that on your own, this is a very difficult one to collaborate with.
So you don't necessarily want other people throwing stuff into your brain dump pile, because you won't know what they're talking about. When you go to look at their picture, you think, it's just a picture of a path. I don't know what I'm supposed to be looking at here. But for them it was, oh, this is about access issues, and how wide the paths are, and whether this seems safe for people, or whatever their research focus might be. So this is a tricky one, and it highlights both the importance of documenting and the importance of having different documentation systems for different contexts. You probably want to document your own ideas slightly differently from the way you and a research partner document collaborative ideas on a particular project.
That said, if you get good at the GTD method, the Getting Things Done method of brain dumping and periodically sorting through the dump, you can implement that with a team. Maybe you have a weekly or bi-weekly meeting at which you say, all right everyone, let's go through all the bookmarks we've saved to this particular shared folder. Let's chuck out the ones that are useless, and let's assign the ones that are potentially something to different people, with dates by when they'll decide what to do with them. That does seem like a lot of work, I will grant you.
All right, here's another one, and everyone will have different answers. How do you get your data? Do you take videos of people as they move through a supermarket and look at whether they're moving fast or slow, or what their posture is? Maybe you're doing research on how people move in a shared space. Do you have sensors set up across the city to monitor air quality? Do you apply to, maybe, an app, and see how many people are doing yoga every day? Interviews, surveys, questionnaires, all classic social science examples, that's great. Interviews mostly, okay.
I imagine a lot of you will be doing interviews and surveys or questionnaires, the classic social science methods. Of course, there are well-established repositories with the UK Data Service that have good interview data already prepared in a downloadable file, so maybe you get your data by downloading someone else's interview data. But think a little bit beyond that. Maybe this will be a bit abstract, but in an ideal world, where would you get your data? Maybe some kind of direct brain transfer, but that's a bit specious. Maybe it would be more interesting if you could have a device or an app on someone's phone, and you would be able to track how often they opened their phone and how often they moved to certain parts of the city, or something like that.
Life stories, that's an interesting one. I worked with someone who was really interested in lost objects, so her data was photographs of things that were in the street. Maybe an unusual kind of data source compared to what you're used to, but in her case it was objects, and images of objects, that had been lost in the street.
Public, primary, secondary data collection, books, journals, websites, interviews, surveys, observation. It's great. So this brings up a good point: primary and secondary data. You're much more responsible for documenting how you got primary data. Secondary data has a different kind of documentation. You kind of just say, I downloaded it from this website on this date. Or, I accessed the Twitter API using these search terms and this kind of access level, and I got so many results. There's a clear distinction here between how much effort and mental clarity needs to go into well-documented primary data collection, as opposed to secondary data collection.
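The "I downloaded it from this website on this date" note for secondary data could be captured as a small record saved alongside the file. This is a minimal sketch under assumptions: the function name `provenance_record`, the field names, the example URL and the sample bytes are all hypothetical. The checksum is an extra that the talk does not mention; it is included only so a later reader can check they hold the same file.

```python
import hashlib
import json
from datetime import date


def provenance_record(source_url, file_bytes, accessed_on, search_terms=None, notes=""):
    """Build a plain dictionary describing where a secondary data file came from."""
    return {
        "source_url": source_url,
        "accessed_on": accessed_on.isoformat(),
        "search_terms": list(search_terms or []),
        # Checksum and size let someone else confirm they have the same file.
        "sha256": hashlib.sha256(file_bytes).hexdigest(),
        "size_bytes": len(file_bytes),
        "notes": notes,
    }


# Hypothetical usage: the URL, date and contents are placeholders.
record = provenance_record(
    source_url="https://example.org/interviews.csv",
    file_bytes=b"id,answer\n1,yes\n",
    accessed_on=date(2024, 1, 15),
    search_terms=["commuting", "London"],
    notes="Downloaded manually from the repository's web page.",
)
print(json.dumps(record, indent=2))
```

Saving the printed JSON next to the data file keeps the record with the thing it describes, which is easier to maintain than a separate document.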
Research papers, focus groups, transcripts, questionnaires, apps, websites, government publications. That is a lot of data, and if you are using all of those together in one project, I do not envy you your methods section. But it's good that you're aware that they all have different provenance and that they need to be documented. How do you get your data is quite an abstract question in some ways, especially in the reproducibility aspect, because things like interviews, or shadowing someone, or observing a particular activity and what happened, are not something that can be reproduced. So we need to make sure that we properly document all the data that we have: how representative it is, how we got it, and why we made all the decisions. In this case, especially for primary data collection, absolutely justify and document as much as possible. Secondary is much easier, and that's why I mostly work with secondary data. So here's a good one. How do you document your data acquisition? If you do surveys, for example, have you ever stopped and put down on paper your process for finding people to interview? Or if you take pictures of objects that are found in the street, do you record your walk on some kind of pedometer app and geotag the pictures, or something like that? So there are some very complex concepts here. And you don't have to be honest if it's too uncomfortable to talk about how you document your own data acquisition. Imagine a theoretical research project, or think of a research project done by someone else, and think about how they documented their data acquisition, because I realize it's quite uncomfortable sometimes to be confronted with the fact that I never wrote down how I got those survey participants. Maybe it's just advertising with a 10 quid shopping voucher. Okay. So, for data analysis, this answer has been submitted.
You have to document straight away, otherwise you can't remember exactly what you did: too many tiny nuances. This is absolutely true. This is probably why we're struggling to answer this question about how you document your data acquisition, because it's really near impossible to think back three months to when you did something and try to write down what you did. So this is a real clear thumbs up for the idea of documenting frequently and immediately. In academic research, is it okay to reference podcasts or videos? I would say absolutely. To my mind, it's not fundamentally any different from referencing, like, a magazine article. They're a bit pop culture, but that doesn't mean that they're not interesting and useful sources of information. So if you're using a podcast or a video or something like that as an influence on your research, go ahead, reference it. If you're using it as a data source, absolutely reference it; in fact, maybe save a file of it in a cloud repository so that other people can access it. It kind of depends on what you're using it for. If it's influencing your ideas, you can reference it in the citations list. If you're using it as a data set, then really very clearly save recordings of it, save where you downloaded it from and on what date, and put it in some kind of cloud repository so that other people can access it. I've got another suggestion here: the ethics application, and ongoing documentation in the methods section. Absolutely, the ethics application will be a very useful thing for documenting how you got your data, because you have to deliberately think about: who am I inviting to participate? How will I manage their participation? Am I doing this in a way that is consistent and safe and comfortable, and things like that?
So I think the lack of answers here is a little bit telling about how few tools are given to us, when we start out doing research, on how to properly document workflows. So I'm about to go into some of the tools that I use specifically. Okay, so just as a summary, well-documented workflows can include, depending on your research: the details of your data, including the source, the volume, any descriptive statistics, representativeness, these kinds of things. The data processes: how you stored your data, how you recoded any variables, whether you linked more than one data set and if so how you linked them, whether you anonymized the data, any analysis you ran, things like that. The step-by-step of any experiments you ran, or observations you made, or models that you built and changed. The details of any materials, software, etc. that were used. The digital resources, if you have them: the raw data files, or final data files, or synthetic data versions; any code you've written, you know, Stata do-files, R files, things like that. And also the justifications for any decisions. These might be reference lists, so you might say so-and-so 20 years ago made his decision this way and I'm going to steal it and just use his exact same decision. They might be theories, frameworks, written explanations, anything else that's relevant for your work, so the ethics applications, for example. So that's all well and good to say, oh yeah, document all of this, do your best, it'll be great. But that's difficult. So let's talk specifics here.
One of the major tips is: make conscious decisions as early as possible. It's very difficult to make conscious decisions, so you might want to have meetings in which you double-check the decisions that you think you've made with your research partner or supervisor, or even just a buddy who's working on a different project and helps you sanity check what you're doing. As early as possible, so write them all down and check in on a regular basis. Be realistic: we will not be able to document absolutely everything we do, unless we have a camera following us around the whole time, Big Brother style. There are subtleties to how we make decisions, how we incorporate new information, and how that changes our mind, that we cannot really share. We do our best, but ultimately, if you're writing a recipe for the technical challenge of Bake Off, you have to decide what to include and what to leave out. You might say include two eggs; you don't say include two eggs with exactly this weight and color and from this farm. So this is probably the trick that I think a lot of you are talking about: tips on actually automating things. And that's great, I'll get into that. Here are some of the things that I want to talk about specifically, including a major one for automation. So if you're doing manual citations and references: do not do manual citations and references. I beg you not to do this manually. First of all, it is a nightmare for formatting styles and for double-checking to make sure your reference list matches everything that you cite in the paper, and back again. You are much better off using reference management software. My personal favorite is BibTeX, because I write papers in LaTeX, and I find it very easy. But I have used Mendeley. I have used Mendeley and EndNote together if I'm working in a Word document. And Zotero is a bit like Mendeley as well.
What Mendeley and Zotero do is give you a little button in your browser bar that lets you save a website or a paper or a PDF, or something like that that you found through a browser, to a repository of references. You can then export a list of all those references and put it into a document like a Word document. And then as you're going through you think, oh, this idea, I got it from that paper, and you can click to add a citation from that list. If this is interesting to you, if you want more specific workshops on how to use citation and reference management software, do let me know. I will happily talk through my personal favorite, BibTeX, as well as things like EndNote, which are maybe a bit more accessible for people that use Word. Now another concept, and we're all guilty of this, is when you save files under different versions. So you might have draft one, draft one underscore two, draft two, draft two final, draft two final with edits, draft two final with edits and new images. This is terrible. It will clog up your files. It will be a nightmare. You will never remember what's going on. And if you're working with anyone else, they have no idea what's going on either. A better way to do this is version control software. You can explicitly upload the final version for a day, whatever day you're working on your project, to version control software like Git or SVN or something like that. But even Microsoft Office 365 includes version control. What this means is that, with your one file open, you can roll back to previous versions, sort of like tracking changes, but without keeping separate files for each change.
It will save you so much time and effort. You can see who has made what change and on what day, and how that had an effect on the length of the document, all of these things, and you do not end up with 65 versions of a document that have weird date formats and "final version" and other things added at the end. Another one to avoid at all costs is emailing files. I beg you, please do not email files. This might sound like bad advice, but you are much better off uploading them to a central repository, and then you can email access to that repository. Now, this will come up against GDPR issues at some point, so you have to make sure that the repository you use is secure if the data that you're storing there, or anything that you're working with, is at all sensitive or personal. I think there are academic versions of Dropbox that meet the GDPR standards. Google Drive is probably okay for low-level sharing of files around, or if there are no issues of personal data. SharePoint, if you're working within the Microsoft environment and your university or research institution offers it, is good. Do not email files. I will get off my soapbox. There is plenty of time at the end for more back and forth. But I just want to, before we go on further, point out when to stop documenting. So, you do not document or share anything that you do not have a right to document or share. Obviously this means you can't document what someone else is doing if they have not given you permission to do that. You're not allowed to share personal information on people. So if you're working with sensitive data, you have to recode it, anonymize it or create a synthetic version before you can share your files. Do be careful, but ideally this should be covered in any ethics applications that you've done.
There are courses: the UK Data Service runs courses on safe researcher training that help you identify whether the tables or graphs that you're using accidentally disclose some personal information that you shouldn't be sharing. So you can take those courses if you're working with sensitive or personal information and you want more information on how to be careful there. So maybe you can tell me some of the valid reasons not to document or share something in the research that you've done. So maybe, you know, you're not allowed to report rare medical conditions, because it would be identifiable who had them. Yeah, so sensitive data, essentially. You're not allowed to share lists of people and say, we had so many people who reported themselves as Christian and so many people who reported themselves as Muslim and one person who reported himself as Buddhist, and then for the rest of the documentation break things out by religion, because that Buddhist will be really identifiable. Non-disclosure agreements: also very important. You do have to follow the rules that you set out for yourself when you research. So if you're signing a non-disclosure agreement, make sure you understand it very fully before you sign it, and then abide by its rules. So there's certainly personal data and non-disclosure data. There might also just be an embargo on something you're not allowed to share yet: you have to wait six months, or you have to wait until a thesis is published before you can share the data sets, or something like that. There are basically just practical, kind of time-related reasons that might limit how you document or share, so be aware of those. But more importantly, and I think this might be encouraging for you: stop documenting when you're spending more time on documentation than on the actual work.
Now the amount of time you spend on documentation versus work will shift over the course of a project. You might do a lot more documentation at certain parts of the project than you will at others. But if you find you're spending more time on documentation than on work, it suggests the balance is off, and there are some things you can do to improve it. So: automate the boring stuff. Now this is an encouragement to you to learn how to do some basic programming, or get a friend that does some basic programming to help you do some automation on your project. So for example, if you have surveys scheduled on so many days, and it's in your calendar, and you want to record when those took place and how long they lasted, you could copy them out by hand from your electronic calendar, or you could export them into an Excel file or something. One of those is going to be a lot more work than the other, especially if you're doing this every week for months. Use better tools or processes. You might have a process that works really well for one project, but if a project scales up to something that's bigger or longer or with more people involved, suddenly the process that you used to use is no longer appropriate. So if you're working in a team of two, you might just call the other person on the phone and say, when is our meeting for this week? How about two o'clock on Tuesday? That's really fine in a team of two. But if you're in a team of 20, you cannot call all 20 of them and ask about meetings, because you'll end up having to call them all back. So what you want to do in that case is use one of those little scheduling surveys that find a date in common, you know, Doodle polls or whatever they're called. That's a better process that suits coming to an agreement between large numbers of people.
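Going back to the calendar example for a moment, here is a rough sketch in Python of what "export instead of copying by hand" could look like. It assumes your calendar can export a plain iCalendar (.ics) file and that each event has simple `SUMMARY`, `DTSTART` and `DTEND` lines; real exports can include time zones, all-day events and wrapped lines that this deliberately small parser does not handle. The sample events and the function names are made up for illustration.

```python
import csv
import io
from datetime import datetime

# Made-up sample standing in for a real calendar export.
SAMPLE_ICS = """BEGIN:VCALENDAR
BEGIN:VEVENT
SUMMARY:Survey session A
DTSTART:20240115T100000Z
DTEND:20240115T104500Z
END:VEVENT
BEGIN:VEVENT
SUMMARY:Survey session B
DTSTART:20240122T140000Z
DTEND:20240122T150000Z
END:VEVENT
END:VCALENDAR
"""


def parse_events(ics_text):
    """Collect the KEY:VALUE lines of each VEVENT into a dictionary."""
    events, current = [], None
    for raw in ics_text.splitlines():
        line = raw.strip()
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            if current is not None:
                events.append(current)
            current = None
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            # Drop parameters such as ";TZID=..." and keep only the field name.
            current[key.split(";")[0]] = value
    return events


def parse_time(stamp):
    """Parse a basic timestamp like 20240115T100000Z (time zone is ignored)."""
    return datetime.strptime(stamp.rstrip("Z"), "%Y%m%dT%H%M%S")


def events_to_csv(events):
    """Return CSV text with one row per event: title, start, length in minutes."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["title", "start", "minutes"])
    for ev in events:
        start, end = parse_time(ev["DTSTART"]), parse_time(ev["DTEND"])
        minutes = int((end - start).total_seconds() // 60)
        writer.writerow([ev["SUMMARY"], start.isoformat(), minutes])
    return out.getvalue()


csv_text = events_to_csv(parse_events(SAMPLE_ICS))
print(csv_text)
```

The CSV text can be written to a file and opened in Excel. The point is less this particular script and more that a few dozen lines, written once, replace a copying job you would otherwise repeat every week for months.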
So if you find yourself climbing the walls with annoyance because it's so difficult to get something done, that's an indication that there's probably a better process out there, because the process you're using is not at the right scale. Scale back or re-scope the project. If you're trying to document an incredible amount of detail about 500 people, that might just be too big. You can probably document an incredible amount of detail about five or 10 people, but not 500. So re-scope the project, or downsize the amount that you're trying to detail about these people, something like that. If you're just up the wall, losing your mind because this is all very difficult and you don't know what you're doing and things keep going wrong and you keep losing your files: get some help. And in fact, the computational social science data drop-ins that we do are a good source for that. You can check in with us. You can say, I used to do it this way, but now it's not working anymore. Is there a better tool or process? We're there to answer those kinds of questions. We might be able to point out, for example, how you can use EndNote to manage your reference list and save yourself loads of time if you're working in Word. Or if you want to step it up and move into LaTeX, we can talk to you about the differences with BibTeX and how that works for reference management: pros and cons, why you might want to do it, how much time you might need to spend learning, and how much time you might save by using a different process. So these are useful things. As before, ask people for help. Everyone in theory should be doing this. And yet we're not asking other people: how do you do this? How do you think they do this? What's a better way to do this?
The fact that we don't know how to do these things well, and we're not asking other people how they do them, indicates that it's just not getting done, which is a real problem, especially in the context of a crisis of reproducibility. So of course, at this point, I'd like to take other suggestions from the audience. If you have a tool that tracks your web searches or something like that, and you want to share it with other people, please let us know. Just so you know, here are my contact details. You're welcome to send me an email or contact me on Twitter. Occasionally I do live streams on Twitch in which I show how to code something. It's not very consistent; I was trying to get back into it again this year. That's a useful one if you want to see that people who code as a big part of their job are actually still rubbish at coding, and it can be quite encouraging. Me, anyway: I'm rubbish at coding, despite it being part of my job.