Hi, thanks for joining me today. My name is Thomas Mock, and I'm going to present a little bit of the history of Tidy Tuesday and how it can be used as a scaffold for a community of practice. For those of you who haven't heard of it before, Tidy Tuesday is a weekly social data project in R. It was initially designed to help you practice data visualization and basic data wrangling. There are really four core steps. Step one: every week on Monday, a data set is posted with a data dictionary and a related article. Step two: participants pull the data into R, then clean it, analyze it, fit models, and plot the data. Step three: participants post their results and their code to Twitter with the hashtag #TidyTuesday. Step four: we all learn from each other within this community. We're able to share code, share tips and tricks, and learn from how other people approach the same problem. You can learn more about it after the fact at the readme for Tidy Tuesday, which is at bit.ly/tidyreadme. For some data on how Tidy Tuesday has grown, I have a graph here on the right showing cumulative tweets for Tidy Tuesday by year, so you can see 2018, 2019, and 2020. It only counts tweets that contain rstats, code, plot, graph, viz, data, or tidyverse, as well as the hashtag #TidyTuesday, in the text of the tweet. You can see that year over year the number of participants is growing and the total number of tweets is definitely growing. This adds up to about 6,400 tweets across the two and a half years the project has been going on, since April of 2018. Our GitHub repo has about 2,300 stars. There were about 2,000 unique visits last week, and there are 1,582 unique participants within the tweet data set.
And this represents 116 weeks of data, with 116 weeks of code, analysis, and plots that you can reference and look back on. I also have a link to the code that created this graph, and you can get access to all the tweets underlying this data set. So that's great. Tidy Tuesday is a public data project. It's run through Twitter. It seems to be fairly popular. There are a lot of tweets about it. But really, what is Tidy Tuesday? It's just data. It's just the sharing of data. But it's data that can be read into R in a few seconds, with an article that provides context for that data, and with a data dictionary that makes it easier to get started and understand what you're working with. It's data with dozens of example analyses with code every week, so you can compare your work against others and learn from both experts and people who are new to the data set. And lastly, it's data with a community of learners and mentors who are self-motivated to explore and share knowledge around it. But again, what is Tidy Tuesday? It's going to depend upon who you ask. Some participants copy and paste some code, change some of it, and are happy to keep it on their local machine as they're learning. Others read in a file from the internet, specifically from GitHub, visualize it, and then post the results and the code to Twitter. They get a lot of feedback on it, and they improve their data viz. Others do a live screencast. They run some statistical models and machine learning, push the output to GitHub, and post the analysis to Twitter, where other people can explore it and then see the code after the fact. And other people even write full-blown articles or blog posts where they write some code, do an analysis, and then post those results to Twitter, shared with the same hashtag. All of these people are taking on the same project, but they're all approaching it in different and unique ways.
So again, I'll say for the last time, what is Tidy Tuesday? Tidy Tuesday is a scaffold for a self-directed community of practice. That's kind of a clinical definition. We'll dive into the different parts, but scaffolding, self-directed, and community of practice are the three core parts. In education, scaffolding refers to a variety of instructional techniques used to move students progressively toward stronger understanding and ultimately greater independence in the learning process. So you take someone who's learning a new topic along this process, help them understand both the data and the process itself better, and help them become more independent as they learn. This is part of what Tidy Tuesday accomplishes. As far as self-directed, that really just means that each individual is self-motivated. There's no one there checking their work and forcing them to do it. There's no one there saying, oh, you have to do this within a time frame. You can approach it whenever it's convenient. Some people do this weeks or years after the fact, and some people do it the exact day that the data is released. But everyone who participates is self-motivated to do so. And then lastly, it's truly a community of practice, where you can define a community of practice as a group of people who share a passion for something they do and learn how to do it better as they interact regularly. This is voluntary and driven by the community. So again, it's just data, but there's a community that has built itself around it, and we all share a passion for learning about R, learning about new topics, and sharing that knowledge with our fellow community members. These types of communities are often successful through generating enough excitement, relevance, or value to attract, engage, and retain members.
So part of this is that the data itself has to be interesting, or the topic has to be interesting, or the act itself has to be interesting: learning more about R and how to be a better data scientist, statistician, or R programmer. This leads to success. And then lastly, you'll note that this presentation is not a story about me, but rather about the Tidy Tuesday project, and really about the community that has adopted it, extended it, and made it its own living environment. If we think of the building blocks of cultivating communities of practice, there's a nice book from Harvard Business School, and I've got the link here at the bottom, that has seven core topics for building successful communities of practice. Those are: creating a rhythm for the community; designing for evolution of the community; opening a dialogue between inside and outside perspectives; inviting different levels of participation, so both newcomers and experts; developing both public and private community spaces where people can share and interact in different ways; focusing on value, meaning what the participant gains from participating in this community; and lastly, combining familiarity and excitement. So something new, something old, something to learn. We'll break down each of these topics with some examples of what the community has built. Step one is creating a rhythm for the community. What this really means is that there's a consistent timeline. For Tidy Tuesday, you know that every single Monday morning there will be a new data set. It will be uploaded to GitHub, and there'll be a tweet announcing its release. So every week we say the R for Data Science online learning community welcomes you to week number N of Tidy Tuesday, and we're exploring something new. In this case, we're exploring caribou locations this week, and you can see the tweet in the top right.
And then on Tuesday morning, we officially announce that it's Tidy Tuesday, y'all. I'm from Texas, so I want to be as welcoming and as inclusive as possible, and really welcome everyone to the launch on Tuesday. Because the data is posted on Monday, you have access to it before people generally start to create analyses or plots on Tuesday itself. But again, this consistency means that you can expect it to be there. So maybe you don't have the time to do it this week, or maybe this week's data isn't as interesting to you; you know next week there'll be something new. There'll be older data sets that may be more interesting, or future data sets that will be more interesting, but it'll always be there. Step two is to design for evolution. What this really means is that the community is going to shift and change. Different people are going to come in and participate in different ways, and you have to be willing to let this happen naturally and help participants engage with the community. We can really think of the evolution of Tidy Tuesday as part of the evolution of the R for Data Science online learning community itself. So let's take a step back to 2017, when Jesse Mostipak created the R4DS online learning community. She created it initially as a place to read a book. It was a Slack group based around the idea of reading through the R for Data Science book as a book club, where people could participate and ask questions of each other as they moved through the book itself. She's got a lot of wonderful write-ups and presentations about what she's done, so I recommend seeing the links at the bottom of the slide for more information. But long story short, this community exploded into much more than just a book club, and it ended up being truly a place to learn.
So learners could come here, engage with the book, engage with the experts, and really engage with people who were willing to mentor. These mentors weren't necessarily one-to-one mentors, but one-to-a-few mentors, where each mentor could answer questions you may have or help you solve a coding problem. But importantly, people were fluid in terms of going from learner to mentor and from mentor back to learner as they moved through the book and either knew more or knew less. And Jesse did a fantastic job of curating this community, making it a safe and good place to ask questions. And lastly, it turned into a place to build. R4DS kept growing and growing beyond just this book club. And Jesse realized, at least this is how I'm approaching it, that she couldn't do everything. She was doing so, so much. So she gave people the freedom to build upon the community or expand on topics they were interested in. And this is where Tidy Tuesday came from: it was part of the mentor group putting things together as a better approach to engaging people on a consistent basis. And this ultimately led to a community where the R4DS online learning community is still around today. You can find them on Twitter, you can engage in the Slack group, or you can just talk to people in your own community who are also part of this sub-community. If you think about this retrospectively, and I'm pulling some quotes from Jesse's presentation at the 2019 RStudio conference, she realized that she couldn't do all of this by herself. So part of her goal was empowering people by asking, what do you want to see in this community? And if someone came to her with an idea, she would say, that's a fantastic idea, how can I help support you in making that a reality? Because again, it's very hard to scale an n of one.
But if you're willing to let the community evolve and bring in other people, then you can have these wonderful offshoots, or a building up of the community in a natural way, just like you do with traditional open source. So my summary here is that communities evolve, and it's important to let others fill roles, become leaders, and engage with each other. It's really just an evolution of how the community operates. The next step is opening a dialogue between inside and outside perspectives. In this case, you're going to have people who know everything about Tidy Tuesday. They've been doing it for years, and they just go in and try new techniques on new data. For other people, this might be the first time they've heard about this project or about this community of practice, and you have to be able to meet both groups where they are. So the first thing we needed to do to improve this process was make it easy. This is the very first data set from Tidy Tuesday in 2018. I uploaded this on April 1st, and you can see it's pretty minimal. It's just an Excel file, and that's it. If a newcomer came here, they'd ask, what am I supposed to be doing? Should I download this? How do I read it into R? How do I get started? So this was not as easy as it could be. Today, we've adjusted this to include raw code to read the data instantly into R. You don't have to download the file and then read it in. You can read it directly into memory with this code. So every data set comes with code to read it in. Additionally, we have a nice readme. Each data set comes with a readme that has a hero image about the topic we're researching, some contextual articles, maybe from Wikipedia or from official sources, and a note on where the original data set itself came from.
So if you wanted to recreate the analysis to get the clean data from the raw data, you could do that as well. There's also a data dictionary. Every data set comes with a description of all the variables, the class type, and the variable name, so that when you come to a fresh, brand new data set, you can get started with it and understand what's going on. And then the last part is the cleaning script. Every week, it takes a little bit of time to clean up the data and prepare it for analysis. The script shows all the steps we've done prior to uploading the data to the community site on GitHub. So you can figure out, hey, why is this off, or why is this happening? It's because it was changed at this step. Now, while that made things a lot easier, you can take this one step further. Ellis Hughes has done a lot of good work with the tidytuesdayR package to make this process even easier. Today, you can install it from GitHub at thebioengineer/tidytuesdayR, but I know that Ellis is looking to have it on CRAN shortly. The main goal of tidytuesdayR is to make it easier to participate in the weekly Tidy Tuesday project. Currently, it does this by assisting with the import of all the data posted on the R for Data Science Tidy Tuesday repository. tidytuesdayR provides functions that let you load the readme, the data, and basically all the necessary information into your R session or your RStudio session, just by passing a date or a week plus a year. So you could give it today's date, and that would find the data set for this Tuesday, or you could say week 12 of 2019, and it will pull in just that data set. You can also get a list of all the data sets that are available.
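Editor's sketch: the two import routes described above could look roughly like the following in R. This is a minimal sketch under stated assumptions, not the project's official snippet. The helper `tt_raw_url()` is hypothetical and written only for this illustration, and the repository path, the `master` branch name, and the `locations.csv` file name are assumptions that should be checked against that week's readme. The `tt_load()` and `tt_available()` calls come from the tidytuesdayR package and need it installed along with network access.

```r
# Hypothetical helper (not part of any package): build the raw GitHub URL
# for one file in a given week's folder. The repo layout and the branch
# name ("master") are assumptions; check the repo if a URL fails.
tt_raw_url <- function(date, file, branch = "master") {
  year <- substr(date, 1, 4)
  paste0(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/",
    branch, "/data/", year, "/", date, "/", file
  )
}

# Assumed example: the caribou locations week mentioned in the talk.
url <- tt_raw_url("2020-06-23", "locations.csv")

# The calls below need network access plus the readr and tidytuesdayR
# packages, so they only run in an interactive session.
if (interactive()) {
  # Route 1: read the CSV straight into memory, no manual download.
  locations <- readr::read_csv(url)

  # Route 2: let tidytuesdayR find the files for a given date ...
  tt <- tidytuesdayR::tt_load("2020-06-23")
  # ... or for a week number within a year.
  tt_wk12 <- tidytuesdayR::tt_load(2019, week = 12)

  # List the data sets that are available.
  tidytuesdayR::tt_available()
}
```

The object returned by `tt_load()` holds that week's data sets by name, so the first route is handy for a single known file, while the second avoids looking up file names on GitHub at all.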
So again, it makes things easier because you don't have to go to GitHub, look up the data sets, and figure out what's going on. You can operate entirely inside your R session, as you're used to. Additionally, some people may want to look at an older data set and find out how other people approached something. tidytuesday.rocks is a Shiny app written by Neal Grantham, and it's a collection of all the past Tidy Tuesday submissions: all the tweets with the graphic or plot that people made, as well as the code that they used. With this, you can filter by data set, choose different weeks of interest, or filter by user. Maybe you want to look at Dave Robinson's posts, or you want to look at your own posts. You can use this to go back in time, find these data sets, and compare other people's analyses against your own. So this is a great resource for people who may be new to the community and are outsiders looking in. The insiders who have already been there can typically use whatever method they like: tidytuesdayR, downloading the data set, or forking the GitHub repo. There are many different ways to accomplish it. But again, lowering the hurdle to get started is very helpful. And this leads us to the next part, which is that you need to invite different levels of participation. You want to have a mix of people who would be considered learners and people who would be considered mentors, experts and novices. This melding of expertise allows people to learn from each other and from different domains. I can be an expert in agricultural science and know nothing about chemistry, or vice versa, or know a lot about selling advertisements and nothing about mathematics, or the like.
So there are all sorts of ways of merging domain expertise with actual skills in the specific data science language. R4DS was focused on the idea of mentors and learners, and on the shifting back and forth between those roles, where people who had begun to learn helped others just behind them, and people who were more experienced helped each other out throughout the process. We adopted the same idea for Tidy Tuesday, which is that we welcome all newcomers, enthusiasts, and experts to participate. So regardless of your follower count or your expertise, please participate, learn from the community, and engage. And again, regardless of your follower count, the hashtag #TidyTuesday gives you an instant audience. Even if you're just getting started with Twitter, you can use the hashtag to find other people who want to engage with you in this community of practice. This means that maybe you're learning new skills or, if you're an expert, expanding your existing skill set, getting exposed to new packages or to a package used in a way that you didn't know about before. Additionally, you can share expertise, so you can help guide other people toward better choices with data, better data visualizations, or better use of modeling, and keep yourself fresh that way. And as Dave Robinson said, it's a way to solve a never-ending stream of novel problems. So there is some enjoyment here in always being challenged with something new each week. There are a bunch of other ways to participate or learn from each other. Jake Kaupp is a great participant and one of the mentors at R4DS. He went through and did all 52 submissions for 2019. So for every single week he has a data set cleaned up and visualized, with all the code necessary to do it. You can find his repo down here at github.com/jkaupp/tidytuesdays.
So that's a great way to look back at the work of someone who's done a great job and is really an expert in their field, and to see how they approach different things. You can also learn by watching an expert on, say, a weekly Tidy Tuesday screencast. Both Julia Silge and Dave Robinson do weekly screencasts with the Tidy Tuesday data sets. Julia usually focuses on the tidymodels ecosystem, which she works on here at RStudio, incorporating different techniques from tidymodels into analyzing new data from Tidy Tuesday. Dave does a similar thing: exploratory data analysis, maybe some modeling and data visualization, but really just diving into the data set with fresh eyes. He never looks at the data set ahead of time, and he does one uninterrupted hour of a fresh approach to a data set. This is really cool because you get to see an expert in motion, and how they approach new problems with their existing techniques. They both have YouTube channels, which you can find here on the slide, and they have a blog or a capture of the code that they've used as well. You can also learn by watching other experts walk through code and explain it. Patrick Ward and Ellis Hughes run something called TidyX. You can see their channel at the bottom of the screen. They find a particularly engaging visualization or analysis from that week's Tidy Tuesday, dive into the code, and help explain why these decisions were made. So rather than showing you how they would do it, they're explaining how someone else did it. This is really helpful for understanding code, because if you look at 200 lines of code, it's sometimes hard to really internalize it, and they do a great job of diving into the code, explaining what's going on, and showing alternative techniques as well.
Another really cool example, especially for people who get slightly overwhelmed by seeing large amounts of code all at once, is the flipbooks put together by Gina Reynolds. These use a presentation style where on the left you have the raw code and on the right you have the data visualization. But because it's done through xaringan, it lets you build up the plot piecemeal, chunk by chunk. It goes through and adds each line one at a time, and you can see how the data visualization changes as it goes. So you can see the different steps they took, why these things occurred, and how each one affected the data visualization. This is another great opportunity to understand why or how something occurred. So there are a lot of different ways to learn from this community, and you just saw five different ways for people to engage with it beyond posting their code and visualization to Twitter. But it's also really important to develop, or at least have, public and private community spaces. Not everyone is comfortable sharing a bunch of data visualizations or code they're still learning on their personal Twitter, so it's important to have alternative venues for people to learn from each other. In the past, this looked like people doing live, in-person user groups, hacky hours, or little fun social coding events. One of my favorites that I saw back in 2019 was a hacky hour based around Tidy Tuesday that Allison Horst put together. She got some of her data analysis and stats students together. They would usually go to a restaurant or a brewery, hang out together, and do some live coding on the Tidy Tuesday data set. So they were learning from each other and engaging in a physical community as well as with the larger virtual community.
More recently, because of the COVID pandemic, a lot of different things have moved online. The Boston R user group has done a few different Tidy Tuesdays on Zoom. And this has been exciting because it opened things up to a much larger population. You're able to get people on entirely different continents, in different time zones, or otherwise geographically separated, all participating in small groups around the same data set. Additionally, there's the R4DS Slack group itself. You can join the R4DS online learning community on Slack, where you can ask questions, answer questions, or just engage with the community, by going to r4ds.io/join. That will help you create an account and get started with the community. Beyond engaging there, where there is a Tidy Tuesday channel dedicated to talking about Tidy Tuesday, Cédric Scherer, Alex Cookson, and Jon Harmon have also done a great job of putting together virtual meetups. This has been a semi-consistent way of doing some of these YouTube groups, Zoom meetups, or other groups where you're joining together as a virtual community and engaging around these data sets. But again, it's maybe slightly more private than posting directly to your Twitter. Additional Tidy Tuesday venues, and I've got a bunch here on the right, come from the idea of extending or expanding your existing community. There have been lots of R-Ladies groups that have used Tidy Tuesday for workshops, for a user meetup, or for a working group to learn about text analysis or the like. And a really interesting one I saw was from Dr. Stephanie Spielman, who gave the students in her class extra credit for participating in Tidy Tuesday. So for each participation, they got some extra credit.
And this was a way to extend the lecture with free exploration. The students were able to engage with the larger rstats community and the larger statistical programming community, engage with new data sets, and learn new skills in a way that wasn't as structured or as tailored to a course or a classroom. What this really breaks down to is having an additional small group where you can work together on a shared problem. I've seen this play out in industry as well, in data science organizations where people are learning from each other and sharing skills as they grow together. And I've seen that some in industry are using this as a framework for part of the interview process, saying, hey, we'll give you a Tidy Tuesday data set, you do some analysis and return it to us, and that'll be part of your interview. And that leads me into the next part, because maybe Tidy Tuesday leads to some type of career opportunity or a job for you. It's really important for there to be a focus on value. You do have to get something from the community. Veerle van Son had a really good blog post about the 10 reasons why she loves Tidy Tuesday, and these are really 10 reasons why she feels like she's getting something out of it. A lot of this is getting inspired by the creativity of others; discovering new functions or packages and how they're used with data; learning to do better data visualization by making plots self-explanatory; practicing existing tidyverse skills or learning new ones; learning how to fine-tune graphs and get them publication ready; and discovering new data every week, which may be really interesting and something you've always wanted to analyze but didn't know how to get. Then there's being part of the larger rstats community.
So not only are you engaging with Tidy Tuesday, but this opens you up to all the wonderful people on Twitter who are part of the R community. Additionally, there's exploring those interesting data sets, really doing a deep dive along with a lot of other people, and trying out different or new plot types. And lastly, kind of a joking one: getting inspired to tidy up your own home. A lot of people use the Tidy Tuesday hashtag to post about Marie Kondo or other kinds of cleaning up of your own home. Additionally, in terms of value, there's the idea of building your career through projects or public examples of your expertise. Jacqueline Nolis and Emily Robinson recently wrote a book called Build a Career in Data Science, and there's a nice quote in there specifically about Tidy Tuesday. This book is available online. You can go to bestbook.cool for the whole book, and I've got the subchapter linked as well. Reading the portion: learning on your own means that you don't have a teacher or role model, and the best way to counteract this lack of a teacher is to find a community of people where you can ask questions. One great example is the Tidy Tuesday program. It goes on to talk about data scientists coming together and using R to tackle a data science problem. And as a larger part of this, the R4DS online learning community also solves this problem of finding a community of people where you can ask questions. So this was a great example of highlighting some of the value. Additionally, Dave Robinson talked a little bit about Tidy Tuesday in his presentation at the 2019 RStudio conference, titled The Unreasonable Effectiveness of Public Work. He used to think about his goals the way it's shown here in the top left, where things got more valuable as they got closer to a published paper.
In reality, anything that's still on your computer and that only you know about is less valuable than anything out in the world. That could include a published paper, but maybe it's a blog post, open source contributions, a tweet, or some type of data product that you're sharing with others. So this free sharing of knowledge, and proving your skill set, is very valuable for individuals who want to get noticed. And Dave said that one of the most remarkable projects that enables exploration of new data is Tidy Tuesday. Share what you learn with it. These are data products out in the world that highlight your expertise and your willingness to share and gain knowledge within a community. And then maybe you do just get hired out of Tidy Tuesday. Zhi Yang had a lovely post here where she walked through creating a D3 treemap using the d3treeR package. And as part of this, she found out that she got a job interview, and potentially an offer, because she made this treemap for Tidy Tuesday. As she put it, you never know who's watching, and you may get calls from those who would like to hire you. Keep up the good work and continue sharing with us. So again, these public displays of knowledge, and sharing knowledge within a community, can help lead you to jobs as well as to gaining knowledge for knowledge's sake. And then just one other data point. There's some ongoing research about the Tidy Tuesday project itself. So, a bit tongue-in-cheek here, it's learning by learning about learning. Some researchers at the University of Tennessee and Thermo Fisher Scientific analyzed the Tidy Tuesday project and some of the tweets there, and found that participation over time and longer code were typically related to being recognized by others.
So by participating in the community, people either expanded their community or at least got more likes and retweets on the work they were putting out on Twitter. The last part is the idea of combining familiarity with excitement. There are the things you know how to use: you're starting with R, and you know how to use different things in R or different packages. But then there's the excitement that comes with a new project, something short-term that you can accomplish in a day, an hour, or whatever. And specifically something novel, so you don't really know about it yet, or maybe it's something interesting that you haven't seen before. It's this excitement along with the familiarity of your tools. This really breaks down to using the same tools but applying them in novel ways. If we look at just a small handful of the 116 data sets we've covered, we did things like Star Wars characters, park safety, New York restaurants, online news headlines, horror movies, the number of birds that collided with airplanes, USA births by year, NFL attendance, dog names, anime characters, TV show ratings, cat and dog popularity within different states, prisoner populations, military brain injuries, federal research and development budgets, voter turnout, and Thanksgiving food popularity. So while not all of these data sets may be interesting to you, they're all probably new. You can probably find one that is interesting to you and apply your skill set to it. This brings us to the end of the presentation. And again, I've asked this multiple times, but what is Tidy Tuesday? Again, it's just data. But it's data that can be read into R in a few seconds, with an article for context, with a data dictionary, with dozens of example analyses with code every single week, and with a community of learners and mentors.
All of this comes together to provide you with a scaffold for a self-directed community of practice. So with that said, it's Tidy Tuesday, y'all. I hope you join us in contributing to this community of practice. Thanks for your time.