Thank you. Welcome everyone to the session on Teaching R and R in Teaching. We have a very interesting session lined up. I think it's a little different from the previous sessions we've been having in the conference because this one is all about teaching, and you will see that all the speakers in this session are united by one common factor, which is a passion for teaching R to students. I think they're also excellent software developers in their own right, as you will see from their pages, so please do check out the speakers. We're very thankful to Roche, who's the sponsor for today, and also to Appsilon, who's sponsoring this particular session. So with that, let me quickly switch over to introducing our first couple of speakers. It's going to be a video talk by Dr. Allison Horst, who's an assistant teaching professor at UCSB. She's also an artist who illustrates for data science, statistics, and education. I really invite you to check out her website, where you can find a lot of the artwork. And we also have Julien, who's going to be joining the talk with her. He's a senior data scientist at the National Center for Ecological Analysis and Synthesis at UCSB. So with that, I think we'll move on to the first talk. We've got this for exactly 15 minutes; it's a recorded video. I invite all the attendees to type their questions into the Q&A, and we can take the questions and answers at the end of the session.

Hi everyone. Thank you so much for joining us today. We're excited to tell you about our process of developing a datasets-based R package to teach environmental data science. I'm Allison Horst. I'm an assistant teaching professor at the Bren School of Environmental Science and Management at UC Santa Barbara.

Hi everyone. I'm Julien. I'm a senior data scientist at NCEAS, where I help scientists develop reproducible analytical workflows.

So Julien and I both teach environmental data science, which means that we help learners and researchers gain the skills they need to investigate environmental challenges and questions using environmental data. And we help them do that more reproducibly, efficiently, and collaboratively through our teaching and training. Most of our content focuses on the right side of this image, where we try to focus on how to work with and analyze existing environmental data. So you'd think that most of our prep time would be spent creating materials on things like data wrangling, statistical approaches, and reporting. And that's usually true, but it turns out there is a really time-consuming thing that often gets overlooked, and that is the process of finding modern, well-documented environmental data sets that are useful and interesting for our teaching. The issue here is not that there aren't openly available environmental data sets; there are a ton of repositories full of environmental data of all different flavors. The issue is that it can take a teacher a really long time to locate, download, explore, curate, test, and make a lesson from those data sets from scratch. So when you start considering the different checkboxes that would make an ideal data set (for us, that it's real-world data, environmental, open, accessible, has thorough metadata, and on and on), you might correctly imagine that this can take a lot of time and energy for a teacher.
So what we thought would be really useful for a lot of environmental data science teachers would be an openly available collection of curated data sets featuring learner-friendly, real-world environmental data that can help students consider environmental questions and learn data science or statistical skills. And we already have evidence that there would be demand from teachers for this kind of resource. Recently, with my co-authors Dr. Alison Hill at RStudio and Dr. Kristen Gorman at the University of Alaska, we developed and published the palmerpenguins R package, which contains size measurements like flipper length and body mass for 344 individual penguins observed by Kristen and her colleagues at the Palmer Archipelago in Antarctica. And since being published on CRAN, this package has been downloaded over 140,000 times, was almost immediately recreated as Python and Julia packages, and is already used all over the place as an example data set in teaching materials around the world. So the message to us was clear: teachers are craving awesome and readily available data sets for their courses. Which made us ask, if this one data set is so useful for teachers, can we expand on it to provide a larger, more varied, and more versatile collection of environmental data sets with educators in mind? And luckily, we weren't starting from scratch. We already had an idea of a really good place to start looking for more environmental data sets. That penguins data is from a single site, Palmer Station, Antarctica, which is part of a network of 28 sites that comprise the US Long Term Ecological Research Network, or LTER. The LTER was founded in 1980 and over the past 40 years has collected and shared over 7,000 unique data sets that cover vastly different ecosystems, spatial and temporal scales, and wide-ranging topics from marine biogeochemistry to urban heat islands to penguin sizes. And in addition to collecting, studying, and sharing both monitoring and experimental data to understand long-term ecological processes, the LTER network's mission also includes education, outreach, and a goal of, quote, creating well-designed and well-documented databases, which is music to our ears, because basically the LTER is this amazing treasure trove of nicely documented, openly available environmental data sets that we knew would be a great resource for this project. Knowing that we could find really useful environmental data from the LTER network, we set out to make an R package that featured one data sample from each LTER site. We wanted this to be a team effort, so we sought input from LTER information managers and the LTER education team. We wanted this package to focus on being a resource for teaching and learning data science skills, not necessarily for answering research questions. We wanted to ensure that everything was completely documented, to make it possible for users to access any raw data and metadata. And we also wanted to make sure that we were using best practices for data management, documentation, and sharing, which is such a core part of the LTER's mission. So now I'll hand it off to Julien to help tell you a bit about how we actually did this.

Thank you, Allison. So from the beginning, we wanted to establish a reproducible workflow to ingest content into the LTER data sampler package. We know that most of these data are long-term time series, and we wanted to be able to update this package a few years from now.
We also had the advantage that we could rely on the good practices of the LTER network in terms of data management and the existing infrastructure used to manage their data. So here we are trying to capture the life of an LTER data sample. It starts in the field, with scientists making measurements that are sometimes taken in tricky conditions, and also with an array of different automated sensors that collect data continuously. And all of these data end up at some point on the desk of the information manager, who takes the time to curate those data sets, document those data sets, and upload them into a data repository. One of the main LTER data repositories is EDI, the Environmental Data Initiative, which is part of a larger federation of more than 40 data repositories called DataONE. And DataONE aggregates the metadata from those various data repositories to help with data discovery, but also provides a set of APIs and R packages to help interact with its services. So once we had selected a data set that we thought was of interest for our teaching project, and as Allison was mentioning, there was a lot of trial and error there, we used the metajam R package to import both the data and metadata into our environment. And from this original data set, we created a smaller data sample that was easy to understand, to describe, and to manipulate for teaching purposes. And based on those samples, we also adapted the metadata of the original data set to this specific sample, to be able to describe it. And then we wanted to build examples of what you could teach with these different samples, and that is done through the vignettes of the package. And finally, we wanted to go beyond R users, so we wanted to have a website where any data science instructor could access and engage with the content. So to give you a little more in-depth view of our R package: on the left you can see the maybe usual structure of an R package, with its different folders and files, but we like to think about it in four main categories. One is about creating the data samples: we have a script that downloads the data directly from the data repository and saves the data sample. Then there is all the documentation phase, where we adapt the original metadata into metadata specific to the R package. Then there is the demonstration of what you could do, what you could teach, with this data set to help people think about it, and that is done through the vignettes and R Markdown. And finally, the package itself is a really great way to share the data sets, or the data samples more precisely, while also having this website where we can expose them beyond the R user community. So once we had our structure there and our workflow established, we wanted to reach out to create the samples and the examples we wanted to have for the teaching part of the package. And since it was a teaching project, we thought, why not try to turn developing this package into a teaching experience? So we reached out to the UCSB data science program, which was looking for capstone mentors, and we set as a goal for the students to expose them to real data manipulation and R package development. We were able to onboard five talented students on our project to work with us from January to June to develop those samples and examples. And during this process, they really experienced and developed their skills in programming.
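For readers who want a feel for that ingestion step, here is a minimal sketch, assuming the metajam package is the one being described; the data URL is a placeholder, and the names of the returned list elements may differ slightly from what is shown.

```r
# Hedged sketch of pulling an EDI data entity and its metadata with metajam.
# The URL below is a placeholder, not an actual data set used in the package.
library(metajam)

data_url  <- "https://portal.edirepository.org/path/to/a/data/entity"  # placeholder
local_dir <- download_d1_data(data_url, path = tempdir())

# Read the data table together with the metadata that was downloaded alongside it
pkg <- read_d1_files(local_dir)
head(pkg$data)           # the data table itself
pkg$summary_metadata     # high-level metadata (element name may vary by version)
```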
To unpack that a little more: on the data science side, they worked with the tidyverse and R Markdown, but also on the collaborative side with GitHub and the code review process, as well as on more of the software development side through the use of devtools, usethis, and GitHub Actions. There was, of course, also a big chunk of data management skills developed: how to search for data on a data repository, how to leverage the metadata that exists, but also how to produce metadata of your own. And finally, there was all the web aspect of the project, about exposing the content and developing the content further on the website, as well as customizing it using CSS. So currently we have 19 data samples that have been created. We are curating the vignettes to make sure that the package itself is coherent across the different themes of data science one might want to teach with these various samples. Here you can see a matrix we are currently using to try to do so, highlighting where we have gaps, and we try to find new data sets that will help us fill those gaps. And maybe more concretely, here is an example of one of the vignettes, for the North Temperate Lakes ice cover. This sample looks at ice cover duration, a potential indicator of climate change, and at air temperature evolution over roughly the last 150 years. Developing this data sample and vignette required merging two different data sets with different time steps, and developing various visualizations and analyses to investigate trends in, and the relationship between, these two variables. Now I'm going to hand it back to Allison for some final thoughts.

Thanks, Julien. So while this is still a package in progress, we have a few major takeaways that we wanted to share. The first is that modern, curated, versatile, and well-documented data sets are in demand and very much appreciated by data science teachers. The second is that R packages are a really user-friendly way to share data sets with other people who are teaching or learning in R. So for example, in environmental science and ecology, a lot of people use R as their primary coding language, and having the data in a package makes it really straightforward to quickly access it and have the documentation right there with it. And the third thing is that creating a data sets package is a great project for students to learn and combine a lot of different skills into one meaningful project. One student on our team, Aditya, recently shared an email saying that at the start of winter quarter they had no idea how to even begin developing an R package or how to collaborate with others on GitHub, and that they learned so much during their time on this project. So we really think it's a valuable and useful learning experience, and it creates a valuable product for data science education. With that, we would like to thank all of the people and groups who have been supporting this project. In particular, we would like to thank the folks from the LTER network, including Marty Downs, the site information managers, and the education team, for sharing their ideas and feedback with us. A huge thank you goes out to the five UCSB undergrad students, Aditya, Karen, Leah, Sam, and Sophia, for their hard work on this project, and to their capstone advisor, Dr. Sang-Yun Oh. We'd also like to thank all the developers, maintainers, and contributors of the R packages that made this project possible.
We have shared a link here to a sampler version of our package at github.com/lter/lterdatasampler, which we would love for you to try, and we would really welcome your feedback. So thank you all so much for joining us, and we hope you enjoy the rest of the conference.

Thanks, Allison and Julien. That was a very nice talk, very interesting, and a nice introduction to both the data sets and how one can work with these kinds of data sets. There are some interesting questions, so let me read them out. The first one is from Adamus Trisha, who teaches in a medical school and has had the problem of finding a suitable patient data set. So have you written your process up, perhaps in an article, so that they might follow in your footsteps with data sets that relate to their students?

Yeah, thanks for the question, Adamus. So we have not written up an article on this, but we are really grateful, and I won't speak for Julien, but I'm really grateful for so many open resources that others have shared about creating an R package, including creating your first R package, so I'm happy to share some of those links in the chat. But, and maybe I'm wrangling Julien into this on the fly, it would be really great to write up a blog post about our entire process from start to finish, to help other people but also so that when we look back on this later on, it's a resource for us as well. So yeah, we would really like to document that, and when we do, we'll make sure to share it. Julien, do you have something to add to that?

Yeah, I was just going to add that the LTER network has had, from the beginning, a dedicated data repository where sites have to upload their data as a mandate of the agency that funds them. So the network has really pioneered a lot of the metadata and data archiving processes, and that was also our advantage: we could leverage this infrastructure that already existed, so we knew at least where to start searching. We didn't have to read the papers and hunt for the data sets; we could go to this data repository, get the data sets, and even link to the potential articles that use these data sets. So we had that advantage.

There is one more question, from Joel. Do you have any vignettes on cleaning the raw data, so that one doesn't just start with the curated data set? Because tidying and cleaning the data is really always a challenging step in any ecological analysis.

Yeah, thanks, Joel. One of the things that we are hoping to feature in some of the vignettes is data wrangling and tidying. So for some of the data sets, the goal is not to have them be perfectly tidy. We're trying to build that in, to have some opportunities to teach the basics of data wrangling within the package, so that will be included in the vignettes. The other thing that's really cool is that if a teacher is looking through the package documentation and thinks, this data set seems so cool, but I really wish I could have my students start with the raw data, then what we also include is all of the code that we used to get from the raw data that's on EDI to the curated data set that exists within the package. So any instructor could, if they wanted to add some complexity, always go back to the original raw data. They have all of our wrangling steps included right in the package, so they can see how we got to the curated data set. So that's an option too.
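For anyone who wants to give the package linked above a spin, a minimal sketch might look like the following; the ntl_icecover sample name and its column names are assumptions based on the North Temperate Lakes example mentioned in the talk, so check the package documentation for the actual names.

```r
# Hedged sketch: install the in-progress package from GitHub and plot one sample.
# install.packages(c("remotes", "ggplot2"))
remotes::install_github("lter/lterdatasampler")

library(lterdatasampler)
library(ggplot2)

# Ice cover duration over time at the North Temperate Lakes LTER site
# (data set and column names assumed from the talk; see the package help pages)
ggplot(ntl_icecover, aes(x = year, y = ice_duration)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Year", y = "Ice cover duration (days)")
```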
So there's one more question, but I think I'm going to hold on to it until the end of the session, because Zhian is going to talk a bit about the Carpentries, and this could be a point for discussion at the end of the session. So thanks, Allison and Julien, for a very nice, and perfectly recorded, talk, and I think we will now move over to the next talk. Let me introduce Zhian Kamvar, who is the lesson infrastructure technology developer at The Carpentries. He's a trained population geneticist who has worked on the population genetics of clonal plant pathogens, and like others in this session he has a strong passion for creating useful, user-friendly computational tools in R.

Hello, my name is Zhian Kamvar and I'm the lesson infrastructure technology developer at The Carpentries. The Carpentries is a global organization of volunteers who teach foundational data and coding skills to researchers worldwide. What makes us stand out is not the fact that we use evidence-based practices in teaching; it's that we align all of our decisions with our core values: we put people first and we value all contributions. Our lessons are distributed as openly licensed websites built on a consistent style that adheres to active learning principles. We use these lessons as the source material for the hundreds of workshops we run each year, and these sites have three distinct audiences. First, the certified Carpentries instructors, who refer to these materials as they teach our workshops. Next, there are the learners in a Carpentries workshop, who rely on these lessons after a workshop as they review and practice their newly acquired skills. And last but not least, there are the educators like yourselves who adapt these openly licensed materials for their own lessons. The source for these lessons is hosted on GitHub, where volunteer maintainers ensure that the lessons are accurate and up to date, and we encourage a culture of open contribution where members of our community can suggest improvements, like a simple typo fix or a better explanation of an important lesson concept. We want anyone to be able to go to the repository and make a suggestion to improve our lessons. At least, that was our intention, and I want to pause for a second here and highlight this tweet that came across my feed as I was preparing for this talk. It shows a fork between a paved path and an unpaved footpath across a patch of grass which leads to a crosswalk. There's a sign in front of the unpaved path that says "please use the purpose made path provided". The tweet author points out that the sign knows it has lost. This unofficial footpath is called a desire path, and it is an important concept in design because it shows the difference between what the designers intended and how people actually use the space. And as our community has grown, new desire paths were being created across our lesson infrastructure landscape. No signs, or in our case documentation, would stop contributors to our lessons from stepping outside the complex purpose-made paths in our lesson infrastructure, and we needed to rethink our infrastructure altogether in a way that is more inclusive and welcoming for everybody. And we found that the R publishing ecosystem is flexible enough to give us the tools we need to reduce barriers for publishing lessons and further our mission.
I will introduce you to our current infrastructure, its unique challenges, our solutions, and how we use past and present feedback from our community to iteratively refine our design. But before continuing, I want to remind everyone of two things. First, there is no right or wrong, only better or worse. Greg Wilson, the founder of Software Carpentry, wrote this after several iterations of the lesson infrastructure, and I'm putting it here as a reminder that the infrastructure we have now was working for us at the time, and what we come up with will have its own difficulties down the road, but what is important is that we build something that better addresses the needs of our community. Which leads me to my second reminder: you belong in the Carpentries. As the community has grown, our infrastructure has been put to the test, and we have continuously updated our workflows to make it easier for people to contribute. The reason we do this is because we are driven by our values, and in the Carpentries you belong, no matter if you've been working as a systems administrator for a university HPC cluster or if you have just learned to write your first R script. And to better understand the decisions we are making, let's start by reviewing our current infrastructure. Our lessons are written in Markdown and transformed into a website via GitHub Pages and the Jekyll static site generator. The idea behind this choice was that it was the most straightforward way to create a static website without needing a server, like WordPress or Drupal sites do. Ideally, it also provided a way for people to use this as an example of how they can build their own website. And the paradigm of being able to write Markdown lessons and get a functional website out of it is not a new concept. In fact, there are some 460 iterations of this concept. Jekyll happens to be the one that GitHub implemented early on to provide documentation for its open source projects, and thus it stuck. We created an all-in-one bundle for lessons that provided styling templates in HTML, CSS, and JavaScript, along with validation scripts in Python and R scripts to build our Markdown-based lessons. All of this was orchestrated by a Makefile, and the purpose of this approach was twofold: first, to maintain a consistent style that emphasized our principles of evidence-based teaching, such as learning objectives and formative assessments, and second, to demonstrate how the skills we teach in our workshops could be applied to real-life situations. And while this was conceptually good in theory, this infrastructure design has three significant drawbacks for contributors. The first is installation pains. Having Jekyll, Python, and Make as dependencies means that the people who want to build these lessons on their machines need all three of these successfully installed and up to date. This is especially frustrating for Windows users, who have none of these by default. Secondly, with this design all of the scripts live inside of the repository, and this leads to the unfortunate pattern where lessons quickly become out of date as they diverge from improvements that are made upstream. Another drawback is that we have a lesson website wrapped around a static site generator, which in and of itself is a kind of desire path. And this meant that it was only easy to contribute if you were familiar with how Jekyll operated. There's often a moment of panic in a new contributor's eyes when you show them what the lesson repository looks like.
If you were not familiar with Jekyll, then it was unclear where to even start when you were looking at the Git repository, because there was no clear sign that marked the trailhead. This led several lessons to find their own paths and end up being built in different ways. Over the last few years we have begun finding these desire paths across our lesson infrastructure. They manifested through our communication channels and through frustration from contributors and maintainers alike. But the thing about desire paths is that they are not only challenges but also opportunities, and at the Carpentries we like to call these chopportunities. The growth that we have experienced in the past few years is a chopportunity, and we have seen not only a growth in the number of community members but also in the number of community-contributed lessons since we started the Carpentries Incubator. Our challenge is clear: the all-in-one lesson infrastructure does not scale well to the growing number of lessons. Our opportunity here is to reimagine the infrastructure in a way that truly values all contributions. This includes the spectrum from everyday educators who want to share their knowledge to the tinkerers who want access panels to understand what is going on behind the scenes. And the solution we came up with uses R, and while I simply do not have enough time to discuss all of the features, you can find out for yourself if you visit our blog posts and documentation on how to get started. So how and why did we choose R to create the next iteration of our lesson infrastructure? Well, we wanted a general solution where you could take Markdown or R Markdown files, place them in a folder, and generate a Carpentries-style lesson without having complicated paths or generated files lying around. The first thing we did was to investigate the existing landscape, and it was important that we choose something that is user-friendly, easy to install, easily customizable, and has a welcoming community behind it. We tried out several static site generators, but the largest barrier for many of these was that they were not easy to install and maintain. We settled on the fact that R is the best place to go for our needs, because first and foremost R is full of friendly communities such as R-Ladies, rOpenSci, and R Forwards. R is also easy to install on all major operating systems and is available online via RStudio Cloud. And last, R has a robust ecosystem for publishing thanks to knitr and R Markdown. So once we identified R as our solution, the natural place to go was one of the R Markdown variants like blogdown with Hugo. But we found that while the tools were indeed separate from the content and the documentation was rich and accessible, there were many aspects that would not fit our needs, in particular the presence of styling inside the repository, meaning that the user was ultimately responsible for maintaining the visual presentation. We realized that an unlikely contender, pkgdown, a documentation site generator, used the exact same model that we were looking for. It meets people where they are: content is pure Markdown with no extra templating required, the tools to build everything live in a separate package, and you can customize its appearance by making your own package to supply your CSS, JavaScript, and HTML templates, which has been done many times for individual and organization-wide documentation. So from this model, we were able to pave our own desire paths in our infrastructure by creating three R packages.
sandpaper is the engine that orchestrates building and maintaining lessons. varnish is the styling, hosting the CSS, JavaScript, and HTML templates. And pegboard serves as the validator and converter behind the scenes for lesson content. Of course, the first thing any contributor will see is the folder structure, and it helps if this resembles the structure of the lesson website. So our lessons consist of folders that correspond to the website's drop-down menus: one folder for lesson chapters, one for extra information for learners, one for extra information for instructors, and one to contain learner profiles. The final folder is one of the access panels; it contains the rendered Markdown files, so you can use them in another context, and a static version of the website, so you can put it on a USB stick and share it without additional software like Hugo or Jekyll. And our solution ticks all the boxes that satisfy our design choices, but if we end up designing something that R enthusiasts love but that is unusable by newcomers or by Python or MATLAB folks, then we are not in line with our value of being inclusive of all. Our goal is for authors to focus on the content over the tooling, and we want lesson authors to create lessons directly from the source they already work with, be it Markdown, R Markdown, or notebooks. And to do this, we needed to test the minimum viable product on actual maintainers, and we needed to make sure that they were spread across the spectrum of familiarity with R, with the current infrastructure, and even with the way the Carpentries operates. So we recruited a total of 19 volunteers to run through alpha testing, which tested the participants' ability to install the required software and packages, and to create, modify, and contribute to lessons. After the tests, I asked for 20-minute open-ended interviews with volunteers about their experience, to identify common stumbling blocks, challenges, and bright points. And I want to take a moment to thank everyone who participated, some of whom are part of the Carpentries core team. I don't have the time to go into detail about the results, but a big takeaway was that everyone was able to install the infrastructure, and any problems that occurred were largely from Git and GitHub, which is a big improvement over our current system. So we have just finished the testing phase, the first testing phase, and the next step for us is to address the questions and concerns that were brought up, for example: how do I use this without clobbering the current R installation on my system? We need to improve documentation and get ready for the beta release, in which we will try it on a few live lessons to identify pain points for the community. It is a slow process, but this way we can avoid major unforeseen issues; minor issues are a given. We can bring users in on the ground floor, get valuable feedback, and strengthen trust within our community. And I want to conclude by saying that we ended up choosing a solution that we believe aligns with our values and will work for our community. We do not have all the answers right now, but we go through this process because we want to make sure we put people first and are always learning. And none of this would have been possible without generous grants from the Alfred P. Sloan Foundation, the Moore Foundation, the Chan Zuckerberg Initiative, and the R Consortium Infrastructure Steering Committee.
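As a rough illustration of the author-facing workflow described above, a minimal sketch might look like the following; the function names reflect my reading of the sandpaper package's documentation at the time of writing and may well change, so treat this as a hedged example rather than the definitive API.

```r
# Hedged sketch of building a Carpentries-style lesson with sandpaper.
# Function names are assumptions based on the package's documentation and may change.
# install.packages("remotes")
remotes::install_github("carpentries/sandpaper")

library(sandpaper)

create_lesson("~/lessons/intro-to-r")   # scaffold the folder structure described above
create_episode("getting-started")       # add a new Markdown/R Markdown episode
build_lesson()                          # render and preview the lesson site locally
```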
And finally, thank you to all of our alpha testers, the Carpentries community, and all the folks who have taken time to sit down and talk with me about early drafts of the new infrastructure. And thank you for attending this presentation. I'd be happy to take any questions. Thank you.

Thanks a lot, Zhian, for quite a nice talk and a very interesting tool. So are there questions for Zhian? I request the participants to put these into the Q&A. Before that, what do you think are the main challenges in getting users to adopt this kind of thing, preparing their own lessons in R Markdown and so on? So during your testing phase, what were the major learnings?

I guess, for the challenges, are you referring to preparing lessons in general with the tool sets that are out there, or are you specifically referring to the tool that I was working on?

The tool that you were talking about.

Oh, yeah. Most of the challenges came largely from connecting to GitHub. That's one of the biggest challenges people have on their computers, because they are unfamiliar with the SSH or HTTPS protocols. But there were other challenges that were largely based on my initial instructions on how to get things installed. Sometimes I did not phrase things correctly, and so people were unsure: I need to download this binary, but what do I do once I download it? So it was very common installation problems. We largely did not find very many problems with the actual operation of the template itself; most of the problems were just installation pains.

So do you think dockerizing it or something would make life easier for installation? Have you thought about that?

Yes, we have thought about Docker, but the problem with Docker is that it only works on modern systems. I have a Windows machine in my office that's from 2015 and will no longer update past, I guess, 2018, and Docker cannot be installed on that machine. So it's important that we do support legacy systems, because we want to make this as accessible as possible. So while Docker is a good idea, it leaves people out.

So there's one more question: when can the wider community give it a spin?

I am working on that right now, and I think we should have something working, a beta, by September, when we will be beta testing it on a few Carpentries lessons. But if you are interested in testing it out, you can contact me. I'm on Slack and I'm on Twitter at zkamvar; I also have my email at that same address. So you can contact me, or you can open an issue in the sandpaper repository, and I will gladly help you with anything that you need to get set up.

Zhian, there are a couple of questions that maybe you could type the answers to, but then the YouTube viewers would miss out on that, so maybe we can quickly take these questions. So, is there a rough timeline for when the switch to the new lesson template will be implemented? And is the plan to apply this across all Carpentries lessons?

Yes, the timeline is, hopefully by the end of the year we plan to start the rollout. But the old template will be supported for a year after we do the initial release of this template. And we will support all the translation from the old template to the new template; that will be on me to do, and ideally the maintainers will have to do nothing for that.

And then can I answer the last question?

Yeah.

Okay, the last question: have you considered renv for local library installation per project?
Yes. This is one of the things that I will be working on in the next phase. It's tricky, because it's not necessarily intuitive for everybody, but renv is flexible enough that I can build something that should be relatively intuitive and easy to use. So thank you.

So thanks a lot, Zhian, and I request you to stay back; we can have some more discussion at the end of the session. Next, I'm very happy to introduce Dr. Mine Dogucu, who's an assistant professor of teaching in the Department of Statistics at the University of California, Irvine. She's a unique educator with an interest in statistics and data science education, and she's also an applied statistician with experience in educational research. She has a very interesting book called Bayes Rules, and you can check the book out as you listen to her talk. So this is again a video talk. Jyoti, I request you to share the video.

Hello, everyone. Greetings from California. I will be talking about teaching and learning Bayesian statistics with Bayes Rules. My name is Mine Dogucu, and I'm an assistant professor of teaching in the Department of Statistics at the University of California, Irvine. Even though you will only be seeing me today, this work is a collaboration between Alicia Johnson, Miles Ott, and me. I will be talking about two parts of this project: one part is the book, Bayes Rules! An Introduction to Bayesian Modeling with R, and the second part is the R package that supplements the book. In a regular conference, it is easy to see who is in the audience and also get some idea about how they're feeling about the content, whether it's too hard or too easy. Right now, however, it's really hard for me to read the room, so I will make some assumptions. Some of you might be here because you're an educator who is teaching Bayesian statistics or who would like to teach Bayesian statistics. Some of you are here because you want to learn Bayesian statistics. And it's very likely that there is at least one person in the audience who has no idea what Bayesian statistics is about. So what I'm going to do is run through a very quick example, just to give an idea of what Bayesian statistics is to somebody who has never seen it. And while I'm doing that, I will be using some functions from the bayesrules package, so that we get some exposure to what is in that package. So let's assume we're estimating pi, the proportion of spam emails. Nobody knows what pi is, but we know that it's some number between 0 and 1. It could be 0.3 or it could be 0.6; we have no idea about this. But all of us here have some idea about what pi could be. For instance, I might believe that it's 0.25, based on the emails I get. And each of us has some level of certainty about the pi that we believe in. You might already be familiar with the binomial likelihood, so you might go ahead and observe data on 10 emails, and you might see that three of these are spam. In such a case, maximum likelihood estimation would say that pi is highly likely to be 0.3: 3 out of 10, 0.3. So let's go ahead and actually introduce some of the prior information that we talked about, that each of us has. Some of you might have had a prior idea that pi is about 0.5, but it could be 0.25 or 0.75, somewhere around there, and you're not very certain about it; so this prior lacks some certainty. Or you could possibly be quite certain that pi is a small number and it's definitely not one of the high values.
So you could have a prior idea like this. It turns out that the beta distribution is a good prior for the binomial; one of the reasons is that the support of the beta distribution is between zero and one. In the bayesrules package, we have binomial likelihood and beta plotting functions to help us plot our prior idea as well as the likelihood. What a Bayesian would then do is bring together the prior and the likelihood and look at the posterior, and a set of functions in this package helps us plot the prior information, the likelihood, and the posterior together. This was a quick introduction for those who may not be familiar with Bayesian statistics, so I'll now go on to talking about the book. This book is targeted at advanced undergraduate students in statistics or data science. There aren't that many undergraduate programs that teach Bayesian statistics, but if somebody has training equivalent to an advanced undergraduate student, then no matter where they are in their career, they should be able to benefit from this book. We assume, at least for our students who have taken our courses, that they've had prior training in statistics; this is not an introductory textbook at all, and a reader would need some prior understanding of statistical topics. As for probability, calculus, and the tidyverse, we use these in the book; however, they are only recommended. If somebody was not strongly trained in these topics, they would still be able to get the major ideas of the book and still learn from it. For instance, if any of you don't know the binomial likelihood, you will actually read about it in the book. So why did we write this book in the first place? We know that Bayesian methods are becoming very popular, and part of the reason is that we now have the computing power, more than we did in the past, to be able to fit Bayesian models. Bayesian models were in the past criticized as being subjective, but the scientific world, at least in comparison to the past, has re-evaluated its understanding of subjectivity. The popularity of Bayesian statistics is not necessarily reflected in the curriculum, or in the resources supporting the curriculum. So when we were teaching our Bayesian courses, we actually patchworked from multiple Bayesian resources to be able to tailor the course content for our undergraduate learners. Since we could not find a single resource appropriate for our students, we decided to write our own. Let me dig a little bit deeper into what is actually in the book, starting with unit one. In unit one, we start with Bayes' rule and then move on to the beta-binomial model that I briefly talked about today. Once students understand the beta-binomial model and how the posterior is formed, we then have students tinker with different priors and different likelihood scenarios, so that they gain an understanding of balance and sequentiality in Bayesian analysis. Once they have all that understanding for the beta-binomial case, we then extend these ideas to other conjugate families like Gamma-Poisson and Normal-Normal. In unit one, students mainly work with conjugate families, so the posterior is actually quite easy to calculate; but in reality, that's not the case. Bayesian models are quite complex, and it's not necessarily easy to calculate the posterior. So we then move on to the cases where the posterior cannot necessarily be calculated, and we introduce posterior simulation and analysis.
We do this with grid approximation and the Metropolis-Hastings algorithm. We then move on to using rstan, and then students use the simulated posteriors to analyze what these posteriors mean. They calculate credible intervals, they do hypothesis testing, and they also use the posterior distribution for posterior prediction. Once they understand how to use the posterior distribution, they move on to unit three, where they start regression and classification models. This unit has plenty of regression models. I teach in a quarter system, in case any of you are teaching in a shorter term; I don't necessarily hit all these regression models by the end of the quarter. So this unit is a pick-and-choose unit: those of you who are planning to teach with this book over a longer term can cover more, and others can pick and choose certain chapters. And lastly, we talk about hierarchical models; we present hierarchical models from both normal and non-normal perspectives. In addition to the statistical content, I would like to note the pedagogical approach that we have taken in writing this book. I will start with an example of how we check intuition and why that is important. Here we are asking whether an article is fake news or not. We have the prior information that 40 percent of articles are fake. Then we observe data: we know from the data that exclamation marks are more common among fake news, and we have seen an article with an exclamation mark. So we have to decide, is the article fake or not? Based on this information, students actually already have some idea about the posterior; whether we have taught them what a posterior model is or not, they start with this intuition. So checking intuition at the beginning is very important, because it gives the instructor an idea of what the students already hold, and we try to put these intuition checks in as we go along in the book. Another approach we took in the book is active learning principles. This is a little bit easier in the classroom than in a book, because in the classroom we can see whether the students are actively engaged or not; when readers are reading the book, we don't have a way of checking that. However, if the readers choose to be actively engaged, they have the resources to do so. One way is that we have put in plenty of quizzes, and in fact I put one example quiz at the end for you all to try if you want to, a quiz from the beginning of the book testing whether you're a Bayesian or a frequentist. Another way we try to support active learning is by providing hands-on programming code directly in the book. Most statistics books don't necessarily have the code in them; it would come as a supplement or a lab, but we thought this was very crucial and had to be in the text as the readers were reading the book. I also have the Metropolis-Hastings algorithm as an example there. Throughout the book, we try to ensure that computing and math go together and are not separated, and often the computing comes first, so that the mathematical derivations that follow make much more sense and are easier to adapt to. In addition, we use computing in detailed form, from scratch, for a single case first and try to explain that in depth. Once students understand how things work the way they do, then we move on to using built-in functions.
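To make the "computing first, from scratch" idea concrete, here is a minimal sketch of a grid approximation for the spam-email example from earlier, written in plain R; the Beta(2, 6) prior is just an assumed illustration of a "pi is probably small" belief, not a choice made in the book.

```r
# From-scratch grid approximation for the beta-binomial spam example:
# prior belief that pi is smallish, then 3 spam emails observed out of 10.
pi_grid    <- seq(0, 1, length.out = 1000)
prior      <- dbeta(pi_grid, shape1 = 2, shape2 = 6)   # assumed Beta(2, 6) prior
likelihood <- dbinom(3, size = 10, prob = pi_grid)     # binomial likelihood of the data

posterior <- prior * likelihood
posterior <- posterior / sum(posterior)                # normalize over the grid

# Sample from the discretized posterior and summarize it
post_sample <- sample(pi_grid, size = 10000, replace = TRUE, prob = posterior)
quantile(post_sample, c(0.025, 0.5, 0.975))            # rough 95% credible interval
```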
When we started writing this book, not only did we want everyone who has the prerequisites to be able to learn Bayesian statistics by reading it, we also wanted to make sure that this book is as accessible and inclusive as possible. Unfortunately, we are not trained in this topic at all, so we had to do some work ourselves and study it. Our first decision was to make this book open access, and it is and always will be. Our second learning has been about visual impairments: we've learned about colorblind-friendly palettes, and we've also learned about how to write alternative text and so on. However, we had one issue: even though we wanted to write alternative text so that the book would be accessible to blind readers, we could not, because we wrote the book using bookdown, and at the time the knitr package did not support alternative text for images. But I'm happy to tell you that because of this book, and because we realized this at the time and made that request, knitr now supports alternative text. I'm so thankful to RStudio for taking this request so seriously. Another piece of homework we had to do on this end was to make sure we don't only cite the scholars we were already aware of; we wanted to go above and beyond in reading other scholars that we had not necessarily covered in our coursework in the past, and to make sure we have a diverse body of scholars cited in this book. Last but not least, we also wanted to have a diverse set of data sets, and we do provide these data sets as part of the bayesrules package. So since this is an R conference, it makes much more sense for me to talk about the R package a little bit more and how we use it in the classroom. The bayesrules package is currently not on CRAN; you can currently install it from GitHub. We've seen the functions that help with the beta-binomial model, that help plot the prior, and we've seen the prior, likelihood, and posterior plots. The way these plots support learning is that they give the learners an opportunity to compare what happens when there are different priors and what happens when there is different data, and through this the students build an understanding of prior, likelihood, and posterior quite well. The beta-binomial model is not the only one that is supported, and the plotting functions are not the only ones either. For the beta-binomial model, for instance, we have the plotting functions as well as summary functions that give the mean, median, and variance of these models, and we have the same set of functions for Gamma-Poisson as well as Normal-Normal. We also have model evaluation functions that support the regression models that I mentioned previously. For each kind of regression model, we have a set of two functions: a summary function, whether it's for prediction, classification, or naive classification, and then a cross-validation version of it. So let's look at an example: for a given model and data, this prediction summary would give us some model evaluation criteria, such as the median absolute error, its scaled version, and how many of the observed values fall within the posterior predictive credible intervals. And we can do the same with cross-validation by providing the number of folds. I just give a two-fold example here, but you can see that for each fold it returns the model evaluation criteria, as well as the overall cross-validated results.
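For a flavor of the package functions described above, here is a minimal sketch using the spam-email numbers from the earlier example; the function and argument names are as I recall them from the bayesrules documentation, so double-check against the package itself.

```r
# Hedged sketch of the beta-binomial helpers in bayesrules (names from memory).
# install.packages("bayesrules")   # on CRAN as of the Q&A; earlier, install from GitHub
library(bayesrules)

# Prior: pi (the proportion of spam) is probably smallish
plot_beta(alpha = 2, beta = 6)

# Prior, scaled likelihood, and posterior together, after seeing 3 spam in 10 emails
plot_beta_binomial(alpha = 2, beta = 6, y = 3, n = 10)

# Numerical summaries (mean, mode, variance, ...) of prior and posterior
summarize_beta_binomial(alpha = 2, beta = 6, y = 3, n = 10)
```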
Last but not least, I'll share a few resources for those who are teaching Bayesian statistics or who might be teaching Bayesian statistics. We have a website where we list the resources we know about at the undergraduate level: textbooks, papers on the topic, and so on. We also have a Bayesian education network; if you would like to join, please read the instructions there, and you're welcome to join. And I also have my course website for Stats 115, which I teach quite often and where I do use the Bayes Rules book, so if you want to take a look at the slides, you're welcome to do so. This is all from me today. If you have questions, feel free to ask, please.

So that was a wonderful talk, Mine; one can really see your passion for teaching. I also invite everyone to check out her blog, which includes everything from video editing to, of course, teaching Bayesian statistics. Just a quick question: there are so many resources that you have available; are you also planning to put up some of your videos that can go hand in hand with some of these small modules that are out there?

Rowan, before I answer that question, I'll clarify two things from the talk really quickly. First of all, thank you all for attending. At an in-person conference it would have been good to see you, but I see some familiar faces, so hello to those. Also, the bayesrules package is now on CRAN; you know how fast things move in this world. When I recorded the video, it was not on CRAN, but it can now be downloaded from CRAN directly. And the second thing, which I think I did not mention very well in the talk, is that the book actually does have alternative text, because RStudio upgraded the knitr package; we do support alternative text, so the book should be accessible to visually impaired readers. The caveat is that this is a process: we are learning about alternative text and we keep improving. So the early chapters are better, but we keep improving the text as we learn more about how to write better alternative text. So it should now be accessible.

And getting back to your question about the videos: I actually have videos for my own students, and they're only accessible through the university system. I did think about moving these to YouTube, but because I don't actually want to host something on YouTube per se, I did not do that. But if in the future I change that decision, the videos are there and I might release them, and possibly one of the co-authors might decide to do that; but currently we don't have such a plan for the book. I think somebody had asked for the link to the book; I already posted it to them. You can look up the book at bayesrulesbook.com. The book is open access, with a lot of material, so please do explore it. Any other questions for me?

I think it was a great talk, and I think people got a good picture of what the book is and how they can use it in teaching. So what is the longest course that you think this could work for? There are many people who teach in a semester system, so I think this can probably be used for even a basic course or an advanced course across two semesters, I imagine.

Yeah, actually the book has 19 chapters, so one could technically adopt it for a very long course. But since I don't teach that long, I pick and choose certain chapters. And some people teach more mathematically oriented courses, and some people teach more computing oriented ones. For instance, the Metropolis-Hastings chapter is very computing heavy.
So if somebody is teaching a math-oriented course, they may want to skip that, and so on. Also, I can see some people who might be teaching a probability course, and if there's any time to switch over to statistics after probability and move in the likelihood direction, or in math-stat courses, they could definitely take portions of the book, not necessarily the whole book; but it is possible.

Perfect, thank you. I request you to stay until the end of the session, where we can have a little bit of a discussion about some of these things as well. Thank you so much for a very nice talk and a really nice book.

Thank you.

So our next speaker is Dr. Mine Çetinkaya-Rundel, and I'm very happy to introduce her. She was a senior lecturer in statistics and data science in the School of Mathematics at the University of Edinburgh, and she has now moved all the way across to Duke University, where she's a professor; she's also a data scientist and professional educator at RStudio. So really, thanks for joining us; it must have been a long move across. This is a live talk, so we get some live action after a few video talks. So Mine, please share your screen and get going.

Thank you very much, and thank you for having me. Let me go ahead and share my screen. All right, let's get started. So today I'm going to talk about a project that I've been working on for more than 10 years at this point, which is hard to believe; it's a project we started when I was in graduate school. The project is called OpenIntro, and the talk is titled Building and Maintaining OpenIntro Using the R Ecosystem. I want to give a little bit of background on the project first. OpenIntro's mission is to make educational products that are free, transparent, and lower barriers to education. As evidenced by the name Open, all of our products are openly licensed, and I'll talk about what I mean by products: we have a bunch of things that we produce as part of this project. If you're interested in finding out more about the project and what we produce, the best place to go is openintro.org. And for those of you who are useR! fans who have been around for a while, my collaborator David Diez, who is one of the founders of OpenIntro, gave a keynote at useR! 2014 about the project. I just looked through his slides from back then and reflected on how much has changed, and also how much we've accomplished over the years, and that felt really, really nice. Before I go further: I'm not going to link each project to an individual person, in the interest of time, but you can see there are lots and lots of people who have contributed and volunteered their time as part of OpenIntro, so I want to thank all of them and acknowledge all of their contributions so far. I'm going to start by talking about some of our books. This is how it started, with one textbook, a preliminary edition. I remember writing that book during my final year of graduate school, trying to avoid writing my thesis. And now we have four statistics textbooks out. If we look at this timeline, OpenIntro was founded in 2009, and that's when we published our first book, the OpenIntro Statistics book. That's probably the one most commonly used by educators and learners, so you might be familiar with it; there have been many versions of it throughout the years. We also have a high-school-level book that we've collaborated on with a high school teacher, Leah Dorazio.
And we also have a more simulation-based, computational statistics book; Introduction to Modern Statistics is the newest one, and I've worked with Jo Hardin on that book. We also have another book on biostatistics, where colleagues David Harrington and Julie Vu have actually taken one of our books and written a biostat version of it. That's really nice to see, because they've come into this project specifically to do that, and that's what the license for the book allows. I'm going to be talking about some of the challenges and how we solved them with R, but I'm going to focus primarily on Introduction to Modern Statistics, since that book just came out and I've been living and breathing it over the last year and a half. To give you an overall sense of our pedagogy: we use applications as motivation; we use real, recent, and relatable data sets; there's lots of emphasis on data exploration, multivariable relationships, and statistical reasoning; there are lots of guided practices and worked examples throughout the book; and there are lots and lots of end-of-chapter exercises, with solutions to some of them provided at the end of the book as well. The license is a Creative Commons license, Attribution-ShareAlike, so you are welcome to take the content in full or in part, and as long as you provide attribution, you can make modifications and make your own version of it. I always get asked about our business model. Our HTML textbook is freely available and always will be. The PDF is also freely available, but we've been distributing it through Leanpub with a suggested donation, which you can scroll all the way down to zero. And we also do a paperback that's printed on demand; that costs the price of printing plus a minimal royalty that goes back to OpenIntro, the nonprofit, which we use to send books out for free to educators, as well as to try to get off of Amazon if we can do that. So how do we write the book? How it started: it used to be written all in LaTeX, each chapter in its own LaTeX file, and then we had folders for each of our figures so that they could be reproduced. The book has always been reproducible, but to reproduce it, you kind of had to know your way around the folders. Now, for this most recent book, we've written it all in bookdown, so each chapter is its own R Markdown file, and you can output to either HTML or PDF from the same source code, plus lots and lots of customization that I'll talk about. This conversion from LaTeX to R Markdown was actually not so trivial. So how did we approach it? It turns out there's actually a lot of ground you can cover just with a little bit of Pandoc magic. I'll highlight one thing here: I'm basically running a system call, a Pandoc call, that takes a .tex file and writes it out to an .md file, a Markdown file, and then we turn those into R Markdown documents. You can do this sort of conversion pretty quickly, although what you get is pretty raw. And then, since I said we had lots and lots of LaTeX files, you write a function that does this conversion and iterate it over all the files that you need to convert. So you get things that look like R Markdown files, but that are kind of crooked in many ways and need to be post-processed. For post-processing, we treated these files as text documents: basically lots of stringr functions and lots and lots of regular expressions.
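A minimal sketch of that conversion-and-iteration step might look like the following; the file paths and the exact Pandoc options are illustrative assumptions, not the actual script used for the book.

```r
# Hedged sketch: convert every LaTeX chapter to a rough Markdown draft via Pandoc.
library(fs)
library(purrr)

tex_to_md <- function(tex_file) {
  md_file <- path_ext_set(tex_file, "md")
  # Shell out to Pandoc; the resulting Markdown is raw and still needs post-processing
  system(paste("pandoc", tex_file, "--from latex --to markdown --output", md_file))
  invisible(md_file)
}

# Iterate the conversion over all chapter files (folder name is hypothetical)
dir_ls("tex-chapters", glob = "*.tex") |> walk(tex_to_md)
```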
If you don't know what you're doing, add a few more backslashes and keep going until things work. The magick package is great for converting images from what used to be PDFs to PNGs. And the fs package has been really helpful for figuring out what files I have, which ones I'm using, and which ones I'm going to get rid of. So using all of these R packages we were able to do this conversion, and then obviously lots of manual editing after that. Another challenge was achieving similar-looking custom blocks in the HTML and the PDF output, like those guided practices I showed earlier. The solution turns out to be these fenced div blocks, and there's a really nice write-up about this in the R Markdown Cookbook that I link to here. The idea is that you write one of these blocks in your R Markdown document, and the empty space is where your guided practice, for example, would go. Then you do two things: you write your CSS for it, for the HTML output, and then you figure out how to achieve a very similar look with LaTeX and write some custom LaTeX. I had to do a lot of learning to write custom LaTeX commands, which I hadn't done much of before. And I kept remembering something I had read from Yihui, I think perhaps in the bookdown book or some other documentation, which said: if you really want to achieve the LaTeX output to a T, just use LaTeX at that point. So you have to let go of certain things, but you can get pretty close to a very similar look. Another one was plots. The challenge was a consistent and branded look for our plots, and the solution turns out to be defining a custom theme, which you can find lots of information about on the web, but also redefining some of the ggplot2 defaults. So when you want, say, points on your plot, or a histogram, or a box plot, they actually get plotted with a predefined custom color; we have our own custom palette for the book, so we were able to redefine these ggplot2 defaults. This is something I came upon about halfway through writing the book, and it turned out to be a really neat trick. Another one is exercises. When you're writing a textbook, end-of-chapter exercises are treated differently by different professors. We wanted to make sure that the source code for what's in the printed book is publicly available, but we also wanted the source code for the full solutions not to be publicly available, because some people actually grade these solutions, so we don't want a solution key out there. And we also wanted to make sure that the question, its full solution, and a short-answer version that could go in the back of the book all live in the same repo, so we know where things are. So the solution is that we hide away our exercises in a separate private repository that's available only to the authors, where we have a folder per exercise; there are 300-something folders in there. Each of them contains R Markdown documents: one for the question, one for a short answer, and one for the full solution. Then you programmatically generate a single R Markdown document that grabs the exercise text for each of the chapters and copies that over to your book repo, and you also programmatically generate a single R Markdown document with the short solutions to the odd-numbered exercises. We were able to do this basically using, you know, a dplyr pipeline.
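A minimal sketch of the ggplot2 trick described here; theme_set() and update_geom_defaults() are the real ggplot2 functions, but the palette colours and theme choices below are made up rather than the book's actual settings:

```r
library(ggplot2)

# Hypothetical book palette
book_blue <- "#569BBD"
book_gray <- "#B3B7B8"

# A custom theme applied to every plot in the book
theme_set(
  theme_minimal(base_size = 12) +
    theme(panel.grid.minor = element_blank())
)

# Redefine geom defaults so points, bars, and box plots pick up
# the book colours without per-plot styling
update_geom_defaults("point",   list(colour = book_blue))
update_geom_defaults("bar",     list(fill = book_blue))
update_geom_defaults("boxplot", list(fill = book_gray))

# Any plot in a chapter now uses the branded look automatically
ggplot(mtcars, aes(wt, mpg)) + geom_point()
```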
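And a rough sketch of the kind of script that could assemble the per-exercise files into one R Markdown document; the folder layout and file names are assumptions, and the actual version described above was a dplyr pipeline:

```r
# Sketch: collect exercise text from a private repo laid out as
# exercises/<exercise-id>/{question.Rmd, short-answer.Rmd, full-solution.Rmd}
library(fs)

build_exercise_doc <- function(exercise_dir = "exercises",
                               file = "question.Rmd",
                               out = "exercises.Rmd") {
  folders <- dir_ls(exercise_dir, type = "directory")
  # Read the chosen component of every exercise, in folder order
  chunks <- vapply(
    folders,
    function(d) paste(readLines(path(d, file)), collapse = "\n"),
    character(1)
  )
  # Number the exercises and write a single document for the book repo
  body <- paste0(seq_along(chunks), ". ", chunks, collapse = "\n\n")
  writeLines(body, out)
  invisible(out)
}

# Questions go into the book; a similar call over the short-answer files,
# filtered to odd-numbered exercises, builds the back-of-book solutions
build_exercise_doc(file = "question.Rmd", out = "exercises.Rmd")
```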
So what I want to show here is that all of these seem like infrastructure problems, but we were able to solve them with what we know about doing data analysis with R. We also offer labs to go along with the book; this is one of our lab repositories, for example. One of the challenges we run into there is maintenance. Lots of folks graciously offer pull requests and typo fixes for these, but constantly merging them, and also making sure that all of our R Markdown documents are re-knitted, can be a bit of a pain. It turns out you can write a GitHub Action that will re-render your R Markdown documents and commit the resulting HTML to your repository with each push, or when a PR is merged. So now I'm able to, even if I'm on my phone, merge somebody's pull request if they've fixed a typo, and know that the hosted version of our labs has the final corrected version. Another one is tutorials. Our challenge was to keep the main text software agnostic. We truly believe that you can't do modern statistics without computing, but we also know that in our audience for this book, some are teaching with not as much computing integrated as others, and others might be using different languages. So we decided to supplement each of the parts of the book with interactive R tutorials, developed with learnr, so that they have the prose as well as the code. They are linked directly from the book, so they're integrated into it, but you can also use them independently on their own. Another product we have is packages to support the book, and the openintro package is the main package that used to contain all of the data sets supporting all of the OpenIntro textbooks. But it turns out one challenge is that when you have data sets in a package, it can really blow up your package size, and we want to make sure that this package can be hosted on CRAN for easy installation. So our solution was to break it into smaller packages, and not haphazardly, but in a way where we think the larger data sets, and those that might grow, can split off into their own packages without that being a backwards-incompatible move. We did it in a way where the main openintro package depends on these smaller packages, so when a user installs the openintro package, they get it all. In a way this might feel like we're doing more than what the user asked for, but particularly because this is used for educational purposes at the introductory level, we want the experience for learners to be as seamless as possible. All they know is a single package called openintro, but we do have these small offshoots that allow the package to grow over time. So if all of this sounds interesting to you and you're interested in contributing, we have a few ways of doing so. First of all, we have a GitHub organization for OpenIntro called openintrostat, and you can browse through all of our repositories and open an issue or make a pull request. If you're not a GitHub person, or if you would like your students to contribute, who may or may not be as familiar with GitHub, each of our books also comes with a landing page on openintro.org where we link to a form, hosted on our webpage, to send feedback or report typos. And we've actually seen that a lot of professors will give extra credit to their students for reporting typos in one of the new books as a way of giving back to the project.
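As a rough illustration of the GitHub Action mentioned above, the R step such a workflow might run to re-render every lab could look like this; the workflow wiring and file layout are assumptions, not the repository's actual setup:

```r
# Sketch: re-render every R Markdown lab in the repository so the committed
# HTML always matches the latest source. A GitHub Action would run this via
# `Rscript render-labs.R` and then commit the rendered output on each push.
library(fs)
library(rmarkdown)

labs <- dir_ls(glob = "*.Rmd", recurse = TRUE)

for (lab in labs) {
  render(lab, output_format = "html_document", quiet = TRUE)
}
```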
And those typo reports are super helpful for us, and we periodically integrate them into the text itself. If you're interested in being involved more, we have a Get Involved page, with the URL at the top there. You could be a data hunter, you could be an exam contributor, you could write exercises or labs. One of the ways others have developed and really made the project their own is by writing a different software version of a lab, say in Python or something like that, or any of your ideas. So thank you so much for listening, and if you have any questions, I'd be happy to answer. The link to the slides is on this final slide, if you'd like to go through and grab any of the other links I've shown along the way. Thanks, Mine. Thanks for sticking to time perfectly. I think we've had a very good session with four speakers on very different topics. So, any questions for Mine? I hope you've had a chance to look at the book; I've posted the links in the chat, so please do take a look at the book, it's really interesting. So, can you tell us a little bit more about the main challenges you've been facing while teaching a course out of this book during the pandemic? I presume you had to do online classes and so on. How important are labs to you in teaching these kinds of topics? I think doing online classes is one challenge, but doing online labs is a whole other challenge. So any thoughts on that? So yeah, that's absolutely correct. I mean, teaching computing the way I feel like I've done it prior to the pandemic has been very much about being in the same room with the students, being able to look over their shoulder and help them with things they're struggling with. So it has made us rethink a lot about how we can provide feedback to students. That was one of the reasons why I started using these learnr tutorials a lot, and that was also one of the motivations for including them in the book as such, since we don't go extensively into prose about how to do things in R in the main textbook itself. We didn't want to just provide code in a really short format; we wanted to be as verbose as needed when teaching the R code as well. So those tutorials I link to in the book, there are like 32 of them, and each of them probably takes you about half an hour or so to get through. So there's a lot there, not to overwhelm the students, but just acknowledging that that's probably what it takes for somebody to learn the code. So I think that's been one of the takeaways from this pandemic: you can't just throw code at students, especially if you're not in the same room with them; you have to do a little bit of extra work yourself to deliver that same content in a way where they can do it asynchronously on their own. Okay, great. So any other questions for either Mine or any of the speakers? Maybe I'd request all the speakers to turn their videos back on. It was very nice; I think it was a great session that we had. We have, I think, another nine minutes before we end the session, but we could end in like six or seven, that's fine. So any discussion, any comments on the other talks that you listened to, from any of the speakers? Maybe we could quickly start off with that one question that we left behind. This was for Allison, I guess, and Julian: there was a question on whether you have considered working with Data Carpentry, right?
And I think Zian represents the Carpentries. So what are your thoughts, Allison? Yeah, I think that's a great idea, and yeah, I love the Carpentries. And it sounded like from Zian's talk there's an opportunity, and also from Mine's talk, if people are looking for data sets. And I think, all of us being teachers, the value of good data sets can't be overstated. When Julian and I were preparing this talk, I at one point had taken screenshots of my downloads folder, which is just full of random data sets I've downloaded from all different places, trying to find one that really suits my needs. So yeah, I think there is an opportunity to work with the Carpentries and also others teaching data science and creating modules and lessons that we hope some of these LTER environmental data sets will be valuable for. And I think there are a couple of questions. So, can you share how you chose the license for sharing your work? It's a question that any of you could answer, so maybe starting with the last speaker. Yeah, I'd be happy to say something about that. So when we first started OpenIntro, I have to say I was very new to anything open source, let alone licenses. And I felt like, we're putting all of this effort into this, we have to make sure it can only be used for non-commercial purposes, for example, because what if people scoop us? What if they take our work and republish it? And I think that threat is still there. But, you know, then, having a better understanding of how licenses work: when you say something can only be used non-commercially, that might even mean it may not be used for corporate training, for example, like internal training within an organization. So there are a lot of facets to it, which is to say it takes educating oneself, and also really having a close look at your personal goals for a project and asking, what are my fears about what people can do with it? But then also, who am I trying to reach, and will I be able to reach them? Another consideration I had, for example, is that I teach a course on Coursera and we use one of these books; Coursera is not a nonprofit organization like many of the universities are, but it reaches a lot of people. So there are a lot of facets to this decision. Ultimately, we decided to go with something permissive, not with the hope that we would ever sue anyone, that's not a path we would ever go down, but more to establish ourselves so that no one thinks they can just get away with taking the work and not attributing it. And the open source community is really nice, and I think people are getting more and more educated about this stuff. But that's how our thinking has evolved over the years. So, I think there are a couple more questions; if somebody has a comment on this, we can come back to it at the end. So, Allison, curious how you define good data sets, especially for teaching, because real-world data sets are really, really messy. Yeah, that's a great question. I would love to hear how all of the different teachers in this session react to it. For me, a good data set is one that serves the purpose I need for my teaching in that lesson, and that's something completely different for the different things that you're teaching. In some lessons, what I want is the messiest, most poorly documented monster data set I can find; sometimes that's the purpose.
But at other times, when the focus isn't on the messiness, when the focus is on understanding maybe a method we're trying to use, then maybe we want to start with a cleaner data set, so that the complexity of the method doesn't blend with the complexity of the data set. Sometimes I find that when students are trying to compartmentalize the messiness of dealing with big, complex, messy data sets while you're simultaneously trying to teach a complex concept or skill, those blur together, and all they see is, this is hard and I hate it. So break it down: maybe when we're starting to teach more complex topics, let's start with an already tidied-up data set that is perfect for that lesson, and then we can build in messier data as we move forward. But I think the answer for me is, it completely depends on the goal of what I'm trying to teach, and that varies really widely. I would love to hear what the rest of the presenters have to say about that. I think, anyway, we're almost out of time, and I'm sure you can continue to interact with the speakers via social media, Twitter or whatever. So good luck to all the speakers with the rest of your teaching, or have a nice summer break before you recharge for the next session. With that I'll hand it back to the organizers of the conference. So is that a slide you have to show right now, Jyoti? I'd like to once again thank our sponsors, Raj for sponsoring the entire conference and Absalon for today's session. And I think next up we have a 15-minute break, followed by some more really interesting sessions, and I hope you're all having fun at the conference. Lots of thank-you messages on the great talks. I'd like to again thank all the speakers for very interesting talks, very interesting topics, very interesting books. I really invite all the participants to look closer at the work of all these speakers; there's lots of interesting stuff. They have blogs, they've written articles on software development, teaching, video editing and so on, so there's a lot to learn. Great. It was wonderful chairing the session. Thank you all.