I am Quinn Thomas, an associate professor at Virginia Tech, and I'm excited to provide an introduction to the semester-long course that I teach in environmental informatics. The course uses publicly available data sets and modules to teach both data science skills and environmental science concepts. First of all, my course integrates materials from different existing modules, many supported by the National Science Foundation, and uses RStudio as a coding environment. A major thanks to these efforts.

To provide context, the course is a junior-level class in Virginia Tech's environmental informatics major. It is also part of our general education requirement for advanced quantitative and computational thinking. It has no specific prerequisite courses, but it requires some prior experience in quantitative and computational thinking. It is a prerequisite for the environmental informatics senior capstone course, but it includes students from multiple majors across campus. The class is designed as two 75-minute periods per week over the 16-week semester and has between 15 and 25 students. I've taught it twice, including last spring semester, which required moving it online halfway through.

The 10,000-foot view of the class is four phases. First, the students get their feet wet with a straightforward but interesting module, followed by multiple weeks of training in data science with ecological applications. Then the students have multiple weeks of practice with real data and different kinds of data sets, along with an introduction to key new data science concepts that build on the multi-week training. Finally, the students are given more independence in analyzing a data set, where the process of analysis isn't as laid out in the assignments.

The course also teaches fundamental environmental science concepts, allowing it to cover multiple environmental grand challenges. These questions include: How does climate change alter lake temperature? How is global ice cover changing in lakes?
Where are rivers in the U.S. exceeding legal nutrient levels? How are global temperatures changing and what is causing it? How much carbon dioxide does a forest remove from the atmosphere? How much carbon is stored in a forest ecosystem?

The challenge I had when developing the class was combining the environmental learning objectives with the data science skill objectives. To address this challenge, I linked existing standalone environmental data science modules together. These existing modules have been vetted and tested, ensuring quality. But as standalone modules, they do not make a coherent course that builds skills over the semester. Furthermore, the existing resources use a range of data science tools, including Excel, R, and Python. Building coherence in the computational framework and in the trajectory of skill learning is a unique contribution my course provides to the larger community of educators.

The materials are all available on GitHub as different modules. I recommend visiting the page to see them. Because they are hosted on GitHub, you can see my updates as I edit and improve the course. Each of the modules addresses an environmental question, and all but module 7 are derived from excellent standalone modules. The goal of module 1, which is part of the Macrosystems EDDIE project, is to get students working in R with a lot of guidance, while allowing them to quantitatively answer a question early in the class. Module 2, which is built off Data Carpentry, is the core training in R for data science, and it's a unit spanning seven class periods. Modules 3 through 6 are the practice modules, and they're from Project EDDIE and from the NEON QUBES Hub. And module 8 is the more independent module, and that's one that I have developed as part of this class. It directly uses NEON data and was also developed as part of a QUBES workgroup. I'll now go into details on a few of the modules.
So module 2 uses the Data Carpentry resources, which I point you to here. It's an excellent resource for teaching the basics of data science in R using an ecological data set. The data science skills that this module helps build are R basics (data types, comments, execution), importing data, saving modified data, cleaning and filtering data, working with continuous, categorical, and date-time data, mutating data (that is, calculations based on columns), summarizing data, pivoting data from longer to wider formats, and plotting data.

Focusing on modules 3 through 6, the general structure of these is to first introduce students to the environmental science concepts, where I have largely just modified the instructor lesson PowerPoints from Project EDDIE or from the NEON QUBES materials. Second, I introduce data science concepts using live coding of the new functions and approaches they'll need to apply. These all build on the Data Carpentry module 2. Finally, I introduce students to the assignment by walking them through it and the expectations. After all the introductions, the students work for one to two more class periods on the assignments, I provide help as needed, and the students are allowed to work together on these modules.

As an example, we're going to look at module 3, which focuses on lake ice phenology using Project EDDIE resources, and here's where it is located on the course GitHub. A description of the module can be found in the README file that is in each module. The assignment can be found in the assignment directory, and the presentation and code to teach the new data science skills are in the data science skills directory. The presentation for introducing the science concepts is in the science introduction directory. As I mentioned, the module is built off the Project EDDIE module of the same name.
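Lake ice phenology questions of this kind often come down to fitting a trend to ice-off dates. As a rough illustration only, here is a minimal base R sketch that fits a linear regression of ice-off day of year against year for each lake. It is not taken from the course materials: the data frame `ice`, its column names, and every number in it are made up for the example.

```r
# Hypothetical ice-off records (day of year) for two made-up lakes.
# These values are illustrative only, not real observations.
ice <- data.frame(
  lake = rep(c("Lake A", "Lake B"), each = 5),
  year = rep(c(1980, 1990, 2000, 2010, 2020), times = 2),
  ice_off_doy = c(110, 107, 107, 104, 102,
                  120, 119, 117, 118, 115)
)

# Split the data by lake, fit ice_off_doy ~ year for each lake,
# and keep only the slope (change in ice-off day per year).
slopes <- sapply(
  split(ice, ice$lake),
  function(d) coef(lm(ice_off_doy ~ year, data = d))[["year"]]
)

# A negative slope means the ice is leaving the lake earlier over time.
print(slopes)
```

With real data, the same pattern applies after importing and cleaning the raw records; only the made-up data frame would be replaced.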
If you're interested in incorporating similar science concepts but want to use Excel in your class, I recommend directly using the module on the Project EDDIE website. The module is mostly focused on practicing the skills developed in the Data Carpentry module 2, but it adds skills in linear regression and in R Markdown generation.

So here's an example of the assignment. The assignments are structured as R Markdown documents that students work through, and they generate a final HTML file, shown here on the right, that includes text, code, results, and figures. I highly recommend R Markdown, which is what is shown on the left, because the assignments include easy-to-read text and clear parts called chunks where students write code. You can see that's where it says insert code. When the students generate the final HTML, a process called knitting, it reruns all their code. This forces the students to generate reproducible assignments and creates an easy-to-read document for grading. R Markdown or similar documents that mix text describing the analysis with the analysis code are increasingly used in the data science community.

Visualizations of the data and analyses are a key focus of the class, and I grade each figure the students produce using a rubric. This is an example of a figure that plots the day of year that ice is no longer covering a lake over time for six different lakes. Students learn that a seemingly complex figure is easy to produce in R once you learn the key functions.

The next module uses water quality data from the USGS to examine whether measurements of nitrate in different rivers across the U.S. exceed the EPA limit. Again, this is based on a Project EDDIE module that I converted into R, and it emphasizes environmental science questions while also building new data science skills on top of what was learned in the previous modules.
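To make the exceedance question concrete, here is a minimal base R sketch of checking several sites against a limit with a loop, an if statement, and `paste()`. It is an illustration under stated assumptions, not the module's actual code: the site names and measurements are made up, and the 10 mg/L value assumes the limit meant is the EPA drinking water standard for nitrate as nitrogen.

```r
# Assumption: the limit is the EPA drinking water standard of
# 10 mg/L nitrate as nitrogen; adjust if a different limit applies.
epa_limit_mg_l <- 10

# Hypothetical measurements (mg/L as N) for made-up river sites.
# These are not real USGS data.
nitrate <- list(
  "River A" = c(2.1, 3.4, 1.8),
  "River B" = c(8.9, 12.3, 10.5),
  "River C" = c(0.6, 0.9, 11.2)
)

# Loop over sites so the same check scales to any number of rivers.
messages <- character(0)
for (site in names(nitrate)) {
  values <- nitrate[[site]]
  n_over <- sum(values > epa_limit_mg_l)  # count samples above the limit
  if (n_over > 0) {
    msg <- paste(site, "exceeded the limit in", n_over, "of",
                 length(values), "samples")
  } else {
    msg <- paste(site, "never exceeded the limit")
  }
  messages <- c(messages, msg)
}

print(messages)
```

Because the check lives in a loop, adding more rivers only means adding more entries to the input, not more code.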
In particular, students learn how to import data sets that don't have a standard header format, use if statements in code, apply loops to scale up an analysis, paste strings and numbers together, and communicate with an API. Here's an example of the data they have to read into R. I stress in the class that they cannot just open a file in Excel and delete the lines that aren't data, because that does not scale well. They always have to work from the raw data, even if it looks ugly and hard to work with initially.

Finally, the course includes two exams. The exams use a real data set the students have not seen and require them to complete a simple analysis within the 75-minute class period. The analysis is relatively easy, but it requires baseline practice to finish in the time and simulates a live coding job interview. The easy but timed exam complements the harder but multi-week modules.

I'll wrap up with some lessons learned. First, for the reasons I previously mentioned, I recommend using R Markdown for assignments. Second, I found RStudio Cloud to be a powerful tool. RStudio Cloud is an online version of RStudio hosted by the RStudio project. It allows me to create a common workspace for all students. And since it is hosted on RStudio servers, all students have the same computational environment. This overcomes the all-too-common issue of at least one student not being able to install or run R on their computer. It also allows me to log in and see a student's code so that I can provide feedback. RStudio Cloud is going to start charging for use in larger classes, so the cost may have to be included in a course budget.

Finally, my take-home messages are: One, using real (i.e., not pre-processed) data is an opportunity to teach students the data science skills needed to get the data into an analysis-ready form that allows them to address an environmental science question.
Two, there are a lot of excellent and some less excellent modules for you to use in a class that teaches both environmental and data sciences. However, the challenge is integrating them into your class without them seeming disjointed. I recommend first focusing your efforts on modifying existing modules to build coherence around your learning objectives, then working on building your own modules and giving those back to the community.

Thank you for listening to my talk. My contact information is below if you have questions. Again, the materials for my class are at the GitHub link on the slide. There's also a 45-minute seminar describing the class in more detail, and I've provided a link to it on the slide. And again, many thanks to the educational community that has provided such great resources, which allow us to bring them together to build coherence and opportunities for our students to learn both data science and environmental science simultaneously.