All right, thanks so much for that introduction. Like you said, I'm Vicki Steeves, the librarian for research data management and reproducibility at New York University. I'm joined by Sarah Nguyen, who is a research scientist also at NYU. Our third team member on this project, Genevieve Milliken, is in the chat, so be sure to give a nice hello there. So to start us off, we're gonna be talking about IASGE, pronounced "Ice Age." And yes, the acronym doesn't match the pronunciation, but that's just the way it goes. IASGE is an Alfred P. Sloan Foundation funded project that looks at basically two streams of work to try and help with saving all of y'all's work that is in the Git data format on Git hosting platforms. Those two streams of work include a behavioral study of how folks in academia are using Git and Git hosting platforms, which Sarah will talk about way more in depth. The second stream of work focuses on evaluating how those materials are currently being archived, how we can assist and aid, where the gaps are, and where we can move forward as a community. So the goal of this project, in a nutshell, is to make sure the wonderful work that you all are doing, producing great Git repos and the context that adds to their meaning, such as issue threads, PR discussions, and wikis, can go from this really highly active, collaborative state to a state where they are stable, citable, and under professional preservation. So here are the three of us pictured, but I also wanted to mention that this work would not be possible without our amazing colleagues in NYU Libraries' Digital Library Technology Services, which is our digital preservation unit. And before I get too far ahead of myself: most of y'all at csv,conf are probably familiar, but before we go into some more depth, I want to make sure everyone has the same baseline understanding of what Git is. Git is basically a revision control system. It's a command line tool.
The point of Git and other revision control systems is to be able to compare, restore, and merge changes to our plain text materials, like code. So using Git facilitates collaboration and transparency, obviously something csv,conf and we care a lot about, which is awesome. So we love Git. There are also these things that we're calling Git hosting platforms. They're literally platforms on the web where people host a copy of their Git repositories. They add some features on top that enable more widespread collaboration as well. The most popular is GitHub, followed by GitLab, Bitbucket, and SourceForge. So we're focusing on those, but as you can see from this very small screenshot from Wikipedia, there are a lot more. I especially want to give a shout-out to SourceHut, which has this really cool cooperative model for a hosting platform that looks really interesting. So check out SourceHut. And since this project has "scholarly" in the title, and we're gonna be saying things like "scholarly ephemera" to denote parts of a repository we want to save, I thought we should show you some concrete examples of what exactly we mean by scholarly Git usage. There are some of the perhaps more obvious ones, like using GitHub to publish data and code as supplementary materials to a paper. But there are also some really interesting uses. Hao Ye was in the Yenni group, if you caught his presentation yesterday, and I gave a shout-out to this bioRxiv preprint; it's my favorite one. It details really in depth how they went from a manual quality assurance procedure for their lab's data to an automated one using GitHub and Travis. So using this Git hosting platform, which is meant for software engineering, to do automatic QA/QC on data is a really awesome scholarly usage. There are even whole journals that run on top of GitHub's infrastructure, such as the Journal of Open Source Software, JOSS, and the Journal of Open Source Education, JOSE.
And they conduct peer review and the entire publishing workflow on GitHub. So these are some of what we mean when we say scholarly Git usage. And the extent to which this is present on Git hosting platforms is pretty big. Sarah is gonna talk a little bit about some of this scoping in depth, but I really love this background piece that comes from Hasselbring et al. I highly recommend y'all read the preprint. They basically identified 5,000 repositories that host research software specifically. They did that by looking at GitHub itself, but also at citations of repositories in ACM's Digital Library and arXiv. They found the most were in life science, the next were in general science, and the third were unpublished or archives, which are like backups. So this is some really interesting scoping of the extent to which this material is present. So then the problem we wanna solve is this: there's all this really interesting tooling going on around using Git, and so many different platforms. The source code by itself is really valuable, but it's also contextualized by the scholarly ephemera around it, like issue discussions. And no current project captures both the source code and that ephemera. So we want to be able to get everything together. My emoji, my elephant in the room, which didn't actually show up, my bad, is GitHub's Archive Program, which you may or may not have heard about. I'm not gonna say a lot about it, because David Rosenthal's blog post that I linked at the bottom says everything that needs to be said, but I will just highlight that none of these projects are really solving the problem that we wanna solve, which is keeping the code together with this scholarly ephemera, to everyone's benefit. So I'm gonna pass it off to Sarah, who's gonna talk about the gap analysis. Hi everyone, this is Sarah. Can you hear me okay? Okay. So as Vicki said, my name is Sarah Nguyen, and I am the research scientist focusing on the behavioral side of Git and Git hosting platforms.
Next slide, please. So I've basically phased my research into five different parts, starting with a literature review of anyone who has mentioned Git, GitHub, GitLab, or Bitbucket in their papers or studies. Then we go into a systematic review, a focus group, a broad survey, and user experience interviews. I'm going to focus most of today's talk on our broad survey, as that's where we have some preliminary data to share. But if we go to the next slide, I can share the research questions we focused on when we started the literature review, which is basically seeing how scholars use Git and Git hosting platforms as a toolkit to carry out their specific use cases. That's pretty similar to programmers and anyone else who uses Git, but specifically we're trying to make sure that we, as librarians and archivists, can serve the people who are doing research-based and education-based work. And in the end we also want to be able to teach people who have experienced Git but have not yet been able to fully incorporate it into their workflow. How do we better introduce this topic to them, and how do we make it so they can bake it into their workflow process? So feel free to check out our blog, which we've been updating with summaries as we go through different types of papers. Next slide, please. So just listening to yesterday's talks, it was awesome to see other people who are also studying how people can use Git to make science, research, and information more open. It's very similar to how many developers already use it, but through the literature review we did pinpoint these Git experiences specific to how scholars use it.
Obviously there's version control, because that is what Git is built on, but community and collaboration was a huge one, since Git hosting platforms are social coding platforms. Method tracking, which we see through lab notebooks. Education: all of those different hosting platforms have a classroom setting. Data processing, reproducibility, publishing; publishing is a huge one. So these are the specific buckets that we call scholarly use cases. And if we go to the next slide, this is a paper that we want to highlight because it came out just this past week. It was written by a graduate student about how they overcame barriers to using open source software, and they also reference Git and Git hosting platforms as a main point of use as they used open source software and collaborated with their peers. So I highly recommend you check that out. If you go to the next slide, I'm gonna give you some very relevant examples of how scholars and researchers are using Git. Many of you might know that the Johns Hopkins University Center for Systems Science and Engineering came out with this awesome dashboard, built on Esri, showing the spread of COVID-19. They are actually posting their data on GitHub, and their visuals are being fed into Esri, so they're connecting the two platforms. And you can see there are more than 1,200 issues, but those are not being preserved in any specific way. Yes, Git is decentralized, so each person can take a copy of the source code, but the conversations happening in issues and pull requests are what we're interested in, because they play into the scholarly workflow of peer review and collaboration. So that's just one repository. Next slide, please. This next GitHub repository is another one that I like to highlight, because it is also very relevant to COVID-19.
Some of you might have seen some discussion on the interwebs last week about this code base, which was basically saying that this type of modeling could reveal some information on how to reduce COVID-19 mortality. It was taken down and also retracted, even though it was just a preprint of a report. But in this issue, which is closed, there was a long discussion of how they were testing, and there were many bugs. No one is a perfect coder, but it's so important to be able to see this type of discussion happening around these types of data, modeling, scripts, and code, and it isn't currently being saved by many of the platforms that we're looking at. If you go to the next slide. So after the literature review, looking at how people are using Git and conversing through issues, I'm interested in doing a systematic review, basically seeing how published papers have been referencing URLs to Git hosting platforms, and Git within their workflow. So this would be a large harvest of articles and metadata. I'm not gonna go too much into that because this phase is still a work in progress. Next slide, please. So phase two of the research is a focus group, which is also in progress right now. We've been interviewing small groups of three to six researchers, students, anyone who is what we call a minimal user: someone who has been exposed to the world of Git but has not yet been able to incorporate it into their workflow. We're mostly interested in this because we've found that a lot of people are enthusiastic about version control systems, and Git, being a very popular one, is something they get excited about once they're in the classroom, but they tend not to bring it over into their daily workflow. So within these focus groups, we wanna understand what is that bottleneck?
What is that threshold where they just cannot open that door to include Git in their workflow? So we're excited to see how that is incorporated. One tweet that we'd like to highlight is from Kirstie Whitaker from the Alan Turing Institute. She was talking about her experience of teaching someone how to make a pull request, which is kind of a foundation for using Git and sharing source code with each other. But if you can't even get over this one barrier of a pull request, how are you supposed to incorporate Git into your larger workflow, with all of the other types of commands? So this is something we're interested in: making tools and a toolkit to help people understand. And then part three of the research is a survey, which we hope you all can participate in and share with your colleagues. Our target population is any scholars and users who use Git, across all disciplines and statuses, whether you're a minimal user or an advanced user. We just wanna get a wide and comparable sense of how you were introduced to Git, how you learned it, who taught you, whether you teach it, and what your daily use behavior is. The survey takes less than 10 minutes, so I hope you're able to participate so we can get your feedback. Five minutes. Five minutes. I'm just slow at clicking multiple choice buttons. So, some preliminary findings from this survey. Again, Mr. Kosh's cone is telling you please participate and share widely if you have time. One of the first questions we ask is: when did you first start using a version control system? And this is just a nice time series. Obviously this is a very small population right now, as we only recently put out the survey, but I'm excited to see any trends in the year that someone picked up Git or a version control system, and what major events happened around that.
We can see a spike in 2005, when Git was born. I can't say that's clearly correlated, but it is interesting to me. It was also interesting just to see that in 1990 we haven't had anyone who has adopted a version control system immediately after 1994 or earlier. And then it'll be interesting to see, when funding agencies start mandating that data and source code be made openly available, whether there will be an actual spike there. A few years afterwards we do see one in 2015, though I haven't been able to find a notable event for that year, so if anyone experienced anything or remembers something, it'd be great to hear that. Next slide, please. Another preliminary finding: we ask people what version control system they started out using and what version control system they use now. Because, as we all know, we're all interested in and curious about learning different technologies. So just because you started with Git doesn't mean you'll stick with it if you learn Mercurial or SVN, though that has been pretty rare, to be honest, obviously just because many places have not been supporting other types of version control systems, but it'll be interesting to see. In the "other" category, participants submitted that they use Microsoft SourceSafe, Perforce, Fossil, and Darcs. So there are a lot of different version control systems that people actually do still use. Next slide, please. So why did you first enter the world of Git and version control? I like this question because it gets at your motivation or intention for using version control. The obvious factor is that we just need a version control system. Something interesting from our focus group is that all of our attendees, who are minimal users, stated that they heard it would get them a job in the future, whereas right now we see that not very many survey respondents have chosen that.
Other reasons people entered the world of Git and version control were that their workplace required it, or that it was required for a web development course they took. Someone said they needed to back up their code, which is really interesting to us because we are looking at the preservation aspect for long-term use. Others wanted to replicate the code on multiple computers. Better documentation for complex projects was another, which we're really excited about because we do see this type of version control system as a way to support code and open science. Someone also said that it was superior to SVN, which I find funny. So this next one is... Sarah, maybe for the sake of time we'll run through this a little quicker so I can do the final overview. So let's just skip ahead to part... So basically we have quite a few preliminary findings, but if we skip ahead to part four, which basically closes out our understanding of the behaviors behind Git, we're going to be doing user experience interviews. We hope to take some more one-on-one time with participants to understand how you actually use Git. And now I'll pass it to Vicki so we can wrap it up. And the last question of the survey does ask if you wanna participate in this, so we hope you will. I'm just gonna really briefly go over Genevieve's part of the project. Obviously she's available to answer questions. This is her beautiful bunny, Oshi. She did basically a huge environmental scan, in parallel to the work Sarah's been doing, looking at four key areas of how people are capturing and preserving source code and ephemera. So I'll go through them briefly. First, web archiving, which is the use of crawlers or similar software to capture and preserve the look and feel of content on the web. Projects of interest, which I'll talk a little bit about later, are Archive-It, Webrecorder, and Memento Tracer.
We're also interested in self-archiving, which hopefully a lot of you have done, now that you're familiar with Zenodo. This is looking at the motivations for depositing source code into general repositories like Zenodo, subject repositories like GenBank, or institutional repositories. So what are the motivations for where people choose to put their code, what makes one platform more attractive than another, where is the feature parity, questions like that. We're also interested in the use of APIs and indexers to automatically capture this code. Obviously Software Heritage is the big one in this space; they have copies of many public repos across multiple Git hosting platforms, including sunsetted platforms like Google Code. They have all the open material from Google Code. So they're capturing the source code, and then there are projects like GH Archive and GHTorrent that are capturing the ephemera using GitHub's API. And then there's a really awesome project, SARA, which is a mix of programmatic capture and self-deposit, where basically you make a copy to an institution's GitLab and then some metadata is put into an institutional repository for greater discovery. So very cool. And then the last piece is software preservation, which is about the best practices in curating and making accessible for the long term source code, but also compiled binaries and that ephemera. Some projects of interest include, obviously, the Software Preservation Network, a member-based organization similar to the Data Curation Network, which is also member based. I'm really interested in drawing the line between SPN and DCN, especially because DCN produces really excellent primers on writing code for data curators. But also of interest are the UK and US research software sustainability institutes, which forefront open, reproducible, sustainable practices for research software. All of them. We did some tool crafting. We're gonna do more. Here's all of our info. Talk to us.
Thank you. Sorry. They will be available on Slack if you are keen and would like to know more. Thank you so much for your talk. It was very insightful, and you explained Git in a new way that I've never heard before, and now I get it. Thank you. Hey, good.
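
Editor's sketch: the talk notes that cloning a repository carries the source code but not the issue and pull request threads, and that projects such as GH Archive and GHTorrent capture that ephemera through GitHub's API. The Python below is a minimal illustration of that idea only, not the method used by IASGE or by any project named in the talk. It assumes GitHub's public REST endpoint for listing repository issues (which also returns pull requests); the function names and the output file name are hypothetical, and unauthenticated requests are rate limited.

```python
import json
from typing import Callable, Iterator
from urllib.request import Request, urlopen

API_ROOT = "https://api.github.com"


def issues_url(owner: str, repo: str, page: int, per_page: int = 100) -> str:
    """Build the URL for one page of issues, open and closed."""
    return (f"{API_ROOT}/repos/{owner}/{repo}/issues"
            f"?state=all&per_page={per_page}&page={page}")


def fetch_json(url: str) -> list:
    """Fetch one page from the API and decode the JSON array it returns."""
    request = Request(url, headers={"Accept": "application/vnd.github+json"})
    with urlopen(request) as response:
        return json.load(response)


def iter_issues(owner: str, repo: str,
                fetch: Callable[[str], list] = fetch_json) -> Iterator[dict]:
    """Yield every issue and pull request thread, one page at a time."""
    page = 1
    while True:
        batch = fetch(issues_url(owner, repo, page))
        if not batch:  # an empty page means there is nothing left to read
            return
        yield from batch
        page += 1


def save_issues(owner: str, repo: str, path: str,
                fetch: Callable[[str], list] = fetch_json) -> int:
    """Write all threads to a JSON file (hypothetical layout); return the count."""
    issues = list(iter_issues(owner, repo, fetch))
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(issues, handle, indent=2)
    return len(issues)
```

Calling `save_issues("some-owner", "some-repo", "issues.json")` next to a `git clone` of the same repository would keep the code and its discussion threads side by side. The comments on each thread live behind a separate per-issue endpoint, so a fuller capture would need a second pass.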