The kind of project we conceived initially was: can we bring artificial intelligence to materials science? The product would be something very unique, it would push the entire materials science field in a very positive direction, and it was going to be a big job. So the nature of the research was bringing two components from fields that are not really connected into one specific project. We were trying to bring artificial intelligence to materials science, and we were mainly focusing on natural language processing, which means we need a lot of text for analysis. This text has to come from somewhere, and that was my first step towards reaching out to the library and then, eventually, to the data co-lab. In the beginning, I wasn't sure how I could actually access, for example, the Springer Nature API or the Elsevier API, or what the terms, licenses, and copyright issues were. So I needed someone who knows how to deal with these publishing giants, so to say, and what the issues are when you try to access all these things. So that was the beginning. I realized, as a materials scientist, that I didn't have the adequate skill set or the know-how to proceed. And then this whole journey started: within Carnegie Mellon, what sort of people can we reach out to, who is interested in these kinds of problems, and what resources or tools are available at my disposal?
I started teaching an R class in data analytics for CMU at the Heinz School sometime last year. And I had been looking to grow my data science and data analytics skills personally, as well as to reach out to others and see if I could pitch in and help. R is my specialty, but I also use Python quite a bit, and I know about SQL and data analytics and statistics in general. So it was serendipity.
I actually received an email from the CMU Libraries looking for volunteers for the data co-lab. After reading through it, I decided to throw my hat in the ring.
I started doing some scraping of these papers and published articles. At that point, we realized that we needed more people on the team to do all sorts of analysis. That's where I, working with Huajin, who is from the CMU Libraries, started reaching out to people. I think at that point Mike was interested in joining on this problem, and we were like, oh, we will be glad to have you, because we definitely need people.
It's a pretty good fit, actually, because of the kinds of skills that I was looking at. Some of the early things we were doing were web scraping and, like Amit was saying, text analysis, and I have some background in that. I also have some background in metals, so I knew a little bit about the key terms and things like that. So it really turned out to be a good cooperative opportunity on both sides, I think. That's really how I got involved in this project. Huajin set us up to have some dialogue. I think all three of us met in the very early stages, and then Amit and I met, and it evolved over time. But one of the very early elements of this was web scraping. I had been doing some web scraping with the R class that I was teaching, and I thought, well, this is relatively simple: we can go out to this webpage, pull down some of this data, and start to do an analysis on it. Then, as we learned more about these sites, about their proprietary nature, and about the fact that some of this content is blocked and not as simple to get to, the project definitely evolved.
So initially we just started because of the need of this project. Then we realized that this was expanding at a pace that was not going to be doable with just two or three people on the project. So we started expanding.
As of now we have multiple students working on this project. We also collaborated with one of the professors from LTI, which is the Language Technologies Institute within Carnegie Mellon. She's an expert in natural language processing in general and has a background in materials science, basically in applying natural language processing to materials science. I forgot to mention the name: it is Professor Emma Strubell, and she joined LTI very recently. So that is where we are right now.
It's definitely evolved from the early stages, and it has certainly moved from some of the requests at the beginning of the project to becoming a more sophisticated project with many more people contributing to it right now. Amit, if I'm not mistaken, it was basically me and you at first, with Huajin and the data co-lab, and from there it kind of expanded.
Yeah, absolutely, yes. I found the culture within the data co-lab, within the CMU Libraries, and in other places very welcoming, so I didn't find any trouble in the beginning. Was it helpful in overcoming some initial hurdles? Yes, because I didn't know much about natural language processing, or about how Springer, Elsevier, and the other resources that hold the actual text operate. So having that background, all this information, provided to us directly by the CMU Libraries was very helpful. And personally it gave me the confidence that we can do this project. This was a project which started, in a way, because of COVID: we had some extra time, and for me personally it was, can we do something interesting? I was trying to build a problem around developing a database for materials science, and I wasn't sure how to do it. When COVID started and everyone was at home, we were like, oh, this is a computational project and we can do something about it. So after multiple interactions in the first month, it gave me confidence that this is doable.
Within Carnegie Mellon, we have all the expertise and the resources to make this project materialize. And I don't have to do this thing by myself. I very much like how it evolved and became its own thing. Now it's not my project; it's a project which is shared by 10 other people. So I really enjoyed that whole journey.
The early discussions that we had were moderated. Like Amit said, Huajin set some things up, and we had a series of Zoom meetings just like this, talked over the project and the requirements, and shared some emails. I think there was very little friction in starting the project and understanding it. Again, it evolved over time, to be sure, but very early on introductions were made and the project was described as it was conceived at that time. I think the collaboration went very, very smoothly. I did not find it awkward or difficult at all.
I think the interesting thing about this project was that we didn't have any agenda. From the beginning, I think we all knew it somewhere. Maybe we were not expressing it explicitly, but there was a known understanding that this is a sort of exploration. The nature of the project is to fundamentally explore this specific intersection of these two fields. So the nature of the problem, and how we are approaching it, gave everyone who was part of the team the room to bring their own way of handling things and to say, oh, I can look into this specific task and tell you, maybe in a week, what's possible and what's not possible. From the beginning, we knew that we were just exploring: we might find something useful, we might not, and there's a lot of space to explore. So with more and more people coming in, we never have a shortage of problems to handle, and everyone can choose what sort of problem they want to handle within the larger scope of the project.
So I think that made the process very smooth. Even the undergraduates, some of whom were sophomores or even freshmen, have the opportunity to say, oh, I want to jump onto this specific part of the project and work on it. So collaboration among the group members was very smooth in that respect. From the tools point of view, we relied on GitHub, which provides a platform to share the code we are developing, and we used Google Drive to store large files. We also used the Open Science Framework, which we learned about from the CMU Libraries and which was useful as well. So that's my take.
Yeah, that's right. Actually, as Amit was talking, I just realized that the two of us have never even met face to face, in person, but I feel like I know him because we've had so much dialogue over Zoom and phone calls and emails and things like that. As he was saying, the nature of this project is very computer intensive. There's a lot of data to be stored, there's a lot of data to comb through, and the results are all very portable. So of course we used open source software like R and Python, along with the Open Science Framework, Google Drive, email, and Skype. So yeah, many electronic technologies to share the progress of the team, the results of the findings, and the output of some of the scripts that we've put together. The technology platforms that we've been using to share data, part of that with CMU, part of that on GitHub and independent of CMU, but very much using Skype and email and some of these other features, have been very smooth. The project evolved over time as well. As we learned that we were going to store, say, a very large database, OSF was volunteered, and GitHub, and things like that. So even at the start of the project, I don't think we knew what everything was going to look like.
There was nothing that we thought we might need that should have been set up in advance and then took six months for us to get. As we needed them, the tools were available or became available.
The value is immense, because I don't have to get three degrees, in materials science, in natural language processing, and so on, to work on this project. So the value of collaboration is literally hard to put into words; it is immense. I think the challenge can be communication, because these are very technical things sometimes. If you don't know the basic vocabulary to talk to another person who is from a completely different background, that can be challenging. But in our case that didn't happen much, because I had done a lot of programming and coding while I was in grad school, so I knew how to communicate with Mike and other people within the group. And when Emma joined from LTI, we had gained enough vocabulary to communicate what we were trying to do.
The opportunity, of course, in collaboration is that others have skills that perhaps you don't have. Some of the barriers or obstacles that I found were surprising to me, and I needed to reach out and talk with some people at the libraries. It was very collaborative, back and forth. And I was impressed with the coding abilities of some of the people who eventually joined the group and brought some Python skills and some additional libraries to use that I hadn't even thought of. So obviously having different perspectives and different skill sets helps; like Amit was saying, one person can't possibly have all of those skill sets. Being able to share those skill sets around is definitely beneficial and useful. I agree that communication, and I would even say, in part, coordination of efforts, is somewhat challenging.
Sometimes you think someone is going to do something and they don't, or people are duplicating efforts and two people are working on the same thing when it might be better for one person to focus on one thing and another person on something else. I would say that on this project we really mitigated a lot of that, because I think each of our skills was somewhat specialized. I was looking at web scraping and some of the filtering, we had an assistant who was also working on some Python code, and everybody had their own little piece of the puzzle to work on. I can think of a number of things that I found satisfying about this. I think, like you were saying, the big picture is that this project really started out as a thought: I want to go out and scrape these web pages and look at some things. It was basically two people, three with the co-lab involved, coordinating and cooperating. Then it really mushroomed and expanded into the ability to gather other students in the field, students who had these capabilities. It was satisfying just to see the project go from a thought, almost a proof of concept, to a full-blown, hey, this could work, these are the tools that we need, and this is how we're going to make it happen. To me, that was maybe the turning point of the whole thing: to see this really get off the ground and not just fizzle. A lot of projects start, you think of some things, and it just kind of fizzles; in fact, this was quite the opposite, in my opinion. Starting slowly, the project really mushroomed and grew exponentially, and now there's a really good base to continue to move forward. So for me, that was the rewarding part personally.
Yeah, I would add to what Mike just said. I wasn't expecting this project to be this big at this point, and I think that's very exciting. The exciting part is that it became something we never envisioned when we were starting out. And I learned a lot in this process. Another exciting thing which happened through this project is that we wrote an NSF proposal at the end, which wouldn't have been possible without this much back and forth developing the idea, all of which happened in a very cooperative and collaborative fashion. So if we get the money from NSF, this can turn into a very real and very useful thing for the entire community in the coming years. So I'm excited about that. Let's see where we go from there.