All right, well, thank you for having me. I'm delighted to be here. Building on what Donnie has been talking about, documentation and being effective about it, here's yet another talk on that theme. I'm going to be talking about the idea of a lab notebook, with the intent of bringing science back to data science. By way of introduction, I'm the founder and principal researcher at the Montreal AI Ethics Institute. It's an international nonprofit research institute that aims to democratize AI ethics literacy. I also work as a machine learning engineer at Microsoft, where I serve on the CSE Responsible AI Board. If you notice an odd format on my slides, I left a little bit of space because apparently our little video bubbles are going to pop up on the side, so that none of the content gets covered. I'm hoping that goal has been achieved. So who am I, in addition to my formal introduction? I am first and foremost a practitioner. When I'm not doing talks or writing papers, I'm writing code, and that is the lens I bring to all the work that I do. I'm a community builder, in the context of my work in the space of AI ethics. It's important to bring in a diversity of perspectives, and learning from the folks who are on the ground in our communities is a great way of doing that. I've learned a great deal from folks in adjacent fields. In fact, the idea of the lab notebook emerged from the physical sciences, where researchers keep an actual physical lab notebook. I'm a writer: I believe it's important to convey ideas, and writing also helps to articulate thoughts, so I tend to do that quite a lot. You can find my writing on my website, LinkedIn, Twitter, et cetera. And finally, I'm a pragmatist.
I believe it's important that we remain practical about some of the challenges that we face in the field today, and the lab notebook is one attempt at being pragmatic about challenges I've faced in my own work. Sometimes I have strong preferences for certain things that end up being trivial and don't matter, so pronounce it GIF or JIF, whichever suits you. And if you ever want to find me, my Twitter handle is at the bottom of the slide, as is my website. Cool. Okay, I'm going to try something new, and we'll see if this works. We tried it yesterday and it did work, so I'm going to give it a shot with a quick poll. I think it's a great way to collectively see how all of us think about this idea. Of course I'm going to be talking about it, but it'll be a good lens into how your fellow attendees are thinking about it too. So I'm going to make that switch real fast and see how efficient I can be with that. I think that worked. So if you all want to help me out here: this is a tool called Mentimeter, and I'd love for you to drop in a few words on what comes to mind when you hear the words "lab notebook." You can go to menti.com and punch in the code that's at the top of the screen. Hopefully your answers will start popping up as a word cloud, and we'll be able to see some of the themes in what you and your fellow attendees think about this. And of course I'll walk through my presentation as well. Again, go to menti.com; the code is at the top of the screen. Thank you. Yeah, I see someone's put in something there, and you can put in multiple submissions as well. Loose, structured, esoteric, reproducible. I love that. Jupyter, okay, yeah. Moddy, organization, documentation, shared. Yeah, organizing stuff and experiments is important.
I see missing sheets of paper. Since we're going to be talking about a digital notebook, and specifically a version-controlled formulation of it, hopefully that won't be an issue. Notes to self, yeah, that's going to be a huge part of where reproducibility comes in. I like that someone put in napkins. I'd love for whoever put that in to ping me on the CSV Slack; I'd love to hear more. Wow, this is fantastic. I see why Jupyter is taking center stage here, which is great, and I'll clarify that as we go along. Reproducible, documentation, organization: those are definitely key. Da Vinci mirror writing, yes, thank you. Hopefully you will have some Da Vinci moments by the time we're done. I'm going to leave this open for another 30 seconds if you want to put something in. But some key themes have already emerged: record keeping, documentation, reproducibility, and having a legitimate process for how you practice data science. This is fantastic. Thank you so much for such enthusiastic participation; more than a hundred people have already chimed in. Very cool. Okay, I'm going to move to my next question, because we're super tight on time. Just one second; you'll have to pardon my computer, it's a little bit slow. Let me pull up the next question. It's still the same code, and you should see a prompt on your screen showing that there's a new question. Just as you did for the first one, punch in a few words: what are some of the biggest pain points that a lab notebook could solve in your data science workflow? This is a little bit premature, because I haven't fully explained what it's going to be about.
But I'm hoping to surface some interesting themes and ideas about what you think a lab notebook could potentially solve. Reproducibility is coming up again. Shareability of process, yes. I'm huge on processes as a practitioner, so I love whoever put that in. Exploratory analysis. Communication is huge; that is, again, very important. Collaborating, fantastic. Remembering what I did, amen to that. Gosh, the number of times I forget. I have "Untitled notebook number 23," and then it's really confusing what was actually going on and why. Shareability of process, collaboration, fantastic. Reproducibility is taking center stage, which I love, because I'm sure a lot of you are familiar with the replicability crisis we've faced in the field. One of my professors, Professor Joelle Pineau, kick-started a more mainstream discussion of this at NeurIPS about three years ago, documenting how many problems with reproducibility there were in our field. So this is fantastic to see. Thank you so much to all of you who've chimed in so far. I think this is fantastic, and we can continue to follow up on this word cloud as we go along. I'm going to go back to my slides real fast. That was relatively painless, which I'm happy about, and hopefully my slides are back up. Thank you for chiming in. I could almost stop my talk now, because you've covered quite a few of the points I'm going to make. But to concretize all of those, I'll first lay out the what, the why, and the how of the lab notebook, and how it's meant to be practiced.
Because it's a short talk, I'll drop in a link to the tool I personally used to create my own lab notebook, and I'm also happy to chat afterwards in the Slack about how to go about doing it. So, primarily, it works as an organizational tool. It's not just for tracking ML artifacts, for which you can use established tools like MLflow, DVC, Weights & Biases, et cetera. It's more for ideas, and for the journey of arriving at those ideas. You are perhaps no strangers to this: when we practice data science, we rarely arrive magically at the solution all at once. It's a meandering path, which means that when we arrive at the final idea, if that has happened over a long period of time, we end up forgetting how we got there. Why is this important? Because you might come back to that project later on, or even just two or three weeks later, forget how you arrived at some of the decisions, and beat yourself up over why you made certain choices. The lab notebook is one way of organizing that; it's meant to serve as a memory aid. And with the pandemic, there are a million things going on around us, and work and home life have blended. I personally find the lab notebook a way of centering my data science practice concretely. It gives me clarity every day when I come in to do my work, and when I close my day off it gives me a bit of closure on the work I've done: I document some of the findings and pick them back up the next day. I also wanted to mention the idea of a record of ownership of ideas because, as I said, this was inspired by the physical sciences.
In the physical sciences, it matters who germinated or created an idea, because that can become important in proving ownership when it comes to filing patents, et cetera. Maybe that's less the case in the corporate world, where you're all operating as a team and perhaps working on applied products, but even in industry research labs this can be important, and the lab notebook can be a way to establish it. What's also interesting is that some of you brought up loose sheets, or sheets getting lost, in the first question. Because this is formulated in a digital context and situated in a version-control setting, you can use Git's commit history, which records an author for every change, to identify who authored specific ideas in the notebook. Of course, a lot of it also depends on how you practice it, so I'll talk about some practices that I've used and that have proven themselves, rather than calling them best practices, and hopefully some will resonate. If they don't, please feel free to ask questions. I've already touched on why you'd want a lab notebook in the first place. I think the most important thing it does for me is serve as a North Star for why I'm doing a certain project at all. What ends up happening in practice is that business needs evolve and customer demands change over time. As I said, I'm the founder and principal researcher at the Montreal AI Ethics Institute, and I spend a lot of my time thinking about ethical issues in AI.
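To make the authorship point concrete, here is a minimal Python sketch, assuming the notebook is a plain-text file (for example Markdown) tracked in a Git repository and that `git` is installed. It runs `git blame --porcelain` on the file and attributes each notebook line to the author of the commit that last touched it; in a notebook where entries are only ever added, that is the person who wrote the entry. The function names (`parse_blame_porcelain`, `blame_notebook`, `lines_per_author`) are hypothetical and not part of any tool mentioned in the talk.

```python
import re
import subprocess
from collections import Counter

# First line of each blame group: "<40-hex sha> <orig line> <final line> [<lines in group>]"
HEADER = re.compile(r"^([0-9a-f]{40}) \d+ \d+(?: \d+)?$")


def parse_blame_porcelain(text: str) -> list[tuple[str, str]]:
    """Return (author, line) pairs from `git blame --porcelain` output."""
    authors: dict[str, str] = {}  # sha -> author; porcelain prints it only once per commit
    current_sha = None
    pairs = []
    for raw in text.splitlines():
        if raw.startswith("\t"):
            # Tab-prefixed lines carry the file content blamed on current_sha.
            pairs.append((authors.get(current_sha, "unknown"), raw[1:]))
            continue
        header = HEADER.match(raw)
        if header:
            current_sha = header.group(1)
        elif raw.startswith("author ") and current_sha is not None:
            authors[current_sha] = raw[len("author "):]
    return pairs


def blame_notebook(path: str) -> list[tuple[str, str]]:
    """Run git blame on a notebook file; must be called inside a Git working tree."""
    result = subprocess.run(
        ["git", "blame", "--porcelain", "--", path],
        capture_output=True, encoding="utf-8", check=True,
    )
    return parse_blame_porcelain(result.stdout)


def lines_per_author(pairs: list[tuple[str, str]]) -> Counter:
    """Count non-blank notebook lines attributed to each author."""
    return Counter(author for author, line in pairs if line.strip())
```

For example, `lines_per_author(blame_notebook("notebook.md"))` would give a per-author tally for a hypothetical `notebook.md`; the parsing is kept separate from the subprocess call so it can be checked without a repository.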
Quite a few of those issues actually stem from projects that start out addressing a particular problem and creating a solution, but gradually shift over time until the solution is infeasible or incompatible with how you approached it in the first place. That leads to a lot of ethical concerns, so using the lab notebook as a North Star, as a center, is one good use for it. We also spoke about the lineage of decision-making: you don't want an "Untitled notebook number 23" that you throw over the wall to be run and productionized. Refactoring the code and so on is a separate matter, of course. But the journey matters, because later on, if you're seeking to make improvements, you need to know how you arrived at the current configuration, whether that's the model, the features you've constructed, or the training regimes you've used. Without a documented lineage of that decision-making, you won't have clarity on how you arrived at the current configuration or on what has already been tried, so you may waste time trying it again. The lab notebook is not meant to replace MLflow, DVC, Weights & Biases, and other tools. It's meant to supplement them by documenting the things those tools don't capture clearly. Another point: if you come back to a project later on, as I was saying, you don't want to be relying on a few limited artifacts to guess how you got there. I don't have a photographic memory; I wish I were like Mike from Suits, who remembers everything. I'm not. So having something like this just makes my life easier.
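One way to picture a "lineage of decision-making" is an append-only log of attempts, failures included, that you can query before re-running something. This is a hypothetical Python sketch: `Attempt`, `DecisionLog`, and `config_key` are illustrative names, not from any library, and the log complements rather than replaces experiment trackers such as MLflow or DVC.

```python
from dataclasses import dataclass, field
from datetime import date


def config_key(config: dict) -> tuple:
    """Order-independent, comparable summary of a configuration dict."""
    return tuple(sorted(config.items()))


@dataclass(frozen=True)
class Attempt:
    day: date
    config: tuple    # produced by config_key(...)
    outcome: str     # what happened, including "did not work"
    rationale: str   # why it was tried: the part experiment trackers rarely capture


@dataclass
class DecisionLog:
    attempts: list[Attempt] = field(default_factory=list)

    def record(self, attempt: Attempt) -> None:
        # Append only: earlier attempts are never edited or deleted.
        self.attempts.append(attempt)

    def already_tried(self, config: dict) -> list[Attempt]:
        """Earlier attempts with exactly this configuration, failures included."""
        key = config_key(config)
        return [a for a in self.attempts if a.config == key]

    def lineage(self) -> list[Attempt]:
        """Attempts in chronological order: how the current configuration was reached."""
        return sorted(self.attempts, key=lambda a: a.day)
```

The failed attempts are the point here: `already_tried` only saves you from repeating work if the negative results were written down too.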
And for those interested in the ethical side of things, audit trails are something that emerging regulations are going to make mandatory in a lot of places, especially in highly regulated domains like finance and healthcare. The lab notebook can be an instrument that supports that as well. So, as I said, it helps you keep track of ideas. More importantly, it becomes a record for future research work, and not just for yourself, because, let's face it, this happens to all of us: we come back a month later and ask, who wrote this code? You run git blame and, yeah, it was me, and you end up looking like a dummy. It's also for your teammates who may work on this project in the future. Sometimes funding runs out for a particular project, or you have to let it lie dormant and come back to it later; again, the notebook is a useful mechanism for that. Then there's reproducibility: the more structured documentation you have of how you arrived at a particular configuration, the easier it will be for someone else to reproduce your experiment and arrive at the same results, or to find things that didn't work and help you do better. Basically, it helps you avoid the whole hair-pulling scenario, which is never fun. So how do you go about doing it? Creating a lab notebook that is version controlled is, in my opinion, the best way, because, as you pointed out, it makes it easy to share and collaborate with your fellow project mates, be they lab mates or teammates in a corporate setting. Specifically, I'd like to talk about how I go about it, which is by investing effort in it every day.
I start at the beginning of the day by reviewing what I wrote when closing out the previous day. I write down small artifacts and small ideas, and move from cursory notes to more detailed ideas as I go through the day. It also gives me a degree of closure at the end of the day, because I know I've achieved a certain amount and I'm able to document it. Another very important practice is to use a ledger approach: don't edit entries, just add new ones, because that generates an audit trail. It's the same idea as in a physical lab notebook, where you don't tear out pages, since that could invalidate the integrity of the notebook and, in that case, make a patent filing harder. Next, clarifying and rephrasing every day as you iterate on ideas is a great way to start small but keep building. It's like accumulating interest; you'll see the results at the end of the project's lifecycle. Another thing I've found useful is, once I've figured out what works, generating a template and sharing it with the rest of my team. And finally, record negative results and failures, because it's a journey to the final result rather than a magical solution that shows up in "Untitled notebook number 23." The maxim I use is: better to write it down and not need it than to not write it down and need it later. As I mentioned, the tool up on your screen is the one I use, and I'm happy to chat more about it in the Slack channel because I'm conscious of time; I feel like I've overrun a little bit. I've already touched on the AI ethics issues, and this is a great way of addressing them.
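The ledger rule ("don't edit entries, just add stuff") can be checked mechanically when the notebook lives in Git: the new version of the file must begin with the old version, unchanged. Below is a minimal Python sketch under that assumption; the function names are hypothetical, the notebook path is assumed to be relative to the repository root, and `git` must be installed.

```python
import subprocess
import sys


def is_append_only(old: str, new: str) -> bool:
    """True if `new` only adds text after `old`: no earlier entry was edited or torn out."""
    return new.startswith(old)


def committed_version(path: str) -> str:
    """Last committed text of `path` (relative to the repo root), or "" if not yet committed."""
    try:
        result = subprocess.run(
            ["git", "show", f"HEAD:{path}"],
            capture_output=True, encoding="utf-8", check=True,
        )
    except subprocess.CalledProcessError:
        return ""  # new file (or no commits yet): nothing to protect
    return result.stdout


def check_notebook(path: str) -> bool:
    """Compare the working copy with HEAD and report any edits to existing entries."""
    with open(path, encoding="utf-8") as f:
        current = f.read()
    if is_append_only(committed_version(path), current):
        return True
    print(f"{path}: an earlier entry changed; add a new dated correction instead",
          file=sys.stderr)
    return False
```

One hypothetical way to wire this in is to call `check_notebook("notebook.md")` from a Git pre-commit hook and abort the commit when it returns `False` (note that it compares the working copy, not the staged content). Corrections then go in as new entries that refer back to the old one, much like crossing something out in a physical notebook instead of tearing out the page.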
Finally, just a little bit about the Montreal AI Ethics Institute: if you want to learn more, you can go to that link. Thank you. Perfect, you have about 30 seconds left, so perfect on time. I'm wondering if I can ask you a really quick question: could you explain version control for a lab notebook in about 30 seconds, for a researcher? Yeah, absolutely. With the tool I mentioned on the slides, you set up a template, which you can think of as a Markdown file or whatever format you prefer, and you add "pages" to it. It's version controlled in the same sense as updating a README file in a GitHub repo, but you do it incrementally: rather than changing existing text, where you'd have to go through git diffs to see what differs from one page update to another, you just add new pages and keep appending as you go along.
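Here is a minimal Python sketch of that answer, assuming a single Markdown file to which each day gets a new dated "page" appended from a shared template. The template's section names and the helper functions are hypothetical illustrations, not the tool from the slides; they mirror the practices from the talk (a North Star, a start-of-day review, decisions, negative results, an end-of-day summary).

```python
from datetime import date
from pathlib import Path

# Hypothetical page template; the section names are illustrative, not prescribed.
PAGE_TEMPLATE = """
## {day}

North star: {goal}

### Review of the previous page

### Ideas and observations

### Decisions and why

### Negative results and failures

### End-of-day summary
"""


def new_page(day: date, goal: str) -> str:
    return PAGE_TEMPLATE.format(day=day.isoformat(), goal=goal)


def add_page(notebook: Path, day: date, goal: str) -> None:
    """Append a fresh dated page; existing pages are never rewritten."""
    with notebook.open("a", encoding="utf-8") as f:
        f.write(new_page(day, goal))


def last_page(text: str) -> str:
    """Most recent page of the notebook text, for the start-of-day review."""
    start = text.rfind("\n## ")
    return text[start + 1:] if start != -1 else text
```

If you commit after each `add_page` call, every commit's diff shows only the newly added page, which is the incremental history described in the answer above; sharing `PAGE_TEMPLATE` is one way to hand the format to the rest of a team.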