Hi, my name is Ray Paik. I'm the community manager at GitLab, and I have a couple of guests on this session. A community contribution came through last week from Adam, and I'll let the participants introduce themselves in a few minutes. We thought this was going to be a relatively simple set of fixes on our web pages, and Adam not only identified some of the problems we had, but also helped us debug the issue. So Adam, I'll let you introduce yourself, then we'll turn things over to Stan and we can start our conversation. Adam, go ahead.
Sure. My name is Adam Leiter. I'm currently a graduate student in a PhD program in linguistics and a big fan of Git and GitLab, and I use it in my work. I had opened this merge request just to fix some typos on the website, to contribute back, and that led to this interesting issue.
Cool. So I'm showing the merge request. I think there are 26 pages changed, as you can see, and I'll highlight the merge request number under www.gitlab.com so people can follow along during the recording or afterwards. We thought these were relatively simple typo fixes that Adam was generous enough to find, and then we noticed a pipeline issue, and because of that we couldn't merge. But first of all, Adam, how did you even find these? Did you go through all these pages yourself to find the typos, or did you run some kind of script to identify these errors in the documentation?
Yeah, I had originally noticed just a few as I was looking through the website, and I decided I'd go ahead and fix those. I thought that while I was fixing a few, I would run some of the other documentation through a script. So I did end up using a script for most of them. There were a few I caught just from browsing the website.
Cool. Awesome.
Yeah, it's great to have a linguist contributing to GitLab, and I think you're being somewhat modest here. I think you told me you initially started out studying computer science in university and then fell in love with linguistics, so you've still kept an interest in computer science and coding on the side, I guess.
Yep, that is true.
All right, Stan, I'll let you introduce yourself, and then we can continue our conversation.
Sure. My name is Stan Hu. I'm an engineering fellow at GitLab. I've been with the company for close to four years now, and before I worked at GitLab full-time, I was contributing to GitLab in much the same way Adam was. So I'm always happy to help out contributors.
Cool. Yeah, I was going to point that out too: a lot of GitLab employees today started out as contributors before joining the company. So I'm going to quickly scroll through the MR here. You're not going to be able to see it right now, but you submitted the MR and I reviewed it. I noticed that there was a pipeline error, and this was late in the evening. I don't think I was very helpful. I just said, hey, Adam, can you fix these pipeline issues so I can merge this? Then I went to bed for the day, and when I woke up, I saw a couple of comments from you. So Adam, tell me about what you noticed and some of the debugging you did to help identify the problem.
Sure. So I submitted the merge request, and after a little bit of time I got the notification that the pipeline failed. I didn't think much of it until your comment prompted me to look at it again, right?
So then I did, and I took a look at the build that failed to see what might have gone wrong. There was this error about not being able to find glibc, the system-level C library. I did end up trying your suggestion of fixing one typo at a time, but I was unsure if that was going to work. So at the same time I was trying to take a look at some of the other merge requests, because at that point I had noticed, after clicking on the general merge request tab and looking at some of the recent ones, that a lot of them were passing and only one other one I saw was failing. Conveniently, you all have these labels, so I happened to notice that the other one that was failing was labeled as a community contribution. So I started to take a look at that. I initially suspected from the error that this had something to do with the Docker container or the Docker image that was being used. So yeah, you're pulling up the failed build there. As I mentioned, there was the glibc error: it couldn't find the expected library. I thought that suggested it was something to do with the container being used for the build, but when I took a look at the jobs from some of the other merge requests where everything was passing, it looked like they had the same Docker image. You saw the same Docker image hash. The one difference I noticed was the one that you went ahead and just highlighted, Ray: my job, and the job from this other community member, were using this web ruby 2.6 cache, whereas the builds that were passing were using a different cache, which was web ruby 2.6/2. And that's when Stan jumped in to help debug, so maybe he can take over.
Cool.
Yeah, I really appreciated the fact that you pointed out that there were other similar MRs from the community that were failing. I had noticed that the previous week and obviously didn't get to the bottom of it. That's when I started asking for help internally. I went to a Slack channel for merge request coaches, and Stan jumped in and took it from there. But Stan, I'll turn things over to you. What were some of the first things that you noticed, and what were you suspecting?
Well, I was just picking up where Adam had left off. He had raised a good question about how this one has a two and this one doesn't. And my first question was, where does that two come from? Because I had no idea. In our .gitlab-ci.yml, I don't see that anywhere. So I first asked in that merge request. I asked someone on our team, Tomasz, because he's an expert on our runners: where does that two come from? But then it occurred to me that we have this button on the CI/CD pipelines page that allows you to clear the runner cache. I had never actually known what that did. All I knew is you could click that button, all the runners seemed to wipe their cache, and they went about their way. So I figured it might have something to do with that. I looked at the code, and in that comment you can see this jobs cache index. I thought that was actually a really smart way for us to implement this: all we do is increment a number to clear the cache. The runner comes in, gets this number, and basically appends this little index to the cache name.
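
As a rough illustration of the mechanism Stan describes here, the sketch below models a per-project counter that is bumped when the clear-cache button is pressed and then appended to the cache name. It is only a sketch under assumptions: the `Project` class, the `effective_cache_key` helper, the `-` separator, and the `web-ruby-2.6` spelling of the key are all hypothetical and are not taken from GitLab's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Project:
    """Hypothetical stand-in for a project's cache state (not GitLab's real model)."""

    # None means the clear-cache button has never been pressed for this project.
    jobs_cache_index: Optional[int] = None

    def clear_runner_caches(self) -> None:
        # Nothing is deleted here. Bumping the index changes the key that
        # runners look up, so the old cache is simply never found again.
        self.jobs_cache_index = (self.jobs_cache_index or 0) + 1


def effective_cache_key(base_key: str, project: Project, sep: str = "-") -> str:
    """Return the cache name a runner would use (the separator is an assumption)."""
    if project.jobs_cache_index is None:
        return base_key
    return f"{base_key}{sep}{project.jobs_cache_index}"


# A fork that never cleared its cache versus a main repo that cleared it twice.
fork = Project()
main_repo = Project()
main_repo.clear_runner_caches()
main_repo.clear_runner_caches()

print(effective_cache_key("web-ruby-2.6", fork))       # unsuffixed key
print(effective_cache_key("web-ruby-2.6", main_repo))  # key suffixed with 2
```

Under this model a fork and the main repository read from differently named caches, which is consistent with the failing community jobs showing one cache name and the passing jobs showing the same name with a two on the end.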
So that explains what Adam had been seeing. Naturally I just assumed, okay, if that number is two, it must mean somebody internally clicked that button and reset the cache, and therefore it makes sense for all our contributors to do the same thing. So I did the naive thing and asked, is it okay if I clear Adam's cache, and does that solve the problem? And it at least got his pipeline to pass, so that was a win. At the same time, I decided I was going to clear the cache on our side again by pushing that button and make our number go from two to three. As a result, the pipelines on the main repo had the same problem that Adam had faced. Other people started mentioning it too, that their pipelines were failing, and then I realized, okay, we just managed to reproduce the same problem Adam had run into, so there's something deeper here. So instead of just doing the naive thing, I had to dig a little deeper, look at the actual error message, and think about why that was happening. On that merge request I started mentioning that it's got to have something to do with different operating system versions, because this happens when you have a different glibc version. In that comment you can see that I looked at the image that was failing, and I think this Docker image is using version 2.24, and the pipeline failure was complaining about needing 2.28. Then I walked away, since this was late at night for me. And I think Adam chimed in asking, actually, aren't there two images here?
And that was a really good insight, because I hadn't thought about that. The way it works is that in one step we build all the gems we need and cache them, and the cache gets uploaded. Then a different step comes along, extracts the cache, and runs our tests. So you've got these two different images, and if they don't match, this is exactly what happens. Adam was dead on here, and as soon as he pointed it out, it became very obvious how to fix it. In this next comment you can see that I ran the test. The top one is the build image, which actually generates the gems. It's using Debian Buster, version 10, which was officially released on July 6th. I think there was some lag time before the Ruby image moved up to this version, but I think it happened in the last week or so. The second image is the one that we use to actually test the website, and that's using Debian Stretch. So even though it's only 9.9 to 10, it's a big difference. Don't let that number difference fool you. It's a significant change, and it bumped up the library. So that explains why it failed: we have these two inconsistent states. The way you fix it is to just make them the same, right? That second image you see is the image we actually use to test GitLab CE, and we pin it to Debian Stretch for this reason. We don't want these operating system upgrades to happen without us making a conscious decision to do that. So to fix it, you can see the next merge request, a quick follow-up merge request, just tags that build image with Debian Stretch instead of Buster. It's a really simple change. You can click on the Changes button there, but essentially all I did was change the tag from 2.6-slim to 2.6-slim-stretch, kick off the builds, and refresh everything, and basically that fixed the problem.
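
The mismatch between the two images can be sketched as a simple compatibility check. This is a minimal model under stated assumptions, not the project's actual tooling: the `Image` class and `cache_is_usable` function are hypothetical, the test image name is a placeholder, and the rule that a cache is usable only when the test image's glibc is at least as new as the build image's is a simplification of how glibc symbol versioning works. The glibc versions are the 2.24 and 2.28 mentioned in the discussion, and `ruby:2.6-slim` is assumed to be based on Buster, as it was at the time described.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Image:
    """Hypothetical description of a Docker image used by one pipeline step."""

    tag: str
    debian_release: str
    glibc: Tuple[int, int]  # (major, minor)


def cache_is_usable(build: Image, test: Image) -> bool:
    # Simplified rule: gems with native extensions compiled against a newer
    # glibc may need symbols that an older glibc does not provide, so the
    # test image's glibc must be at least as new as the build image's.
    return test.glibc >= build.glibc


# Build image before and after the fix, and the pinned test image.
buster_build = Image("ruby:2.6-slim", "buster", (2, 28))
stretch_build = Image("ruby:2.6-slim-stretch", "stretch", (2, 24))
stretch_test = Image("test-image", "stretch", (2, 24))  # placeholder name

for build in (buster_build, stretch_build):
    status = "ok" if cache_is_usable(build, stretch_test) else "glibc mismatch"
    print(f"{build.tag} -> {stretch_test.tag}: {status}")
```

Pinning both steps to the same Debian release removes the mismatch, which is why changing a single tag was enough once the cause was understood.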
So what started out as a simple typo fix ended up being this maze of discovery. Thank you, Adam, for bearing with us and helping guide us to the solution.
Yeah, so this was like 7 p.m. Pacific time. I'm just confirming that it was late afternoon or evening and it was still going on. I remember that after clearing the caches for Adam and another contributor, we thought that was going to fix everything. But when that wasn't the case, I was starting to get worried: if Stan's struggling with this, what's really going on here? So I think in a couple of crucial areas, Adam, you pointed us in the right direction. That was definitely appreciated, and thanks for bearing with us. I think some people tend to get discouraged when things get a little more difficult with an MR, but you stuck with it over the past couple of days, so we definitely appreciate that as well.
Yeah, I'm glad I was able to help.
So obviously it wasn't very evident what the problem was. If this had been left alone for a few more days, I think we would have seen a pile-up of MRs with a lot of errors, without anyone knowing what was going on. I think that's an obvious impact. As you can imagine, our website, including our handbook, gets updated very frequently with lots of MRs. So this would have been really annoying for a lot of us at GitLab and for people in the community. So we certainly appreciate that.
Yeah, and I'll say it's one of those things where the contributors, people outside the company, were suffering from it, but this was just waiting to bite us, right? As that cache expired, and it would have expired, I think, in a week, this all would have started happening anyway.
So the fact that we fixed it for forks also meant that we fixed it for ourselves as well.
Yeah, so we got ahead of the curve, which is awesome. And before I got on the phone with you last week, Adam, after this MR, I was looking back, and I think you made your first contribution to GitLab about a year ago, back in August. You were being modest and said it was only a documentation fix, but I certainly appreciate the fact that you came back with an MR that was impactful in a lot of different ways.
Yeah, no, like I said, I'm glad I could help. GitLab is useful for me, so I'm glad to give back to it.
All right. Well, I appreciate that. Thanks very much.
Yeah, no problem.