Good afternoon, everyone. Thanks a lot for joining me this afternoon for this session on how to keep hundreds of code repositories consistent while staying sane. Because when you have that many repositories, there are many opportunities for things to go wrong. Nobody wants to end up like this, and I'm not like this, and I'll explain how, with two simple tools, I avoid being in that state.

So who am I? I'm Vincent Fuchs. I've been working for 15 years in the industry, 13 of them with Société Générale. We're a French bank that has been in India for the past 16 or 17 years now. We're hiring; that's all I'm going to say. I've worked in Paris, New York, and, for the past five years, Bangalore. I've had several kinds of roles during my career, from support to dev, to tech lead, tech coach, and technical architect these past few years. I've been using open source since the beginning; what has changed for me over the past two years is that I've started contributing back. We'll see why this is important. What I'm focusing on these days is microservices. It's a buzzword, everybody is doing it, so I'm doing it also. But I focus more on the aspects around it: monitoring and dev efficiency. How do we work smartly, and not waste time doing things that bring no value?

As developers, we like to have tools, so I'm going to present two more tools today. When we do microservices, we have so many problems that we end up with one tool per problem. For an API gateway in the Java world, you often use Zuul; Consul for service discovery; the Elastic stack for monitoring, et cetera. Zipkin, for example, gives you a dependency graph of your services: if your services are instrumented with Zipkin, you get this for free, and it can be very useful. But there's no tool right now, not that I know of, for the code repositories themselves. OK. This is our GitHub organization in my team.
I work with three teams in Bangalore, and they work on the same platform as four other teams in Paris, so let's say seven teams total. We have 430 repositories. It's a bit of everything: some backend code, some UI, some configuration code, some documentation, some libraries, some stuff that is deprecated but that we want to keep. Like in the picture, they're all dispersed. Some of them are very similar, some are very different. That's what we want, right? With microservices, we're supposed to be able to do something in Java, in Node, in Python. On paper, it's nice. Managing this can become a challenge. That's why quite often we actually need a bit of consistency across all of this.

So why is it a problem? Well, imagine a vulnerability is reported on a library today, and you know you're using it somewhere. But where? You've seen it once, you don't remember where, and probably you've seen it many times. Out of 430 repositories, how quickly can you find in which repositories you use that version of that library? That's one. Then, once you have identified all these places, say you find it in 30 of them, you need to upgrade it. So what do you do? Thirty times commit and push?

No. I'll show you today two tools to help with that. The first is GitHub crawler, as we've called it. It gives you accurate reporting on all your repositories. It's like a batch: you run it, and you get the data you need. I'll give a quick demo of all this afterwards. The second is called CI-droid, with which we automate the low-value, but still required, maintenance tasks. We don't pay our developers to do that kind of boring thing; we want them to produce value. So we had a problem, and in true DevOps style, there was no tool, so we created these tools. Let's start with GitHub crawler. This represents exactly what I was saying.
You know the information is there somewhere, but you don't know where, so you need help with this. Very quickly, then, a quick demo. Like I said, it's a batch, and I can configure the output. Here, I've configured it so that it pushes the data into Elasticsearch. Then you just have to put Kibana on top of Elasticsearch, and you can browse the data very easily. Here, it's over seven days. You see, it's running every day, so this is my data for each day of the week. Let me focus on one day here. I have my indicators: the repository name, and then the indicators I want to display, for example the Spring Boot version, the number of branches for each repository, the number of Maven modules. These may not be relevant for you, but for me they're important, so I've configured it that way, and it's very easy to configure for various things. Here, I have the list of other indicators that I have configured but am not displaying. If I want to know, for example, the JaCoCo version or the Docker image that is used across all of these repositories, I have the information here. If I need it again tomorrow, I don't need to ask anyone; I'll just come here and get accurate data, no matter what my developers tell me. "Yeah, I've done it. Ah, but I forgot to commit." That happens. This is the source of truth on the repositories. And because it's in Kibana, what you see here is the raw data, but you can build any kind of graphics and dashboards if that's important for you.

OK, so that was just the quick demo of the output of GitHub crawler. How does it work? You give it a simple config file: the GitHub organization name, which files you want to parse, which indicators and how to find them in each file, and what kind of reporting you want. Then you give that config file to the batch.
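To make that concrete, a config for the crawler might look roughly like this. This is an illustrative sketch only: the key names below are assumptions based on what I just described, not the tool's actual schema.

```yaml
# Hypothetical github-crawler config -- key names are illustrative, not the real schema
organization: my-team                      # GitHub organization to crawl
github-url: https://github.mycompany.com   # internal GitHub Enterprise instance
indicators-to-fetch-by-file:
  pom.xml:
    - name: javaVersion                    # indicator name shown in the report
      type: regex                          # find the value with a regular expression
      pattern: "<java.version>(.*)</java.version>"
  Dockerfile:
    - name: dockerBaseImage
      type: regex
      pattern: "FROM (.*)"
output:
  - type: elasticsearch                    # push results to Elasticsearch, browse with Kibana
    url: http://elasticsearch:9200
```

The point is that everything the batch does is driven by this one file: which organization, which files, which indicators, and where the report goes.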
It's a Spring Boot 2 application written in Kotlin. It starts from the top, at the organization level: from the organization, it gets all the repositories; from the repositories, it knows which files it's interested in; it fetches them, parses them, et cetera. Then it produces a report per repository, per branch, and pushes that data. Here, I push it into Elasticsearch, but I can write it to a file, save it in a database, whatever I want.

The typical config, quickly. We have our internal GitHub Enterprise instance, but this would also work on GitHub.com. You give the organization name, and the OAuth token if you need to parse private repositories. Then you configure the indicators to fetch, per file. Here, I'm saying: in my pom.xml, I want to find my Java version, and I indicate the regex to use to find it. There are other parsers to find a dependency version in a pom.xml, and XPath expressions to find the values you want. You can mention several files: here, I have a pom.xml, a Dockerfile, a CircleCI config, a YAML file. Several parsers come out of the box, and you can add your own very easily to find whatever information you're interested in.

That was parsing the files. There are also miscellaneous repository tasks. Sometimes you want information on all your repositories that isn't in the files, like the number of branches or the number of open pull requests. If you want to track this at scale, I would say it's easy: you just need to configure it and it works.

OK. Second tool: CI-droid. It takes care of the things that nobody wants to do. Some examples. Automatic rebase of unmergeable pull requests: that's a problem we faced. It happens that your pull request doesn't get merged immediately; it stays there for maybe a few days.
It was green at the time you created it, but other people have pushed code to the master branch since, so your pull request is not up to date anymore. It's still green, but it's not up to date. You can configure GitHub to prevent the merge, et cetera, but the best thing developers can do, ideally, is rebase their branch on top of master to make sure their pull request is up to date. Here, every time there is a push on master, CI-droid is notified, takes all the open pull requests, and rebases them automatically.

A similar use case is the notification on non-mergeable pull requests. That's something I feel is missing in GitHub: when you create a pull request and somebody pushes some code to master and, for some reason, your pull request becomes unmergeable, nobody tells you. There's no notification system. So you believe your pull request is ready to be merged, that somebody will look at it, and that it will be merged soon. And when the reviewer comes: "What is this? You want me to review that pull request? It's not even mergeable." They send it back to you, you fix it, it comes back, and you've wasted a day. Here, CI-droid puts a comment on the pull request, so the developer gets a notification saying: hey, you'd better look at this, because nobody is going to merge it in that state.

The last one is PR analysis. We get notified about pull request events, so it's fairly easy, and I won't go into detail, but we can perform additional checks on the pull request. If your static analysis doesn't provide the checks you want, you can easily plug in custom checks here, and again put comments on the pull request: fast feedback to the developer.

How does it work? I was saying CI-droid gets notified.
Actually, I don't know if all of you are aware, but when developers use GitHub, everything is converted into an event. Every time you commit, create a branch, create a pull request, anything: it's an event. Then you can go to GitHub and say: these types of events, please send them to this endpoint. That endpoint is simply a webhook. All you need to do is create a controller that receives that payload. That's what we do here with the CI-droid webhook: it takes the event and forwards it through RabbitMQ to some task consumers so that it gets processed asynchronously. Rebasing a pull request may take a few seconds, so you don't want it to be blocking at the webhook level; we process it asynchronously in the task consumers.

Then we got an idea. We thought: what if we connected both? We have on one side GitHub crawler, which collects data, and on the other side CI-droid, which performs actions on repositories. Can we find a way to pipe the output of one into the input of the other? Maybe we can save a lot of time here also. All we had to do was create a new controller that takes a specific payload in which we describe the type of change we want to make; where we want to make it, in 10, 50, 100 places; and how we want to make it: do we want to push directly, or create a branch and a pull request that will be reviewed? That's it. You send that payload to the controller. Let's say we want to upgrade a version in 50 repositories: we send one payload, and this component splits it into 50 messages that are processed asynchronously by the task consumers. And the record, I think we've done recently: 44 changes, an upgrade of a version, each with a pull request that gets built, done in 32 seconds. Try to do that manually. I mean, give that task to your team: OK, we have these 44 changes to do.
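The fan-out that controller performs can be sketched in a few lines of Java. This is a simplified illustration of the idea, not CI-droid's actual payload format; the record and field names here are made up.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkUpdateFanOut {

    // One resource to update: which repo, which file, on which branch.
    record Resource(String repo, String filePath, String branch) {}

    // A per-resource task message, as it would be pushed onto the queue.
    record TaskMessage(Resource resource, String action, String commitMessage) {}

    // Splits one bulk payload (N resources, one action) into N independent
    // messages so each update can be processed asynchronously by a consumer.
    static List<TaskMessage> split(List<Resource> resources, String action, String commitMessage) {
        List<TaskMessage> messages = new ArrayList<>();
        for (Resource r : resources) {
            messages.add(new TaskMessage(r, action, commitMessage));
        }
        return messages;
    }

    public static void main(String[] args) {
        List<Resource> resources = List.of(
            new Resource("team/service-a", "pom.xml", "master"),
            new Resource("team/service-b", "pom.xml", "master"));
        // One payload in, two independent task messages out.
        System.out.println(split(resources, "simple-replace", "upgrade lib version").size());
    }
}
```

Sending one HTTP payload and letting the consumers pick up the per-repository messages is what makes 44 changes in 32 seconds possible: the updates run in parallel, and nobody clones 44 repositories by hand.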
If you put somebody full time on it, OK, maybe in a couple of hours it will be done. If you don't give a clear objective, it's going to take weeks, because nobody wants to do it. So I'll give you a quick demo. This is what it looks like. Yes, CircleCI is a build tool; it's available online for free for open source projects, and it's going to build your code. The upgrade here is done by our own tool. We call it CI-droid because it's the droid for continuous integration; we got the idea for the name when The Last Jedi was out, so it came like that.

So this is the UI; the UI is brand new from last week. Basically, we have this repository here, you see, with three branches. It's a simple example: branch one, two, three. In it, we have a file called test file three, with the same content on each of the three branches. So what if I want to update these three files in the three branches? I just have to fill in the form here. I'll paste my token, because CI-droid is going to commit on my behalf, and I'll get an email notification. Then I have to say which action I want to perform: I'll do a simple replace. The files say "dummy commits", and I want them to say "smart commits". Then, how do I want to do it? Do I want to create a pull request, or a push? Here, we'll keep it simple and do a single push. And where do I want to do it? Here, I just have to provide a CSV file in which I've indicated the repository, the file, and the branch. For all these resources, CI-droid is going to take the file, try the replacement, and push it back to the repository. Any repository, any file, any branch: it will try. If it doesn't find anything, nothing will happen, and you'll get a notification that nothing happened. Otherwise, it will do it. Here, we do it for three resources, but imagine you have 50 or 100.
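For reference, that CSV only needs one line per resource to update, with the three columns just mentioned (the header names here are illustrative, not the tool's required ones):

```csv
repositoryName,filePath,branchName
my-org/demo-repo,testFile1.txt,branch1
my-org/demo-repo,testFile2.txt,branch2
my-org/demo-repo,testFile3.txt,branch3
```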
It works pretty much the same. So here, if we go back to the repository, hopefully... OK, you see we got three commits, 19 seconds ago, and now the files say "smart commits". And if I go into the history, it's all tracked: I have my commit message, performed on my behalf by CI-droid. So in the history, I also know whether it was CI-droid or somebody else doing it. We want to make sure that's clear, so that in case of a problem, it's all traceable.

I showed it quickly, but these are the actions we currently have in stock; you get these out of the box. Some are regex based, some are more XML oriented. We use Java and Maven, so there are a lot of things around adding a dependency, removing a dependency, that kind of thing. It all comes out of the box, right there in the combo box. Now, if you're not finding what you're looking for, whether it's in GitHub crawler or in CI-droid, you can very easily implement it, and very easily share it, because all of this is actually open source. If you want to contribute, this is where it happens: GitHub crawler, CI-droid, and a few other things you can have a look at if you want. Everything is available at github.com/societe-generale. All right. Thank you. That's all for me. I hope that was useful, presenting you these two tools.

Just quickly, who here manages more than 50 repositories? OK. More than 100? Ah, so yeah. How many? Ah, 300. OK, see, we're not alone. 400. Yeah, so GitHub last year started with their bot framework, Probot. It came out more or less at the same time we had the first version of this, and we actually spoke with the guys from GitHub. They said: we like what you've done, you should make it a Probot app, as they call it. The only problem is that I've implemented everything in Java and Kotlin for GitHub crawler,
and a Probot app has to be written in Node. So yeah.

Well, we use GitHub, so it's only for GitHub. But the way the code is designed, we follow a hexagonal architecture, meaning that the fact that you interact with GitHub is one dependency, and it could be replaced with an equivalent to interact with GitLab. I don't know GitLab very well, but they probably have the same concepts of pull requests and all. All we do is interact with the GitHub API, and GitLab probably has an API also, so I would say there's probably just one class in the code to replace.

No, I mean, for GitHub crawler we have a list of parsers, and one of them is "find dependency in pom.xml". It works for any dependency, because you know the structure of a pom file; it's always the same. Actually, it's even a bit smarter than that: quite often, people declare the version in the properties, so if the parser finds that the version in the dependencies is a variable, it will go check the value in the properties. Whether the version is hard-coded or defined as a property, it will find it. All right. Thank you.
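The property-resolution behavior from that last answer can be sketched as follows. This is a simplified, regex-based illustration with made-up names, not the crawler's actual implementation (which presumably parses the XML properly): if the dependency's `<version>` is a `${...}` reference, look the value up in the `<properties>` block.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PomVersionResolver {

    // Returns the version of the given artifact, resolving a ${property}
    // reference against the pom's <properties> block when needed.
    static String resolveVersion(String pom, String artifactId) {
        Matcher dep = Pattern.compile(
                "<artifactId>" + Pattern.quote(artifactId) + "</artifactId>\\s*<version>([^<]+)</version>")
            .matcher(pom);
        if (!dep.find()) return null;
        String version = dep.group(1);
        if (version.startsWith("${") && version.endsWith("}")) {
            // The version is a variable: look up its value in <properties>.
            String property = version.substring(2, version.length() - 1);
            Matcher prop = Pattern.compile(
                    "<" + Pattern.quote(property) + ">([^<]+)</" + Pattern.quote(property) + ">")
                .matcher(pom);
            return prop.find() ? prop.group(1) : null;
        }
        return version; // hard-coded version
    }

    public static void main(String[] args) {
        String pom = """
            <properties><guava.version>28.0-jre</guava.version></properties>
            <dependency>
              <artifactId>guava</artifactId>
              <version>${guava.version}</version>
            </dependency>""";
        System.out.println(resolveVersion(pom, "guava")); // prints 28.0-jre
    }
}
```

Either way, hard-coded or declared as a property, the caller gets the effective version back, which is exactly the behavior described in the answer above.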