Good morning, everyone. I'd like to first acknowledge my appreciation of you all showing up and sharing this morning with me. I don't take it for granted. Let's start the content and let's try to have some fun.

So today we'll talk about mono versus multi repo in the context of CI pipelines. Once we understand what those concepts are and their pros and cons, the next question is: maybe there's another way? Maybe there's an optimal way to enjoy the best of both worlds. And like with every other solution, I always try to think: how is it applicable, how is it maintainable, and how easy is it? Before every talk I always like to say that my main goal is that at least one person here will say, "That's kind of easy. I can do that. I can introduce this into my dev cycle and my dev organization."

But first, let's talk about my favorite topic. My name is Michael Sego. I'm a 30-year-old software engineer from Tel Aviv. I love Django and I love cats. Not related, but both correct. I've been a software developer for the past six years back in Tel Aviv, and I kind of moved between industries trying to find the place where I feel the most impact and, well, just happiness. I found the most happiness in the cyber security industry. It allowed me to do something, and it felt really right, even if I didn't really understand it. Currently I introduce security measures into dev teams, into the dev cycle, and allow them to focus on what they're good at. You want to focus on your business logic, on what you do; I take care of the security. That kind of mindset also gave birth to today's talk.

I think we can all agree, to a certain extent, that you don't start your day thinking, "How am I going to improve my CI infrastructure? Oh, I want to add this feature, I want to do that." If you do, please talk to me. I'd love to see that passion. Most people want to set it once, forget it, and even when there are problems, try to put them to the side.
That's not the right way to handle a CI. So let's take that mindset of trying to make things work better for all of us.

But first, let's talk about the never-ending debate: mono versus multi repo. Just a show of hands: who's a monorepo kind of developer? Okay, you're outnumbered, that's good. This is a never-ending debate, and it won't end because of one simple truth: they're both great solutions. It just depends on your tech stack, on your team, on your culture. You could use both and get amazing results, but in the context of CI there are really key differences.

When we use a monorepo, just a quick overview: centralized code management, easy refactoring, and the team shares the same culture. The problem is that when everyone works on the same files, conflicts will be a daily endeavor, and single tagging for the whole code base makes versioning and refactoring for different clients and different versions of your app a little bit harder than it should be. In multi-repo: independent deployments, independent versioning, autonomous work. You want to enable your dev teams to express themselves and do things their way. The problems are, for example, library update overhead and more siloed teams. Yeah, that's a new word for me as well.

But in the CI context, a monorepo allows us to have one single file, no code duplication, very easy to go in, understand what's going on, and run with it. In multi-repo we'll have to have a single file or system per repo and reuse the same stuff time and time again. The overhead becomes taxing very quickly. Another way to look at it is via this graph: in the monorepo all the different services adhere to the same CI pipeline, while in multi-repo everyone has its own infrastructure.

But what if there was another way? Maybe there's some way to enjoy the flexibility of multi-repo CI with the one-stop shop of monorepo. Well, there is. It's called centralized CI, and I'm sure you've seen versions of this solution, you've heard of versions. Well, today we're going to see a very lightweight, easy-to-implement version using GitHub and, obviously, Python. The main thing I want all of you to keep in mind: I'm going to show you a very basic instance of this whole stack, but you can use your imagination and your skills to expand it and do amazing things with this infrastructure.

So why should you even care? Why should you want centralized CI? Your current solutions probably work wherever you work, and they do the job. Centralized CI allows us to get the best of both worlds, which is done by adhering to one simple principle: decoupling the checks from the code. There's no real reason that the checks should know about, or share the same space with, your code. By this decoupling we get abilities like enforcing styling with a linter, implementing security checks (I'll expand on that later), and allowing both tailor-made checks and cross-repo checks. That's the essence of it.

The centralized CI overview basically talks about our SCM: GitHub, Bitbucket, GitLab. Your developers, yourselves, are working, doing your day-to-day jobs. Events are flowing everywhere. For example, we're listening to PR events. Those PR events get transmitted to some back end that will analyze them, verify them, and decide which CI jobs we want to trigger.

Now, for everything we're going to see later, always go back to this differentiation. We have a place where code happens: PRs are written, comments are added, things are edited. That place should not be aware that there's a whole CI framework behind it. That's the whole beauty of it. Then we have this back end, the brain of this whole process, that analyzes those events, verifies your token, and sees what needs to happen. And you can very clearly understand that if I have this place, the centralized brain, I can probably do very cool things with it.
I can start to analyze how much time everything takes, maybe per-developer kinds of stats. Monitoring your own CI and PR activity is a very recommended practice, especially as your team and your company grow. And in the end we trigger a CI job in a completely separate repo that contains nothing but GitHub workflows or CI checks, depending on the platform we're talking about.

So now we're going to go over the same thing, only in a specific GitHub instance. We have some GitHub organization with the greatly named repository "a repository". A user opens some PR on GitHub, and our GitHub app will send said event via webhook to our back end. That back end will first and foremost verify the token: it's valid, it's in the right format. Then it will analyze what happened in said PR: which kinds of files were changed, what the change was. And then it decides, "Okay, I'm going to trigger this, that, and the third check." That is sent back to GitHub in a dispatch workflow, that's the name of the thing, and it will trigger a workflow in the central CI repository.

So again, the whole thing is that we have three different layers. The first layer is your actual code. The second layer is GitHub talking to your CI brain, which decides what to run. And the third is the actual running of the checks, in a completely separate, third location.

So what do we need for our demo, so we can see the whole thing and get a little bit more excited?
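As an aside, the back end's "verify the token" step could look roughly like the sketch below. It assumes the token in question is the webhook secret: GitHub signs each delivery with it and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The function name `is_valid_signature` is hypothetical and not taken from the demo code.

```python
import hashlib
import hmac


def is_valid_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Check a webhook delivery against the shared webhook secret.

    `body` must be the raw request bytes, exactly as received.
    `signature_header` is the value of the X-Hub-Signature-256 header.
    """
    # "It's in the right format": reject anything that is not sha256=<hex>.
    if not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    expected = ("sha256=" + digest).encode()
    # "It's valid": constant-time comparison to avoid timing leaks.
    return hmac.compare_digest(expected, signature_header.encode())
```

The comparison has to run on the raw body before any JSON parsing, because re-serializing the payload can change the bytes and therefore the digest.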
So we'll create a GitHub application that will monitor those webhook events and send them to our chosen destination. We'll create some simple back end that listens to PRs, and use ngrok as a tunneling service to our, well, my server here.

I wouldn't be a very good security-aware developer if I didn't briefly talk about this: when you're introducing some sort of application into your dev cycle, try to give it as few permissions as possible, and be very much aware of what you're giving it. The attacks and issues that derive from this very problem are too many to count. I always suggest going over the fine print, even though in dark mode it's a little bit harder.

More tech-wise, here's what you need. You define the webhook URL, which is basically where your private server is hosted. You need to define a webhook secret. And you define the repository permissions. Actions and checks are two different concepts in GitHub, and they're both important. The action is the actual brain of what's running. So, for example, our linter run with flake8 is the action. But the check is basically the notification of the result of said action in the appropriate repo. Because if we're running the logic in this centralized CI location, it doesn't really make sense to go there every time we open a PR to see what happened with it, especially when so many people are opening PRs at the same time. So for us to notify the original repo, "Hey, this is what happened with your action," there's a check. That's just the wording.

So let's build a centralized linter. This is our EuroPython demo organization, and the three layers we talked about are represented here by three different repos. The test repo is just our code base, some basic Python stuff.
Hello, Dublin. Our CI is the brain part; we'll dive into it just to show you what it means, but it's nothing too major. And the third, the most interesting part, is central CI, which is the repo that contains all the different checks. As you can see, there's just a README file and GitHub workflows with two different checks.

I don't know how well you know the syntax and everything, but we'll go over it real quickly because I know it could be quite taxing. This is the flow that we will trigger from the brain part of our service. As you saw in the repo, all it has is YAML files and a simple README. It has no idea about the infrastructure it runs on or the code it's about to run. It doesn't care. It's a very simple, dumbed-down check, and it gets all its relevant data from the dispatch workflow event from our back end. So what we're going to do here, and you're about to see it in a moment, is explore GitHub Actions: we're going to create a check, that's the notifier we talked about, check out the original repository, run lint with flake8, and, according to success or failure, get notified.

In the brain part of it, we go to src. We have one public endpoint (yeah, I tried to play with it a little bit), and that public endpoint will trigger handle PR event. That handle PR event will package the necessary info. This is the handle PR event. As you can see, we create this client payload. It also talks about the owner and the relevant repo, and has all the data we got back from the opened-PR event, and packages it here. I just hard-coded the decision: run this check CI YAML. This dumb line is basically the essence of your potential. This is what I want you to take away.
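The packaging step described above might look something like the following sketch. It is not the demo's actual code: it assumes the back end uses GitHub's repository dispatch endpoint (`POST /repos/{owner}/{repo}/dispatches`, which accepts an `event_type` and a `client_payload`), and the names `CENTRAL_CI_REPO`, `build_dispatch_request`, and the payload keys are all hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical location of the repo that holds nothing but workflows.
CENTRAL_CI_REPO = "example-org/central-ci"


def build_dispatch_request(
    pr_event: dict, token: str, check: str = "lint"
) -> urllib.request.Request:
    """Package a pull_request webhook event into a dispatch request."""
    # Everything the central workflow needs to find and report on the PR.
    client_payload = {
        "owner": pr_event["repository"]["owner"]["login"],
        "repo": pr_event["repository"]["name"],
        "pr_number": pr_event["number"],
        "head_sha": pr_event["pull_request"]["head"]["sha"],
    }
    # The hard-coded "dumb line": always ask for the same check.
    body = {"event_type": check, "client_payload": client_payload}
    return urllib.request.Request(
        f"https://api.github.com/repos/{CENTRAL_CI_REPO}/dispatches",
        data=json.dumps(body).encode(),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the default `check="lint"` argument stands in for the hard-coded decision, and it is the one place where smarter selection logic would plug in.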
I just decided: run the linter, I don't care. But according to your use and your needs (and we'll dive into a more complex example), you can make this line into a whole new service that decides which checks should run, and you can really control your CI. And that's the point of this whole monorepo versus multi-repo debate. Because here I basically allow you, just like in a monorepo, one place to define everything. That's the power. But just like in multi-repo, this area is your place to get a little bit smarter and introduce new things. For example, you go over which files were changed during the PR: "Oh, I see some Python files, some JS files, even a Terraform file. So I will run security checks according to those files. I'll run Bandit, I'll run KICS," whatever security measures you want to use. This is where the potential basically becomes infinite.

So let's see this whole thing in action. In the end it's not going to be so dramatic, but I think you'll see the value. This is my test repo, hello world. I want to crash the CI's linter, so of course: too many blank lines. Create a PR. In the background, through ngrok to our server, events are flowing and being handled. And this is the only annoying part: it takes a little bit of time till everything clicks. I go back to our three layers all the time. I want to go back to the console, it simulates it better. We see now that the linter check ran and we got a response.

But where do all three layers come into play? Well, this is the code, the src. Someone opened the PR and he got notified of the result regarding the PR. He has no idea about the check, no idea about the tech, no idea about the infrastructure. He just opened the PR and got immediate value. Then there's the GitHub application that talks to the back end. That whole area has no idea where those checks are stored, what's going to happen, or what the next step is.
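Going back to the idea of replacing the hard-coded line with something smarter, a minimal sketch of choosing checks from the changed files could look like this. The mapping and the name `select_checks` are hypothetical; only the tools named in the talk (flake8, Bandit, KICS) are used, and no tool is assumed for JS files because none was named.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to the checks worth running.
CHECKS_BY_SUFFIX = {
    ".py": ["flake8", "bandit"],  # style plus Python security checks
    ".tf": ["kics"],              # infrastructure-as-code security checks
}


def select_checks(changed_files: list[str]) -> list[str]:
    """Return the checks to trigger for a PR, in first-seen order."""
    selected: list[str] = []
    for path in changed_files:
        suffix = PurePosixPath(path).suffix
        for check in CHECKS_BY_SUFFIX.get(suffix, []):
            if check not in selected:  # trigger each check only once
                selected.append(check)
    return selected
```

For a PR touching `app/main.py` and `infra/main.tf` this yields flake8, Bandit, and KICS, and each resulting name could then be dispatched to the central CI repo as its own event.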
It just analyzes and sends said checks. And the checks eventually run; I'll show you exactly where the checks themselves ran. See, this happened just seconds ago. This is the action. This is the logic, the linter part of it. What we saw in the PR is the check, the notification. We'll enter the action and see exactly what the problem is.

So we just went through a very simple cycle, and I think we can all see the value, how we can use it, and the impact it will have.

An important part that was shown, and that I want to dive into because it's really important in any sort of solution you're trying to introduce into a dev team: if I didn't convey the results via a check, which was very easy, I would just run the action. I would even need fewer permissions and make everything a lot easier. What would be the biggest problem? I would make the solution just uncomfortable, and there's no point in a good solution if it's not easy to use and easy to maintain for the end user. I try to talk with my user in the most friendly way possible: via a check in your respective PR. Always try to think about how said person will eventually use what you're developing.

And this is the more complete flow. It adheres to that principle. We have the SCM that talks to the back end, which talks to the CI, but the CI reports the outcome all the way back to the original repository. So once we modify the CI job to create and update checks, this is the more complete flow: we have a repository checkout, we create the check, we run the linter, and then we update said check.

So I've alluded a couple of times to this whole "I'm going to show you a very basic example of it." How could it be used in better, more creative ways? Well, we use this infrastructure, or maybe a more robust version of it, in our day-to-day work back at JIT. I'm not going to talk too much about what the company does.
That's not the point of the convention. But we analyze each PR and its contents and understand, "Okay, which security measures do I need to use?" And we give our clients, and ourselves as developers during our own cycle, never-ending security measures using this exact framework. And of course, as I said, we convey that via a simple PR comment, which I think is a very friendly way that we are all accustomed to.

So thank you so much for your time. I hope this was short and sweet and gave you some value. Before I finish, I just want to make one last point regarding this whole talk. However you apply CI/CD and other infrastructure to your code, don't always try to choose the lane that you saw in some Medium post, or whatever happened before you. Try to see if there's a way you can use the best of both worlds. And you will be quite surprised to see it's a lot easier than you think. It might not be one click, it might not be one day of work, but this whole infrastructure took us one and a half days to implement, and it's still used to this day. Of course, at scale, with many more abilities and a DB that continuously analyzes the results and gives us really great feedback.

So, just to package the whole thing: if you are intrigued about how we do things at JIT, if you're intrigued about cyber security, if you're intrigued about other CI/CD pipeline solutions, I'll be here the whole day and I'd love to talk. I kind of prefer people over code, and I'd just love to meet each and every one of you. If you're inspired or have questions, I'll be here all day.

Do we have any remote questions? Just quickly... no. So can we just use the mic here, please?

A great talk, and especially great advice regarding the Medium article. (Thank you.) It's a great outline, awesome. My question is: in this infrastructure, how do you cater to different requirements of those repos? One repo may require, let's say, more resources on CI, a much beefier CI box, than other repos.

Yeah. So first of all, thank you so much for the question.
What's your name? Pablo, okay. Thank you so much. Pablo brought up a very interesting question regarding the fact that I put everything in the same organization while clearly the repos have different needs. I've done that just for the demo's sake. In our real operation, the area that runs the actual checks themselves, the logic part of it, is in a completely different organization. They have a signature GitHub token that allows them to communicate, and it has its own threshold and amount of checks it can run, so they don't really affect one another. And as you saw, maybe I'll show you later, in the YAML itself we define all the infrastructure it needs; it runs once and is killed at the end. Hopefully the answer gives you more context.

Hi, Christian. You talked about GitHub in particular, but what about other providers, like Bitbucket or GitLab?

Yeah. Again, thank you so much for the participation. This is the fun part of the talk. This entire flow is applicable in all the big three SCMs. It's slightly different, and it requires a little bit more work. We've done the exact same thing in GitLab for a specific POC; we wanted to maybe switch over. But yeah, it's applicable in all three. I just wanted to show it specifically in GitHub because it's the easiest and most popular.

Thanks for the talk, and your energy throughout was really good. I can definitely see why you might need this, especially in a monorepo, microservices type of setting. I think there are maybe some benefits that you have when the CI design is coupled to the code: it lets you know exactly what the configuration of the CI was for the specific snapshot it was run against. Do you think that cost is always worthwhile with this kind of setup?

First, again, thank you so much for the participation.
That's the fun part of the talk. Wonderful question. The simple answer is yes, and the even simpler answer is this: you saw that we configured all the different checks in our CI checks repo. If you want to do a very specific check that's tailor-made to a specific service, et cetera, you can configure a very specific YAML file. You'll need to do some ugly code back in the brain part to choose where to run it, but fine. Sometimes not everything is best practices. We actually do it quite constantly. As a side note, back at JIT we orchestrate open-source security tools. All those security tools in the end have their own infrastructure and tests, and they need to adhere to some CI. So in the central CI we have specific checks that are relevant to each one of those tools. You can't always have the same test for everyone. So yeah, we have very specific checks, for a good reason.

Hey. First of all, thanks for the great talk, very interesting technique. I'm just wondering how you, for example, scope the deploys of the back end for managing the CI. I could imagine this becoming a single point of failure for your entire test and deploy pipeline. Say you block your entire company's deploys because you updated this CI back end and made some mistake, whatever. So I guess two questions. How do you test the CI back end itself? Does it use itself to do that as well? And do you have some ideas on how to maybe have deploys per team, so that at least you don't take other CI pipelines down by doing that?

Okay, first a good question, then an interesting suggestion. Thank you so much for participating. Yeah, the CI could become a bottleneck.
It's one of the few joys of working in small teams. The CI checks itself. It's not a great way; we need to find a better solution for it. But yeah, we sometimes have this very weird thing where the CI fails. We don't take it for granted; we go over the logs and see, "Oh, something wrong happened." A great way to keep it static is, first and foremost, that all the different tools we use in the CI use specific versions, not "latest" and stuff like that. I assume all of us at some point in our careers got burned by something like that. But of course it doesn't replace the basic testing: unit testing, integration, end-to-end, all those things happen. But yeah, we need to do a better job of making the CI more resilient.

Regarding your question of making different pipelines for different teams: we currently don't do it, but it would be very easily applicable. We'd just need to create a new GitHub app. It will talk to a different CI server, but it will eventually report to the same CI checks code base. I'm actually kind of excited to maybe try it out with my team. It's a good idea. Where I would use it is if you're part of a really big team that has different products that maybe don't want the same checks, or maybe you have one team that's a Python-led team and another that's a Java team, so you want to have different checks for them. So that's maybe where I would use such a thing. It's a great idea. Yeah, thank you.

Thank you.

Hi, thank you very much for your talk. We are fully multi-repo and that works great for us, but very occasionally we have two PRs that affect the same thing in two different repos, and CI suddenly is an absolute nightmare for us, because one uses the master version of the other and the other uses the master version of the first, while they really should be tested against both PRs. Do you have a solution in your current setup to test two PRs on two different repos simultaneously?

I'm sorry, I should probably answer quickly. I actually don't see how that problem will arise in this architecture. If I misunderstood you, please correct me. You have two different PRs in two different repos, whatever, that run pretty much the same kind of check. In the end both webhooks will be sent to the CI back end, which will decide, "Okay, we want to run check X." They both will run the same check X and you'll get the same result.

No, no, the problem is that these two repos use each other as a dependency, basically, so one is...

Stop that immediately.

Sometimes you don't get around it. Usually you get around it, yes.

As we said previously, the world isn't always best practices. I would love to sit with you and try to solve it by the end of the day. But, I'm sorry, just to shift the focus a little bit: the kinds of checks we've shown are more cross checks, like for example a linter (the whole purpose of a linter is to make sure everyone adheres to the same styling guidelines), or very specific checks like security checks. We don't really use our CI to run the abilities of repo A on repo B and vice versa. If you have a specific ability there, I would try to extract it to its own, for example, GitHub action that you would run, so you would stop this coupling. The only problem that will arise is that you will have to continuously update that GitHub check as you change the code base. But that's a very fun thing to solve. I'll talk to you later.

Sure. Okay.