How's everybody doing after lunch? Anybody snoozing yet? That's why my chair is here, in case I need to take a nap. All right, so thanks for attending. As I mentioned, I'm going to be talking about Jenkins. Who's using Jenkins today in their environment? All right, good. Who's using a Jenkins competitor? Still a relevant conversation.

So I've got a quick agenda. It says 33 slides, but it's not really, so it's a fake-out. I'm going to do a quick introduction about myself, then a little bit of background and vision. Before I was in consulting, I was on the customer side. I've done development, I've done operations. So this story is a real-life story. It played out over three years. I'm going to talk a little bit about the background of the company, what happened, and our approach to implementing DevOps culture and processes. We're going to talk about the why, what the reaction was inside the company, what people were saying internally, and then a little bit of lessons learned, the summary of everything that happened.

So as she mentioned, I'm a consulting systems engineer. What that really means is I go out with the sales guys and make sure they're not lying to you when they try to sell you something. Prior to that I did consulting, and then I was on the customer side at a bunch of different companies: insurance, government contracting, a whole bunch of different things. My email's there. I have Twitter, but it's mainly jokes and retweets from Taco Bell, so probably not anything interesting there, but feel free to shoot me an email later on if you have questions or comments.

Like I said, this story is really about when I was on the customer side and what we did there. So a little bit of background on the company. They were in the insurance industry. They had 10,000-plus employees. They were worldwide, so we had a spot in India, they
were over in APAC, they were everywhere. Their main source of revenue was their claims system. So when somebody filed a claim, that whole processing was done on the lovely stack that you can see there. TIBCO, mainly, and .NET were the big pieces. And the big thing to talk about in IT was that everything was manual: manual deployments, manual testing, manual virtual machine provisioning, if they were actually provisioned. So a lot of manual processes, and it looked like most companies that size back in 2013, when this occurred, right? A lot of companies were really in that boat.

When we looked at the software development lifecycle, we broke it out into build and deployment. It kind of goes along with the story here. So you look at build: you have your developers writing code, they check it in, and then they're done. That's really all they did. Outside of that, there was this whole build team that would actually go and build the binaries for any environment outside of dev, so testing, QA, production. And there was this whole mix and match of processes to get that change approved and actually deployed in those environments, right? And when you're talking about production, it was even better, because you've got a CAB. You had to get your change request approved by Monday for a Wednesday CAB, and then it was scheduled a week later, right? So, a typical change management process at any large company. And the best thing about those: they were six-hour phone calls starting at midnight. I used to be on those. Those were fun. With like 40,000 people on the call trying to figure out what was going on, right?

So it was a lot of very heavy process, and clearly, to the team that was put together that I was on, you can see some problems with this process. The management of the company really wanted some change in IT. They said, we've got to go faster.
We've got to get more releases out the door. We've got to be efficient, because our incident rates are huge. Right, we did some metrics, and incidents were categorized mainly in two areas. One is a manual whoopsie on the keyboard. The other one was a manual configuration error, like we're configuring our applications wrong, whether it's the release notes or whatever the problem is. So that's where we were. Management said, all right, we're going to have to change, we need to go faster. What are we going to do?

So we came up with this vision of changing the application deployment process as a whole. The idea was we wanted push-button code deployments that work the same in every environment. It's very simple. Everybody wants that, because you want it to work. So what we did is we stood up a team called the platform ops team. It was actually based out of operations, so we reported up to the operations VPs. And we said, hey, this is what our goal is, and this is what we want to do.
We got support at the VP level. We actually had a whole huge vision around virtual machines and stuff, but that's not relevant to this conversation. This was really what we were focusing on and figuring out how to do.

Typically, this is an idea that everybody has, and usually you see something like this. You'll see an architecture where I have a bunch of different teams. They're pushing to source control, and then a trigger from a Jenkins master goes and executes the job on a Jenkins slave, and that deploys to the environments. Well, there were a couple of problems for our company at the time with doing this. The biggest one is that our devs didn't build their code. So you're going to let devs have admin access on a Jenkins master to go configure all their jobs, alongside all the other teams that are doing the same thing? They don't know how to build their code. So that sounds like a recipe for disaster. It's very tightly integrated here, so if one person breaks a slave or the master, everybody else is going to suffer. And that would have been a huge step backwards from a really solid, structured, if very slow, process of having a build team build the code, right? Because we knew those people knew what they were doing, right?
So we were like, we like this, but maybe not, right? So the platform ops team, myself and like three other people to start, had never seen Jenkins before in our lives. We had no idea what this was, or how to use it, or even how to architect it in a way that is performance-friendly for all the incoming requests that we knew we were going to have from the devs. So how are we going to architect something that's awesome, that the devs really want to use, if we don't know what we're doing?

So we took a little bit of a different approach to the architecture. We said, okay, this is going to be the first rev of the product that we push out to our internal customers, the development teams. The first thing we looked at is, let's do a CI/CD pipeline. Let's figure out what that is, for one. But then we're also going to break this down into almost a SOA-based process, where we look at one chunk of the process at a time. The first thing we did was separate build and deploy. Very simple, right? In order to do that, we said, we need to start with an artifact repository. That's going to be our main gate between developers and operations, because it makes sense for where we're at today. We also decided on the idea of build once, deploy many. Very simply, it still meets our goal of push-button deployments that work the same in every environment. And it also helps us because it gives some aspect of control within the company. We were bound by ITIL processes, so tracking everything and having something central is very important, right?
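
To make "build once, deploy many" concrete, here is a minimal sketch, not the setup from the talk. The `ArtifactRepository` class is an in-memory stand-in for a real artifact repository, and every name in it (`publish`, `fetch`, `deploy`, the product and version strings) is hypothetical. The point it illustrates is that a version is published exactly once and every environment receives byte-identical content.

```python
import hashlib


class ArtifactRepository:
    """Hypothetical in-memory stand-in for a real artifact repository."""

    def __init__(self):
        self._store = {}

    def publish(self, product: str, version: str, payload: bytes) -> str:
        """Store a build output once; a published version is never overwritten."""
        key = (product, version)
        if key in self._store:
            raise ValueError(f"{product} {version} is already published")
        self._store[key] = payload
        return hashlib.sha256(payload).hexdigest()

    def fetch(self, product: str, version: str) -> bytes:
        """Return the stored bytes for a product version."""
        return self._store[(product, version)]


def deploy(repo: ArtifactRepository, product: str, version: str, environment: str) -> str:
    """Pretend-deploy: fetch the stored artifact and report what was shipped."""
    payload = repo.fetch(product, version)
    digest = hashlib.sha256(payload).hexdigest()
    return f"{environment}: {product} {version} sha256={digest[:12]}"


if __name__ == "__main__":
    repo = ArtifactRepository()
    repo.publish("claims", "1.0.0", b"compiled binaries")  # the build happens once
    for env in ("dev", "qa", "prod"):                      # the deploy happens many times
        print(deploy(repo, "claims", "1.0.0", env))
```

Because nothing is rebuilt per environment, the same checksum shows up everywhere, which is also what makes the repository a convenient central place for tracking.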
So this really didn't talk about testing per se, but the idea was, let's start with the foundation of where we need to be. Next we started building out the build process, right? So, a very simple continuous integration pipeline. Dev would write code and check it into the source code repository. A trigger from Jenkins would build the code, store the code, and then deploy into their dev environment, right? Because they had admin access to those environments anyway. We said, all right, this is very simple and easy for us to do. We also gave that dev team access to that Jenkins server. Here you go. You want access? Here it is. The big thing here is that there was no formal requirement for any change request process to happen, so it was a free-for-all. It's a dev's dream come true: they get full access to the environment, they can do whatever they need, and nobody's stopping them, right?

So once we figured that out, we said, okay, let's look at what deploy means. Because deploy outside of dev is a whole different concept, because again, we have these ITIL processes. So we have a similar but slightly different workflow for the other environments. What we really did is take away access to all these other Jenkins servers. We deployed a bunch of servers and said, no, you can't touch those. You can learn on your own environment. But when you want to talk about going to integration and QA and prod and DR and all of the other ones that go behind there, we're going to give you this really slick UI. All you've got to do is log in and select an environment, a change request number, your product, which is the product you're working on, and the version that you want to deploy. That's it. Everything else is abstracted away.
You have no access to Jenkins and you have no access to the environment, right? But at that point they didn't really need that. So the idea is that the user would log in, select those things, and hit deploy. And what would happen is the portal would reach out to the ITSM solution, which was ServiceNow at this point, and say, hey, can you check to see if the change request is approved, that the ITIL process has been followed? And is there a time window associated with that request? For example, in production it's between midnight and 6 a.m. Are we in the right time window? And if we were, it deployed. So that was the big trigger and the change request control. After that it's the same process: Jenkins pulls the artifact from Artifactory and deploys to the environment, and we're done, right?

To make the process more streamlined and efficient, we actually went and met with the change management team. We said, okay, we've streamlined deploy, but we still have this huge process before it. And we got agreement with them. We said, hey, we're going to change this process, and we're going to make it easier for you too, but we need a little bit of help from you to make this even easier. We had this huge long form in ServiceNow: where are your dependencies, where are all these things that the developer needs to fill out. We said, if we can change this form to make it simple, we can get you all the information you need, we can standardize the deployments, and nobody touches the system anymore. And they loved it, because it reduces risk. That's the main reason you have change management processes: you've got to reduce risk to your system.
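
The portal's gate described above (is the change request approved, and are we inside its time window?) can be sketched roughly as follows. This is an illustration under assumptions, not the portal's actual code: `ChangeRequest`, `in_window`, and `may_deploy` are hypothetical names, and the lookup against the ITSM tool is left out entirely, so the change request is simply passed in as data.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Tuple


@dataclass
class ChangeRequest:
    """Hypothetical shape of the fields the gate needs from the ITSM record."""
    number: str
    approved: bool
    window_start: time
    window_end: time


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def may_deploy(cr: ChangeRequest, now: datetime) -> Tuple[bool, str]:
    """Decide whether the deploy job may be triggered, with a reason for the user."""
    if not cr.approved:
        return False, "change request not approved"
    if not in_window(now.time(), cr.window_start, cr.window_end):
        return False, "outside the approved time window"
    return True, "ok"


if __name__ == "__main__":
    # Production window from the talk: midnight to 6 a.m.
    cr = ChangeRequest("CHG0001", True, time(0, 0), time(6, 0))
    print(may_deploy(cr, datetime(2013, 5, 1, 2, 30)))
    print(may_deploy(cr, datetime(2013, 5, 1, 7, 0)))
```

Only when the gate passes would the portal go on to trigger the deployment job; a rejection gives the user a reason without ever exposing Jenkins or the target environment.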
So they loved it. We essentially ended up with code deployment time going from hours to a couple of minutes, depending on how long it takes you to log in and actually deploy the code. But also the change request process went from, I would say, 40 questions down to four, and one approval, right? So the time it took to stage your request plus the time it took to deploy it was quicker than one of those calls previously. So we had this process, we got agreement, and the devs loved it and said, well, we could deploy constantly now. The one thing we didn't touch was testing. We said, we can get you a certain way, but everything's manual. We'll help you guys, we'll help QA going forward.

All right, so that's what the process was for one team. Now if you look at multiple teams: just add more Jenkins servers. The same thing applies here. They're going to use the same gating functions, the same repository, the same artifact management system, and the same portal. Everything else was going to be very distributed. So you have a Jenkins server per environment per team, right? And you can see how this grows when you have more than two teams on there, and then Jenkins starts invading and kind of taking over at this point, right? So the title of the talk is 37 Jenkins servers. I'm sure we ended up with way more. The idea is that it's an absurd number if you think about it in the context of what Jenkins should be doing. But if you look at it in the context of 2,000 employees in IT, that's a lot of people to use what you're actually putting out there, and how are you actually going to make it work?

All right, so we got some interesting reception from internal stakeholders. We heard a few similar responses. The biggest one was: why are you even doing this? This is crazy. Why is it so complex?
Why do you have so many servers? I don't understand it. But usually when you're talking to a developer and you just tell them, hey, you have access to the dev server, you don't need it over here, but you have everything you need, they stop caring. It's, I get what I need and I can move forward, right? So the idea is that devs wanted to be in control and they didn't want to wait. They've got to do their work, right? But the platform team and anybody else in ops really needed the devs to adopt standards. So there was a carrot that we set in front of them, and it was the illusion of control, right? You can have your own server, but you've got to follow these standards to get there first. So that actually worked pretty well.

The other part of the why, explained, is for all the other folks. When we come to infrastructure and ask for a server, they ask, why do you need another one? Why is your architecture so large? So for the non-devs, the curious parties in IT, we went through the points of why we chose a distributed architecture, and it made fairly easy sense to them once we walked through the architecture.

The first one is flexibility. As a platform team who doesn't know Jenkins, we needed flexibility without affecting our end users. Again, besides one guy that we actually hired who knew Jenkins, nobody knew how to work Jenkins. We knew how to architect things, we knew how to implement and support things, but we didn't understand Jenkins itself, right? We also didn't have a lot of time. So how are we going to figure this out? Let's just go the easiest way to get this out into production, figure out what our metrics are, and work out how to tune it later. So we wanted the flexibility to be able to change our minds, or have our minds changed for us by our managers, who might say, hey, maybe we want Bamboo next time. I don't know, maybe we've bought into Atlassian. We're going to switch to Bamboo.
Well, at the end of the day, if you do that with this architecture, nobody knows. Nobody knows what's behind the scenes, nobody knows what's behind the curtain, and the devs can just keep moving. There's no change to the flow of products going to production, right? Our users will be none the wiser.

All right, the second reason why: we needed resiliency. So going back to those devs, they had never built their code before in any environment outside of dev. They're going to be playing around, they're going to be configuring their jobs, and then somebody's going to break something, right? If you look at that from the cultural aspect of DevOps, it's good to break things. But you also don't want to impede everybody in the company, especially if it's right before third shift ends and first shift comes back on, right? So when there was an issue, it was isolated to that very team, to that very environment. When a server was rendered out of commission, it was on that team to say, hey, let's go fix it, maybe we should not do that next time, right? So when this occurred, the platform team usually got a call: hey, my version of Ant's not working, or whatever. Instead of spending hours trying to figure out what was wrong, all we did was save the logs and reprovision a new server.

Now, there was a parallel automation effort underway that the platform team was working on to automate virtual machine provisioning. That included creating a virtual machine, getting an IP, installing components, and in this case Jenkins was the platform being installed, and providing automatic access to the requesting team. So one of the first server blueprints that we put together for teams to consume was this Jenkins dev server, right? So you could just log into ServiceNow, click the button, and wait five minutes.
You get a server ready to go. And then all you had to do was decommission the broken server, and it's like nothing ever happened. That was one of the reasons why we were able to do that and provision as many servers as we needed right away.

The other thing we put together, as another why, is an adoption accelerator, so we put together an onboarding kit. It focused on what the RACI is. We're changing their process, so we've got to tell them who's responsible for what now. We actually put together some base scripts for TIBCO builds and TIBCO deploys, as well as .NET builds and deploys, and said, hey, here are your jobs, just use these. If you don't want to use these, you can write your own, or you can modify these and add the features you want to them. So that was something we also provided. We also provided self-service: if you want to go do this yourself, here's everything you need, here are your instructions. Or do you want us to handhold you all the way through? Both parties liked that. And we also gave them a point of contact to just call us for help. So it was very easy for them to choose their own adventure, per se, and take advantage of the new process. All of these ideas really focus on: how do I reduce the barriers to entry and also make it a sustainable system going forward?

All right, so one of the last pieces that we had as a why is supportability. So I'm on a platform team, all the people reported to me, but now we have all these servers to manage. Well, at this point we ate our own dog food and provisioned another Jenkins server that was a maintenance server. We just put some simple metrics on it, up/down, and plugin versions sometimes, because we wanted to know what people were using. And it sent a daily report out.
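
As a rough illustration of that kind of daily up/down report, here is a small sketch. It is not the maintenance job from the talk: `collect_status` and `daily_report` are hypothetical names, the server names are made up, and the actual health probe is passed in as a function so the example runs without touching any network.

```python
from typing import Callable, Dict, Iterable


def collect_status(servers: Iterable[str], is_up: Callable[[str], bool]) -> Dict[str, bool]:
    """Probe every server with the supplied check and record up/down."""
    return {name: is_up(name) for name in servers}


def daily_report(status: Dict[str, bool]) -> str:
    """Format a plain-text summary, one line per server."""
    down_count = sum(1 for up in status.values() if not up)
    lines = [f"Jenkins servers: {len(status)} total, {down_count} down"]
    for name in sorted(status):
        lines.append(f"  {name}: {'up' if status[name] else 'DOWN'}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical server names; a real probe would replace the lambda.
    servers = ["team-a-dev", "team-a-qa", "team-b-dev"]
    status = collect_status(servers, lambda name: name != "team-a-qa")
    print(daily_report(status))
```

Keeping the probe separate from the formatting means the same report could later carry other simple metrics, such as plugin versions, without changing how it is sent.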
So the report said this one's down, or this one's good. We were proactive in understanding what was going on in the environment. We didn't have to contact any other team to ask what's in Splunk, what are your alerts doing, none of that, because we could use our own server, right? And in this way we were proactive: hey, your server is down, or it's been down for this long, do you need help? So it was a little bit more of an end-user perspective of, hey, everything's good, let me help you more if you need it.

And the biggest thing we did around supportability. I have to mention it was 2013, so before Jenkins 2.0. We had 1.5, I think, out in production. The management of those Jenkins jobs was something that we also handled, so any time a job needed to be updated, our team was responsible for that, and figuring out where that job was could have been a pain, right? So what we did is, when we built a Jenkins server, we built it from an image with a specific job. And the job, lovely pseudocode here, really just says: log into a code repository with a repo name, go pull this script with a certain label, then download the script to the local Jenkins server and execute it. That's all it did. From there, that script was modifiable by devs in source control. So again, they never had to go to Jenkins. We never had to log into Jenkins to change the script and update the version. We just changed the label. So in that case our team never really had to log on to the server at all, right? If the developer wanted to change it, all they did was check it out of source control, make some changes, check it back in, and let us know, and we'd real quickly change the label and we're good. That also helped when they broke the server. We didn't lose anything. Doesn't matter. Nothing was there. So that was the biggest thing that we did.
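
The bootstrap job described above (pull the script carrying a given label, save it locally, execute it) might look something like the sketch below. This is an assumption-heavy illustration rather than the job from the slide: the source control system is not named in the talk, so fetching is abstracted behind a hypothetical `fetch` callable, and `FAKE_SCM`, the repository name, and the labels are invented so the example can run on its own.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable


def run_labeled_script(
    fetch: Callable[[str, str], str], repo: str, label: str
) -> subprocess.CompletedProcess:
    """Fetch the script pinned to `label`, write it to a scratch dir, and execute it."""
    source = fetch(repo, label)
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "job_script.py"
        script.write_text(source, encoding="utf-8")
        # Nothing persists on the server: the scratch directory is removed afterwards.
        return subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            check=True,
        )


# Hypothetical stand-in for source control: (repo, label) -> script contents.
FAKE_SCM = {
    ("claims-deploy", "v1"): "print('deploying with script v1')",
    ("claims-deploy", "v2"): "print('deploying with script v2')",
}


def fake_fetch(repo: str, label: str) -> str:
    return FAKE_SCM[(repo, label)]


if __name__ == "__main__":
    # Moving to a newer script is just a matter of changing the label.
    result = run_labeled_script(fake_fetch, "claims-deploy", "v2")
    print(result.stdout.strip())
```

The design choice this illustrates is that the job on the server stays a thin, identical wrapper everywhere; the real logic lives in source control, so a rebuilt server loses nothing.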
We really didn't have to manage our own environment, because we made it very simple for us. All right, so those are the big whys and what we did from an architecture perspective.

So, the lessons learned here. It's more architecture-forward, but it's really about motivations and how you get your devs, or anybody else, to adopt technologies that they don't know and you don't know yourself, with low barriers to entry. I won't read all of these, but what I will say is, when you're thinking about architecture, software, microservices, whatever, be prepared for change. Be prepared for different owners of the tool set and different changes to the underlying tool set. And abstraction layers are your friend, because that's one way to help you solve for and prepare yourself for change. And that's it.

Great talk. Thank you. Did you hit any issues? Were those teams all working on isolated features, or did you hit any issues with them integrating down the line?

Yes, we did. They were not isolated. This is an old legacy code base, right? The biggest thing I want to point out here, I didn't read it out but it's important, is: does the branching strategy for each team really matter? I don't want to point fingers, but at the time when we took on this approach, we actually started there. We were like, how are we going to get the changes to merge? How's this all going to work? They were actually on one code base and nobody branched. Yes, it's very scary, right? And we actually started the conversation there. We said, we've got to figure out a good branching strategy. Should we do merges to master? Should we develop on master? How do we do this? And explaining the concepts of branching and tagging to our build engineering team that was doing the builds took a month of our time, right? And the comment that we made was, well, let's just label everything. I don't care.
You could branch. This is not a fight I care about. The devs should be able to figure this out, and if you can't build your own code, that's another problem, right? And the comment we got back was, I don't know why we're talking about tags, tags came out of the blogging world. Right. So that's, I mean, we did. One of the things we recommended they do is consolidate, and the teams were associated with a branch of the master they were working on. So that's essentially how we solved it. But that's where we started. We said, this is a fight I don't want. As operations, I shouldn't be telling you how to branch and when to branch and when to do a feature like that. It's just not something I care about. And at the end of the day it didn't matter, because that was not going to get us out the door faster, right? So we decided that was a battle we weren't going to try to fight.

Hi, I have a question that's not related to Jenkins so much, but I see that you use JFrog Artifactory. Yep. Do you have any lessons learned about how to deal with conflict when some devs want to use a certain version and want to put it in Artifactory? Because we want the systems to be safe, with a stable version, but people say, I want this for certain features, and they just release it through Artifactory itself.

So, yes. For every component of this architecture, the platform team owned a feature request intake process. It said: you find this cool feature, tell us, we'll integrate it. So that's what we did. We were acting just like a product team. As a customer, tell us what you want. We'll prioritize it with all of our other work, and if enough people want it, we'll implement it for you. So that's how we took it, because we did have a lot of that. We had a lot of that with the scripts, we had a lot of that with Jenkins plugins. And we said, well, here are our standards. If you find something else you need, let us know and we'll work it
like our own scrum team. So that's kind of how we handled it.

My question is in terms of the actual servers that you were giving to these developers. How was that working? Were they...

Yep, they were individual VMs. It was mostly a Windows company, so it was a Windows-based Jenkins image with preset jobs configured, and then they got admin access to that via a group policy. All right, thanks.