Welcome, everybody. My name is Bob DeRosa, and I'm going to talk to you about Broadridge's journey from on-prem to the cloud with Kubernetes. To give you an idea of what we're going to go through today, here's the agenda. I'll start out by telling you a little bit about Broadridge and about the project that we completed. We'll go through why we made the switch, how we prepared, and the process we went through in moving from on-prem to the cloud. We'll take a look at what we should have done differently and at the results of the project. I'll talk a little bit about what's next, and I'll provide you with some references for the material I cover.

So first, a little bit about Broadridge. Broadridge is the leading provider of investor communications. We're a global fintech company with over four and a half billion dollars in revenue, and we handle millions of trades a day involving trillions of dollars. When I talk about investor communications, we support communications that reach approximately 75% of North American households. Investor communications includes things like proxy statements and mailings for any stocks that you own; a lot of the material that comes to you, whether it's mailed or electronic, originates with Broadridge. We work with a lot of different banks and brokerages, and we're their back end. We also manage shareholder voting, the proxy vote; if you've used the ProxyVote app at all, that's again powered by Broadridge. Last year we hosted nearly 2,000 virtual shareholder meetings. Shareholder meetings are something that's required by the SEC. They used to be done in person, whether at a small venue for smaller companies or at larger venues and even hotels for larger companies.
It's where the board goes through their annual meeting, presents any information that needs to be presented, and then holds the voting. Because of the pandemic this was still a requirement, but people were able to take advantage of a Broadridge service that we've had for years, and like I said, there were over 2,000 meetings held last year. We have over 10,000 employees worldwide, and out of that we've got thousands of technical associates. If you want more info about Broadridge, here's the URL.

As far as my background: I started as a software developer, and then I moved on to starting my own company with a couple of friends, where I was the CTO. We did educational software and security, and that was my first experience using the Amazon cloud; we put some of our products into it. I then moved on to Broadridge, where I worked in a role called service delivery. My responsibility there was managing the products that were running in the cloud; we had a couple of products in the cloud, and I managed the infrastructure side of things. And then I moved on to the DevOps COE. We started up a DevOps center of excellence, and I was involved in choosing the tools, putting processes in place, and getting the word out to the company about DevOps and how it worked. As part of that COE, we looked at moving our tools from on-prem into the cloud, and I was the technical lead on the team that took those tools and moved them from on-prem into AWS.

So in the project we're going to be talking about today, like I said, we migrated our DevOps tools from on-prem into AWS. We have a few different tools: Nexus, GitLab, and Jenkins, in the form of CloudBees CI. Today I'm specifically going to talk about CloudBees CI, because that's what we moved into Kubernetes. Now, our on-prem CI implementation had around 10 masters and around a thousand agents.
Those agents were running on VMs. One of the problems was that because a lot of the masters were shared by different development teams, if one team had a runaway job, it could affect other teams. Another problem was that since it was on-prem, if we needed to add resources like disk space or CPU, it could sometimes take weeks. And because they were all VMs, they required a lot of time for maintenance and patching; some of that was done by the DevOps teams and some by the development teams, but everything interacted. Like I said, it was mostly VMs, though there were some agents running in AWS, and we even had some older agents running things like Solaris. These agents were located all around the world.

So the next question is why we made the switch. First of all, why the cloud? We've got a corporate direction to move our applications from on-prem to the cloud, plus the obvious reasons you always hear about the cloud: better scalability, better reliability, and the fact that you can use infrastructure as a service. The other question is why Kubernetes. The main reason we moved CloudBees CI to Kubernetes is that it was optimized for Kubernetes. The build agents I talked about before work very well as pods. A lot of times when you're building an application, you'll run your build job, it'll do what it needs to do, and then it's going to be quiet for a while. With pods, you can scale up when you need them, run your build, and then when you're done, they can go away. There's also the fact that you can customize images for development teams. When you have VMs or physical machines, you've got to install the software you need beforehand. If you use pods instead, you can put in the different containers that are required within the pod.
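As a sketch of what that per-team customization can look like with the Jenkins Kubernetes plugin, here's an illustrative agent pod definition; the team label and images are examples, not our actual setup:

```yaml
# Illustrative Jenkins agent pod: a build container and a test container
# share one pod, so each team composes exactly the tooling it needs.
apiVersion: v1
kind: Pod
metadata:
  labels:
    team: payments                      # hypothetical team label
spec:
  containers:
    - name: maven-build
      image: maven:3.8-openjdk-11       # build tooling for this team
      command: ["sleep"]
      args: ["infinity"]
    - name: selenium-test
      image: selenium/standalone-chrome # test tooling, swappable per team
      command: ["sleep"]
      args: ["infinity"]
```

Swapping a team's toolchain then means changing an image reference in this definition rather than reinstalling software on a shared VM.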
So if I have one team that's building with certain software and then testing with other software, I can easily swap different containers in and out of the pods to accomplish that, and I don't have to worry about having everything installed beforehand. Pods are also easier to patch and to roll back: if something goes wrong, I can just use a previous version of the pod. And the fact that these things are shared and scale up and down is going to reduce our infrastructure cost.

In preparing for a project like this, the first thing I would say is: if you're going to move to Kubernetes, take advantage of that move and rearchitect what you're doing. Now, in our case, since we were using a third-party application, we were limited; we were following their architecture. But if you're moving your own on-prem application to the cloud or to Kubernetes, you definitely want to look at that architecture and see where you can make changes. The first piece of advice I would give you: if you don't need to use Kubernetes, then don't use Kubernetes. Kubernetes is a complicated environment; there's a lot to it. If you have a simple application that maybe just needs a few containers, or could even go serverless, then you should probably consider that. In my opinion, Kubernetes is not the right environment for a lift and shift. I know there will be vendors that will tell you they can lift and shift and it's very easy with Kubernetes, but again, if you're going to make this move, you really want to look at your architecture and see how you can improve it. You want to take the lessons learned in the past and apply them in your new design. There's an expression a friend of mine always used to say, and it goes something like: fast, cheap, and good. Pick two, right?
The idea is that you can't do a project fast, cheap, and good at the same time; it's just not possible. But you can choose fast and cheap, fast and good, or good and cheap. Now, if you're going to Kubernetes, what I would say is: pick good. Either you pick good and fast, and it's going to be a little more expensive to get it done, or you choose good and cheap, and it's going to be a little slower. Again, Kubernetes is pretty complex, so if you're going that route, choose good, and then one of the other two.

When it comes to preparation, there's an expression: rockets are hard. If you're following all the commercial space travel that's going on (I think there's actually another SpaceX launch on Thursday), you'll see that along the way there are a lot of mishaps, a lot of testing, a lot of explosions. And a lot of times when that happens, you'll see tweets that say: rockets are hard. What I would add to that is: so is Kubernetes. If you're going to do this project, you really need to make sure you have the expertise to do it. As a global fintech, we're fortunate that we've got resources, but you need to hire folks with Kubernetes experience. You need to set the proper expectation that it's going to take some time, because again, you want to go back, look at your architecture, and do it right. The best way to go about it, in my opinion, is to build a minimum viable product and test out some of your ideas. Start small, work with a few trusted customers, whether they're internal or external, and then iterate based on that work. This way you'll get it right. And as part of the process, you want to document what you're doing, review it, and then test it. So, like I said with the MVP, you want to do a quick POC and test your assumptions.
We actually did that early on. We were thinking of using one type of file system, and after doing some testing and running through it, we realized it wasn't going to work, so we switched it up and ended up using something different. It was good that we did this POC up front, before we got too far down the line and it would have been harder to make changes.

What I would say is: use the native services whenever possible. The cloud providers all have native services like storage and databases. Figure out what cloud provider you're going to use, then embrace those native services and use them. It's going to help you tremendously when it comes to managing, scaling, and even things like DR. You want to plan up front for disaster recovery; you don't want to leave that as an afterthought, especially when it comes to your own architecture. On the one hand, it may be interesting to use a database for some of your storage, a network file system for other stuff, and maybe SSDs for a third part. But then when it comes time to do disaster recovery and backups, you have to figure out: how am I going to do all that? How am I going to keep it in sync? How long is it going to take to back up, and how long to restore? If you look at disaster recovery up front, when you make some of your choices, you may choose different things to make sure that it's easier to recover. And it all plays into the type of product and into your RTO, the recovery time objective, and your RPO, the recovery point objective. If you've got a product that needs an RPO of five minutes, meaning you can only lose five minutes of data, you may make different architecture choices than if you have a product that just has to be backed up once a day.
Now, keep in mind that in the cloud you're following a shared responsibility model: the code, the data, and a lot of the time the operating system are your responsibility. With some of the serverless options you don't have to worry about the OS, but you still have to worry about the code and the data. And the cloud is not magic. Because of the shared responsibility model, you still have to make sure you're doing proper security, proper monitoring, and proper backups. No matter what cloud provider you're using, they've got great uptime, but there are definitely times when they're going to have problems, and you're going to want to make sure you plan for that. In our case, we're using a couple of products: Aqua for container security, Datadog for monitoring our clusters and containers, and Kasten K10 for backing up our applications.

Now, as part of the DevOps team, we obviously follow infrastructure and configuration as code. What that means is that we're storing our code in version control, and when I say code in this case, it's not the application; it's the code to build the infrastructure. All that code becomes part of a CI/CD pipeline, so everything is auditable and everything is repeatable. You check in your code and push it to the pipeline; the pipeline builds it and pushes it out to your dev environment, and you do some testing. If everything works, you can then deploy it on up to your QA environment, and since it's all the same code, just different data, if it works in dev there's a good chance it will work in QA. If you have to make any adjustments, you do that, and eventually you push it up to prod. Now, the idea behind this is that you want to push your problems to the left, which is another common theme in DevOps.
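Since GitLab is one of the tools in our stack, the dev-to-QA-to-prod promotion flow just described could be sketched as a GitLab CI pipeline; the stage names and the `deploy.sh` wrapper are hypothetical:

```yaml
# Hypothetical GitLab CI pipeline: the same infrastructure code is
# promoted dev -> qa -> prod, with only the target environment changing.
stages: [validate, deploy-dev, deploy-qa, deploy-prod]

validate:
  stage: validate
  script:
    - terraform init -backend=false
    - terraform validate

.deploy: &deploy
  script:
    - ./deploy.sh "$TARGET_ENV"   # hypothetical wrapper around terraform apply

deploy-dev:
  stage: deploy-dev
  variables: { TARGET_ENV: dev }
  <<: *deploy

deploy-qa:
  stage: deploy-qa
  variables: { TARGET_ENV: qa }
  when: manual                    # promote only after dev checks out
  <<: *deploy

deploy-prod:
  stage: deploy-prod
  variables: { TARGET_ENV: prod }
  when: manual
  <<: *deploy
```

The point is that the logic is identical across environments; only `TARGET_ENV` changes, which is what makes a pass in dev a meaningful predictor for QA and prod.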
Pushing problems to the left means that when you have your configuration as code, you work your problems out in your dev environment and you get it right in dev. Once that's done, you check it in, you capture it, and then you go on to QA. By finding your problems early in dev, it's going to be quicker and a lot cheaper to fix them. If a problem makes its way all the way to production, generally what happens is a fire drill: you've got customers that are now impacted, you've got to get all these teams together, figure out a fix, push it quickly, and try to test it quickly. Whereas if that same problem is found in dev, the developer can pretty much fix it by themselves: they just check it in, it runs some tests again, and it's a lot easier to handle.

Another piece of advice I'd give is: don't reinvent the wheel, just improve it. Again, because we're a large global fintech, we've got resources and we work with a lot of vendors, and we're able to take advantage of that. Where we don't have certain types of resources, we're able to bring them in as consultants, or lean on the people we buy products from; they've got a lot of knowledge. For example, working with Kasten, they've got a lot of really good Kubernetes people who understand storage and backup, so some of the issues we were having, we were able to work with them to solve. And it helped both sides: it helped us get a solution, and it helped the vendor understand, from their customer's perspective, what needed to be done. Again, as I said before, you want to capture best practices as code, and not just for your product. What came out of this is that, as we were new to Kubernetes and EKS at Broadridge, we were one of the first projects.
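Consuming a reusable EKS Terraform module like the one described next might look something like this; the module source and input names are hypothetical:

```hcl
# Hypothetical usage of an internal, reusable EKS module that bakes in
# best practices: non-routable pod CIDRs, VPC segmentation, subnet tagging.
module "eks" {
  source = "git::https://gitlab.example.com/platform/terraform-eks.git"  # hypothetical

  cluster_name    = "devops-tools"
  cluster_version = "1.21"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids  # module tags these for load balancer discovery

  # Secondary, non-routable CIDR range reserved for pod networking
  pod_cidr = "100.64.0.0/16"
}
```

The value of the module is that the hard-won details live behind a handful of inputs, so later teams get the best practices without rediscovering the problems.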
We worked with our cloud team, and we ended up building out Terraform modules for EKS. We basically took best practices, things like non-routable CIDRs for the cluster, proper segmentation of the VPC, and tagging the right subnets, all the problems that we encountered, and we put them into a Terraform module. So teams later on were able to avoid those problems; they were able to build off of our knowledge and get up and running more quickly on EKS.

Next, Helm charts. Helm is a CNCF graduated project, at the same level as Kubernetes at this point, and it's considered the package manager for Kubernetes. If you're not familiar with Helm and you're deploying to Kubernetes, you really want to become familiar with it; it'll become your new best friend. As a package manager, think of installing software: if you're into Linux, RPMs, or packages with Windows, there's the Windows installer, the MSI file. With Kubernetes, it's very similar. Normally, when you're deploying to Kubernetes, you've got a whole collection of YAML files describing the services, the pods, the ingress, pretty much everything for your infrastructure. What Helm does is package all those YAML files together and separate them out, so that your logic lives in templates and your values, your data, live in YAML files. That lets you deploy across different SDLC environments using the same code: the logic remains the same, and you customize the values depending on the environment you're deploying into. The other great thing about Helm is that a lot of third parties publish Helm charts. CloudBees, for example, publishes a Helm chart for their CI product, so we were able to use their installer without having to figure out how to install it in Kubernetes ourselves.
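To make the templates-versus-values split concrete, here's a minimal, illustrative chart fragment; the names and registry are made up:

```yaml
# templates/deployment.yaml -- the logic, shared across all environments
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-agent
spec:
  replicas: {{ .Values.replicas }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-agent
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-agent
    spec:
      containers:
        - name: agent
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
---
# values-dev.yaml -- the data for one environment
replicas: 1
image:
  repository: registry.example.com/build-agent   # hypothetical registry
  tag: "1.0.0"
```

You'd then install with something like `helm install my-agent ./chart -f values-dev.yaml`, swapping in a `values-qa.yaml` or `values-prod.yaml` for the other environments while the templates stay untouched.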
There are a lot of good open source charts available: things like NGINX Ingress, different secret managers, DNS, all that good stuff. Even Kasten had a Helm chart, and Datadog had a Helm chart, so we were able to take advantage of all that. We didn't reinvent the wheel and write those ourselves; we just used the third-party charts. We obviously reviewed them and made sure they did what we needed, but then we were able to move more quickly.

Now, a couple of things if we look at what we should have done differently. One thing I would say is we should have hired talent faster. When we initially started with Kubernetes, we did some training, took some online courses, and did a bunch of reading and Googling. But if we were to do it again, I would say we should have brought in the people with the knowledge sooner, and that would have moved the project along faster. The other thing is our MVP: we should have defined fewer features up front, created a smaller MVP, and gotten people using it sooner. One of the things we wrestled with was: if we give this to people now and they start using it, but we don't have everything in place, are we going to get caught with a problem and not be able to service them properly? So we put in more features and tried to do more things. If I were to do this again, I would put the basic features in up front and work with some of those customers, basically telling them: look, it's not ready for production, so run some basic stuff with it, stuff that maybe isn't as time-sensitive or critical to you, because we really need the feedback. And that ties into failing fast, sooner.
By getting that feedback sooner in the cycle, by making the mistakes and making changes sooner, you're not as far down the path with some of those decisions.

If I look at some of the results of this project (and the project is still ongoing as far as CloudBees CI is concerned), we've got to the point where it's deployed, and since we had such a large on-prem implementation, we're looking at the best way to bring people over. But looking at the current results: first of all, like I said, we were able to build that reusable EKS Terraform module, and that's really increased productivity for a lot of the other teams. We were the pioneers, we mapped out the course, and other people are able to benefit from that. We also built a bunch of reusable Helm charts. We needed monitoring, we needed backup, we needed DNS; we built out those charts, and we built them in such a way that they're reusable, so other teams can pretty much drop them right in and get going.

We also wrapped third-party Helm charts. From my background, I was working on Chef before this project, and there's an idea in Chef about wrapping third-party cookbooks. What that means is that you take a cookbook somebody else wrote, you don't change it, but you write your code around it and include it in your code, and that makes it easy to maintain. I have my logic and my data in my cookbook, and I include the other cookbook. Then if an update comes out for the other cookbook, I can just move that in and replace the old one; I might have to update some of my code, but for the most part it's good to go. If you don't wrap something, if you try to modify somebody else's code and then there's an update, now you've got to port all the changes you made over to the new cookbook.
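The same wrapping idea carries over to Helm: you declare the upstream chart as a dependency of your own thin wrapper chart. The chart name, version, and wrapper here are illustrative:

```yaml
# Chart.yaml of a hypothetical wrapper chart: the upstream chart is a
# dependency, left untouched; our templates and values sit around it.
apiVersion: v2
name: our-ingress
version: 0.1.0
dependencies:
  - name: ingress-nginx
    version: "4.0.x"      # bump this line to pick up an upstream release
    repository: https://kubernetes.github.io/ingress-nginx
```

Upstream settings are then overridden from the wrapper's own `values.yaml` under the dependency's name (an `ingress-nginx:` key in this example), so taking an upstream update never means porting patches.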
So we took that same technique and wrapped third-party Helm charts. Again, when NGINX Ingress comes out with a new chart, we can just drop their new version into our chart. Maybe we have to update some values, but we don't have to port any patches or anything like that. That's a good technique to use. Along the same lines, we developed some Jenkins code that monitors all those third-party Helm charts, and whenever there's an update, it pulls the update down and puts it in our internal Helm repository. The goal is that if there's a security fix or new features, we'll know sooner that it's available and be able to use it.

Finally, one of the biggest things that came out of this was that we had a requirement to deploy CloudBees CD, the continuous delivery portion of DevOps. That was a new project; we hadn't used that tool before. But because we already had this Kubernetes cluster up and running, and because CloudBees uses Helm even for CloudBees CD, we were able to deploy that product very quickly, really in a matter of days. That was a big win; it really showed that once this environment is all set up, you can move other things into it quickly.

So you might ask, what's next? Well, like I said before, we're migrating our internal customers to the new platform; that's ongoing, and it's going to take a while. Scaling is still something we're going to have to work out as we move more and more of these customers onto the platform; there are different ways to scale things, and we're still working on the best approach. Then there's building out third-party images. Again, like I said before, with these build agents you can build images that are used in your pods.
We're talking to the customers, understanding their needs, figuring out what those images are, and then building them and having them available to the customers. There's also automating the testing. With a CI/CD pipeline, it's great if you can build stuff automatically, but the goal is not only building it; it's automatically testing it and then automatically deploying it. We've gotten good at the build part; now we're working on the automated testing part, and then we can get to automatic deployment. We've also got other groups adopting our code. As I said before, we built this with reuse in mind, and it's making other people's lives easier.

As for the team itself: we moved over three other applications, and at times the team was anywhere from around three to eight people. As we completed a project, some of the people might stay and work on the next tool, and some might go work on other projects. The great thing is all the skills they learned throughout this process: the ones who moved on to different teams were able to take these techniques and skills and start to seed those other teams, teaching them the proper way of doing things. That's something to keep in mind as you're building your team and bringing in new technologies. When you start out with that small group, you want to bring on people who can then help spread the word, take your techniques and the right way of doing things, and push that out company-wide. And finally, because of this new approach with containers, we're really using this as a catalyst to rethink how our DevOps pipelines work.
So again, instead of having these monolithic agents, we're able to string together different containers and swap different testing and build techniques in and out, so it's really helping us rethink how all these pipelines are going to work.

That's really it as far as my presentation. I've talked about some different technologies, so here's a quick reference. Helm is the package manager for Kubernetes that we're using. Terraform, if you're not familiar with it, lets you define and deploy infrastructure as code; in our case we use it with AWS, but there are providers for other clouds, and even Helm and Kubernetes providers, so it gives you a lot of flexibility. Amazon EKS is Amazon's managed Kubernetes service. CloudBees CI and CD are the two products we're using for continuous integration and continuous delivery. Kasten is the backup product I talked about, Aqua Security is what we use for container security, and Datadog is used for monitoring. So that's the end of my presentation. Thank you all for listening, and I appreciate your time. Take care.