Good afternoon, everybody. My name is Mark Deneve and this is my coworker Kyle Button. We both work for a company called Paychex, and we've both worked there for somewhere between two and three years. I have over 20 years of IT experience. Kyle has about five years of IT experience, mostly as a developer; for me, it's mostly in operations. We both work for a team called Infrastructure Platforms. Infrastructure Platforms at Paychex is responsible for all of our storage, server, virtualization, infrastructure as a service, and platform as a service for both internal and external use. So you can kind of say that we're the ops side of DevOps at Paychex. Just really quick, I'm kind of curious. I think somebody else asked this earlier, but how many people here are from the operations side of things? Okay. And how many people from the dev side of things? What we're going to be talking about is more focused on ops than on dev, but we'll definitely be talking a little bit about both.

So just real quick, what is Paychex? I'm going to read this one exactly: we are the leading provider of integrated human capital management solutions for payroll, HR, retirement, and insurance services. We have over 605,000 payroll clients, and we pay one out of every 12 American private sector employees.

A little bit about Paychex and our use of OpenShift. We've been using OpenShift since about 2015. When Red Hat shifted over to 3.0 with the Kubernetes backend in 2016, we went all in on OpenShift. It has seen very viral adoption inside of Paychex; it is our fastest-growing infrastructure platform. We have 16 different clusters covering dev, test, and production across three different data centers in two different regions. We've gone through seven in-place upgrades, from OpenShift 3.0 all the way through 3.7, over the past year and a half to two years, with zero application downtime during those upgrades. And the applications hosted inside OpenShift are responsible for moving well over 500 billion dollars per year.

So what are we going to talk about today? We're not going to talk about putting our business applications into OpenShift. What we want to talk about today is putting our operational tools in, the tools that make the backend infrastructure work and help us manage it. OpenShift isn't just for business applications and services. We're going to talk about how we improved services by running our infrastructure tools in OpenShift, and how running those tools in OpenShift helped us better understand the OpenShift platform.

To give you an idea, before we moved into OpenShift for IT operations, we had a lot of deployed VMs. We had multiple different OSes running in the back end. We were using various automation tools to manage all these different applications, and we were lacking a mature CI process. The IT ops organization didn't really understand the potential benefits of moving into OpenShift. We got a lot of "it's just another tool, I'll wait this one out" and "isn't this a dev thing?" We also underestimated the importance and criticality of running our tools on a highly available platform. "This tool has never gone down before," "this is way overkill," or "we'll just rebuild it if it goes down" were many of the things we would think and say. We needed to do better.
We needed to be able to help development teams with questions and issues on the OpenShift platform, and we weren't able to do that because of a lack of understanding of the development side of things. We needed to move outside of our comfort zone. We needed to gain end-user experience with the platform. We needed to become more dev-like in our thinking. We needed to understand not just how to deploy the platform, but how to leverage the platform by putting our own tools in it. What we're going to talk about now is how our team went from a bunch of unknown magician engineers performing very much unknown magic work to the 1-800 dial-in engineers for our developers. We went from nobodies to somebodies really, really quickly.

To talk about the first application that we did from an operations perspective, I'm going to let Kyle tell you a little bit more about that.

So we had an application called Frye, and Frye managed our NASes and our SANs. It monitored them, and it was a giant mess of PHP, Python, MySQL, and a whole bunch of bash scripts. The code was about four years old and not very well maintained, but it worked. It ran in one data center. There was no failover, no business continuity available. If something happened to the server, we would lose a significant amount of insight into our storage platform. And we called it Frye because it managed the tape robots we had, Bender and Flexo; why they called it Frye, since he can't even manage himself, I don't know. So when we decided to do this, there was a lot of skepticism and uncertainty on our team. This was a really critical application, and people were really scared about uplifting everything.

So we came up with a set of goals. We needed to change our thinking. We're not a team of developers; we're all in operations. But we needed to start thinking more like them. The other thing I'll mention is that hiring a dev to work on your team, such as Kyle, makes a big difference as well. This application needed a flexible framework that is very well documented and easily expandable. We needed a very resilient architecture: it has to run actively in multiple geographic locations, and we wanted multiple instances per data center. We'd like it to be continuously deployable. We have a lot of requirements around monitoring, and we're constantly bringing in new tools, so we need to be able to quickly make a change to this tool and deploy it; ideally we could deploy it in minutes based on business needs.

So we implemented this using Python and Django, with a MySQL backend with replication, using S3 storage for file persistence when we needed it, and a message queuing system for tasks. With a tool like this, as long as certain automation tasks got done against our infrastructure, it didn't really matter exactly when they happened. (A rough sketch of that queued-task pattern appears a little further down.)

And so how did it go? It became so easy to make a change and deploy it out into production that our senior manager was able to do it within a matter of minutes, and he doesn't have a very technical background. We had no more single points of failure, and there was no more application downtime when we had to make deployments. We are constantly monitoring our SAN and NAS infrastructures now; there are zero gaps when we make a deployment, so we're never missing out on any data. It's running active-active-active in those three data centers across two regions. And we had issues before where, if too much load was going to the application, it would crash.
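[A minimal sketch of the queued, time-insensitive task pattern described above, assuming a Celery-style worker and broker; the module, function, and environment-variable names here are illustrative placeholders, not Paychex's actual Frye code.]

```python
# Illustrative only: a Celery-style task where it matters that the work
# eventually happens, not exactly when.  The broker URL comes from the
# environment (e.g. injected via an OpenShift secret or config map).
import os

from celery import Celery

app = Celery("frye", broker=os.environ.get("BROKER_URL", "amqp://localhost//"))


@app.task(bind=True, max_retries=3, default_retry_delay=60)
def collect_array_metrics(self, array_name):
    """Poll one SAN/NAS device and persist whatever it reports."""
    try:
        metrics = poll_array(array_name)    # placeholder vendor integration
        save_metrics(array_name, metrics)   # placeholder: MySQL/S3 persistence
    except ConnectionError as exc:
        # If the device is briefly unreachable, retry later; timing is not critical.
        raise self.retry(exc=exc)


def poll_array(array_name):
    """Stand-in for the vendor-specific collection logic."""
    raise NotImplementedError


def save_metrics(array_name, metrics):
    """Stand-in for writing results to the replicated MySQL backend or S3."""
    raise NotImplementedError
```

A caller would simply enqueue the work, for example with collect_array_metrics.delay("nas01"), and whichever worker pod in whichever data center picks it up does the rest, which is what makes the "it doesn't matter exactly when" property cheap to get.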
Now, with auto-scaling set up, if too many alerts are being generated or too many people need to access it, OpenShift will just spin up more pods for us. And we can easily add new components if we have to. If we need new functionality or new services integrated with this tool, we can just create new pods, add new modules in, and expand the code base very easily and fluidly.

But what we did learn was that debugging in OpenShift can be difficult. For this application, we intentionally sacrificed some traceability for ease of deployment and development. We're not, as I said, a team of developers, so having to include additional packages was kind of hard, especially when you're trying to trace across three data centers with 12 pods. And all our logging is ephemeral: when the pods die, the logs disappear. So when we do have issues in our production environment, it's kind of hard to find them. But with OpenShift, if the pod dies, we don't really care; eventually we can get to fixing that bug, because usually it's not super critical. (A generic sketch of one way to get some of that traceability back appears at the end of this section.)

We're a lot better at explaining OpenShift concepts, and this helps us out immensely when working with the development teams as they deal with all the different configs and deployment types. We know how quotas really work; we had run into issues originally where deployments would fill up all the quota space and the pods would never get created. We learned all the intricacies of services and routes. Having our own application on the platform we manage also allowed us to troubleshoot issues and find problems with the platform before the development teams and the business applications actually hit them. We were able to find HAProxy issues and correct them before the business saw them. We found and corrected issues with scaling and issues with pod deployments. We found a bug in our OpenShift internal registry and were able to correct that before any development teams or business services were affected.

A few other things that we also learned: we got really good at finding actual bugs in the platform itself. We would start testing some of the new features well before they were available to developers. Scheduled jobs was a great example of that. I wanted to start using it because I saw a need for it for the developers, and we very quickly came to the conclusion that we don't use alpha features, because they're not quite ready yet. Additionally, overall we were able to offer much better service to the development teams. We became an influencer of the software development lifecycle and CI process, and we created much tighter integration and trust between our dev and ops teams through this process.

That's why the slide says be prepared. If you were to do something like this, be prepared, because, if anybody here has read The Phoenix Project, the concept of Brent doesn't scale well. You need to make sure that you disseminate your knowledge to prevent information silos. Create a community of practice, which is what we did, so that not only is the operations team helping to support OpenShift, but the developers also get involved and help support OpenShift as well, answering each other's questions.

This went so well that other operations teams also started to want to get on board. We started bringing in third-party integrations. We brought on our chat as a service.
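[On the earlier point about ephemeral logs and tracing work across a dozen pods: one generic way to claw back some of that traceability is to emit structured JSON to stdout with a per-request correlation ID, so whatever log aggregator sits in front of the cluster can stitch a unit of work back together even after the pod is gone. This is a general sketch under those assumptions, not what Paychex actually built into Frye.]

```python
# Generic sketch: structured logs to stdout, tagged with a correlation ID,
# so a cluster-level log collector can follow one unit of work across pods
# even after the pods themselves (and their local logs) have disappeared.
import json
import logging
import sys
import uuid

logger = logging.getLogger("frye")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(sys.stdout))  # pod stdout -> log collector


def log_event(message, correlation_id, **fields):
    """Emit one JSON log line carrying the correlation ID and extra context."""
    logger.info(json.dumps({"message": message,
                            "correlation_id": correlation_id,
                            **fields}))


# Tag a unit of work once, then reuse the ID in every log line it produces.
work_id = str(uuid.uuid4())
log_event("collection started", work_id, array="nas01", datacenter="dc1")
log_event("collection finished", work_id, array="nas01", datacenter="dc1",
          duration_ms=842)
```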
We've moved a lot of our monitoring tools, Grafana, Prometheus exporters, things of that nature, into OpenShift as well. All of these are running in multiple data centers, managed by OpenShift, which makes it much easier for us: we can worry about day-to-day operations instead of trying to keep these applications up and running. We've also moved a lot of deployment automation and infrastructure automation into OpenShift.

What's next for Paychex? What we're looking at next is expanding the use of S2I for development. We've seen a lot of good use cases from our own use of it, and we're looking to move more of development onto S2I. We're looking at functions as a service; functions as a service would fit operations needs great, things like webhooks, callbacks, things of that nature. We're working with the service catalog integration for on-prem resources, and also looking to move things like Jenkins workers into OpenShift. The other thing I'll say that's not on here is Operators, which we heard about earlier today; that's something else we're very much interested in starting to use, to manage the applications that we have in OpenShift.

So in conclusion, hosting our tools inside OpenShift helped us build team experience with the OpenShift platform. The features of OpenShift helped make our IT operations teams better: we're able to manage workloads, improve efficiencies, and provide a better customer experience to our internal customers. A great way of looking at this is that if you really want to understand the customer experience, be a customer of your own platform. Eat your own dog food, or, a better way of saying that, I think, is drink your own champagne, because it sounds much more high class. It's a great way to learn about the SDLC and SDLC concepts like CI and CD. And a really good quote is: don't be a cost center, don't be another admin, just enable dev and speak their language. And that's what we've been able to do by moving our own tools into OpenShift, so that we can really understand the platform. So we made sure to leave a little bit of time for questions, if there are any.

There were a lot of ops hands up there, and these wonderful ops folks have gone and given us a great talk. There's a question halfway down there. You guys rock. What's that? That was rocking. Thank you. Any tips or suggestions or best practices on how to effectively do in-place upgrades?

So, the in-place upgrades. We actually did a lot of work using our own Ansible automation. We have a lot of what we refer to as Paychex systems in our environments, additional outside tools and things like that. So we wrote our own automation, building on the automation that we got from Red Hat: Ansible playbooks and things of that nature to handle the tasks and roll through those upgrades one by one.

Yeah, and actually, come to the OpenShift roadmap session. We're going to be talking about how we're evolving the current installer, which is OpenShift on RHEL using the Ansible installer, toward what will be the fully immutable installer, which is OpenShift on the immutable OS, and some additional automation that we're going to be introducing around that. So, are there more questions? There's a question way up here in the middle.

Yeah, are you willing to share the tool that you guys have written, or is it proprietary? Sorry to put you on the spot. I'm sorry, I couldn't understand.
Yeah, the tool that you guys have written, the custom tool, are you willing to share the source code or make it open source?

Open source the Frye tool? Is that what you were asking about? Yeah, the tool. I'm not sure; probably not. There's a lot of proprietary code around our own processes baked into it. There are certainly some components we might be able to share. For example, we do a lot of integration work with Cisco MDS; there aren't a lot of tools for working with Cisco MDS, and we've had to build a lot of that in-house. So that might be an opportunity for us to open source.

There's another one right here. Okay, we'll go. I've got a question for you. When you talked about the SDLC and you became an influencer, how did the existing teams feel about that?

Actually, I've had a lot of experience with it. The developers I've worked with have been very happy with it, because quite honestly it did go both ways. I'm not a developer personally, so I was able to learn a lot from the development teams and development organizations on how to use OpenShift, as well as us helping to influence them. So it's really worked out very well.

You might have hinted at it when you mentioned MDS, but how are you providing persistent storage to users, and are you providing dynamic options for them?

We are not providing persistent storage at this point in time. We've actually architected things in such a way that we don't require persistent storage. The only exception to that would be S3, an object-based store that we use for shuttling off things that really need long-term storage. Outside of that, we don't do persistent storage.

You mentioned active-active-active. Are you running across three data centers with your OpenShift, and how do you have it set up? Do you have it stretched, or do you have multiple instances?

We have multiple instances. It runs stretched at the application layer, not at the OpenShift layer. Yeah, to be clear, we have three different instances of OpenShift, each running the same set of code, and they're sharing a database back end.

Back on the storage question: there are also a couple of sessions this week around what's called container-native storage, which is work the Red Hat storage team is doing specifically around storage and OpenShift, in addition to the work that we're doing in Kubernetes to support all forms of storage. So if you're interested in that, check those out.

Was there one more question? Were you able to deploy all your operations tools in OpenShift, or were there tools you have for infrastructure that could not be ported into OpenShift for whatever reason?

Did you hear that? I'm just going to repeat it real quick to make sure I understood. The question was: were there any tools that we found we couldn't put into OpenShift, and what did we do? Was that the question? It was a little quiet. Yes, specifically the operations tools. Basically, are your operations tools completely in OpenShift? No, it's a process. There are several operations teams, and they can choose to go into OpenShift or not. Most of them right now are still running on traditional infrastructure.
The one thing that we are finding is that more and more of those teams are wanting to go into OpenShift because it breaks down a lot of barriers and makes it very easy for them to do their own management and deployment of the code as opposed to having to go talk to different teams to say spin up a new VM or to make changes to those VMs. They're able to do it all themselves because they're running it all inside of OpenShift. All righty then. Thank you very much for that wonderful perspective and insights.