Thank you very much. As she said, my name is John Mulligan, and I work with my colleague here, Raghavendra Talur. We work in storage at Red Hat. I've been working on this project for about two years, and one of the main issues that I've been dealing with during that time is stabilizing the software we work on and making it more robust. That's why I have my alternate title here: teaching a stateful application how to better survive in rough turf. I don't think this mic is picking me up at all, is it? Okay.

So we have an alternate title, which is a little bit more accurate but has fewer buzzwords. Here is my graphical representation of our system when things go wrong: you've got our poor application not really surviving well in the hostile environment, and hopefully after this you'll have learned a little bit from our experiences. Bear with me; it's a small crowd, so I hope you don't mind.

Okay. So I've been working on storage for about ten years, and on this project for about two. This talk is mainly about dealing with a stateful application running under Kubernetes or OpenShift. I know, I'm going to switch. Okay, that's a little better, sorry about that.

So specifically, the project that we've been working on is called Heketi. It's a bridge between the storage back end and the front ends. As I said, it's a stateful application, and hopefully you'll learn a little bit about how we made the software a bit more robust.

A quick introduction to Heketi itself. Heketi is written in Go. It has a front end with a REST-style API, and the API is accessed by Go clients or Python clients. On the back end, Heketi is actually reaching out and configuring storage services. These are either running natively inside of Kubernetes itself or on dedicated storage nodes, and Heketi has to control them via commands running over SSH.
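To make that concrete, here is a hypothetical sketch of what a raw request against a REST-style API like Heketi's could look like from Go. The endpoint path and JSON body here are simplified assumptions, not the exact Heketi API; real deployments would go through the project's Go or Python client packages, which also handle authentication and polling of asynchronous operations.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Hypothetical, simplified volume-create request against a
	// Heketi-like REST service. The URL, path, and JSON shape are
	// illustrative assumptions for this sketch.
	body := bytes.NewBufferString(`{"size": 10}`)
	resp, err := http.Post("http://heketi.example.com:8080/volumes",
		"application/json", body)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("server answered:", resp.Status)
}
```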
The system was originally designed for managing the Gluster system under OpenStack. That was its very early origin, and it quickly became adapted to being used for OpenShift. It was originally designed to run on a single node, and it didn't really have any sort of built-in HA capability: if the server went down, it stayed down. We've dealt with a lot of different engagements, both on the community side and many on the commercial side, and a lot of what we did is based directly on those experiences. So, again, I'm kind of repeating myself, but hopefully the takeaway here isn't necessarily what exactly we did to Heketi, but lessons that may be applicable to other applications as well.

So what's so unusual about running a service like this under Kubernetes? Well, it turns out that a few months after starting to work on the project, I said to a co-worker: seeing how this environment works, it's a lot like what you'd see on a more traditional system, only compressed down. Running for a week in this system is like running a year in your typical storage data center.

The important takeaways are that the environment is very dynamic: you have nodes coming up and down, and services are expected to move on their own. There are certain aspects of Kubernetes that harken back to its origin as a system mainly for managing stateless microservices. There's some complexity around the networking: the storage server wants to use what we call host networking, but a lot of the applications are oriented more around the actual native networking inside of the system, and that can add some additional complexity as well. And last, probably the most important, is user expectation. Users really expect to have very automated deployments. In the traditional data center, to get storage you'd be talking to your coworker or making a ticket in a ticketing system; with the Kubernetes PV and PVC mechanism, the application developer is actually asking the system itself for storage. People just expect this to work.

So, early on, because of the simplicity of the service, it was fairly easy to get it running in Kubernetes. It was very easy to convert to a pod: it's a single binary, it logs to standard out already, and it didn't have any complex daemonization; it just forked and ran, or rather the parent forked it and ran it. So it was very easy to get it running as a containerized service.

One of the nice things about doing that is that it gave the system some fairly simple HA properties right off the bat. The database that the Heketi system uses was placed directly on a volume managed by Gluster, and because Gluster is a network file system, the database could move. So if the Heketi pod, or the node it was running on, went down, the system could simply run on another node. Unfortunately it's not multi-master, because the database format is very simplistic, but it at least has some basic HA capability.

And then, finally, one of the interesting aspects of the system is that as it manages Gluster pods, containerized Gluster, it uses Kubernetes' own native command execution framework. This is the same thing that you end up using if you run a kubectl exec.

Okay, so now, getting more into the meat of it. When we started looking at some of the reliability issues in the system, we started analyzing what was there, and we quickly realized that the system was trying to use language mechanisms that exist for error handling but that don't really work outside the scope of a single execution of the process. Specifically, I'm talking about the defer mechanism in Go. What the system was doing was expecting that if it hit some error condition X, the defer statement would be able to revert that work and clean it up.
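To illustrate the pattern, here is a minimal sketch, with hypothetical helper names standing in for the real LVM and Gluster calls, not Heketi's actual code. The rollback registered with defer only ever runs if the process itself survives long enough to unwind the stack.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the real LVM/Gluster operations.
type brick struct{ id string }

func createBrick() (*brick, error)   { return &brick{id: "brick-1"}, nil }
func deleteBrick(b *brick)           { fmt.Println("deleting", b.id) }
func configureVolume(b *brick) error { return errors.New("volume create failed") }

// provisionVolume relies on defer to roll back on error. The rollback
// only runs while this process is alive: if the pod is evicted or the
// process is killed between createBrick and the return, the brick is
// orphaned and nothing on disk records that it needs cleanup.
func provisionVolume() (err error) {
	b, err := createBrick()
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			deleteBrick(b) // never reached after a crash or eviction
		}
	}()
	return configureVolume(b)
}

func main() { fmt.Println(provisionVolume()) }
```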
The problem is that in this very dynamic environment, the application could crash, could be evicted, whatever, and so the process was being terminated at points along the execution chain. It might be creating some LVM LVs or Gluster bricks, and the next thing you know, you've been deleted or evicted, and when you come back up, the state of the system is no longer consistent. So one of the things we realized is that we had to stop relying just on the mechanisms the language provided us, and actually work on the design, such that the system would be able to survive being terminated, come back up, and do sensible things.

So this brings us to what we ended up calling the operations layer. You could call it all sorts of different things, and I've seen other tools do similar things, but ultimately the point is to bake reliability into the design. What we chose was a record-what-you-do-before-you-do-it approach. This design was created in order to allow us to roll back anything that we had started, at any point. So no matter where along the way you crash or get terminated, the system will be able to come back up, see what its state was, and then undo it. One of the things we had thought about is that perhaps we could add resumability; we haven't done that. Some of the operations aren't naturally resumable, but for the ones that are, we could have considered it.

One of the aspects of this system is that we designed it such that the state recorded into the database would allow us to analyze it. I'm repeating myself, sorry about that. Anyway, long story short: the initial version of the system had creation and rollback, but what we really wanted was a fully robust approach that would allow us to undo anything after a crash. Unfortunately, we had a lot of other deliverables at the same time, so what we ended up doing was making sure that the design covered what we wanted to do, knowing we would have to come back and implement the cleanup later.

In the meantime, as we were working on other features, we had put the operations framework out into the field, and we had to build some stop-gap tools. This actually ended up being a really good experience for us, because we were building tools that were very generic and simple to use, that over time we were able to share with other teams, and it helped us actually implement the cleanup work when we eventually got around to doing it. So we had this framework in place.
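Here is a minimal sketch of that record-before-you-act idea, with an in-memory store standing in for Heketi's actual database and an illustrative, hypothetical operation record. The intent is persisted before any resource is touched, so a restarted process can find half-finished work and undo it in reverse order.

```go
package main

import "fmt"

// Hypothetical persistent store standing in for Heketi's database.
type store struct{ pending map[string][]string }

// begin records the intent to perform an operation before any work starts.
func (s *store) begin(opID string) { s.pending[opID] = nil }

// record notes each resource *before* it is created, so a crash at any
// step leaves enough state behind to undo.
func (s *store) record(opID, resource string) {
	s.pending[opID] = append(s.pending[opID], resource)
}

// finish removes the record once every step has completed successfully.
func (s *store) finish(opID string) { delete(s.pending, opID) }

// rollbackStale runs at startup: anything still pending belonged to a
// process that died mid-operation and must be cleaned up.
func (s *store) rollbackStale(destroy func(resource string)) {
	for opID, resources := range s.pending {
		for i := len(resources) - 1; i >= 0; i-- { // undo in reverse order
			destroy(resources[i])
		}
		delete(s.pending, opID)
	}
}

func main() {
	s := &store{pending: map[string][]string{}}
	s.begin("create-vol-1")
	s.record("create-vol-1", "lv/brick-a") // recorded first, then created
	// ... process dies here; on restart:
	s.rollbackStale(func(r string) { fmt.Println("cleaning up", r) })
}
```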
With that metadata in our database, the tools that were external to the process could help us clean things up in the meantime, in a semi-manual way, and then we were able to work on the fully automatic cleanup over time.

One of the things that kept us from doing everything right away is that the system had a fairly nice test framework, but it was really only testing the happy path. So we had to take a side route and spend a good amount of time developing an error-testing framework, and this included building error injection into the system itself. By setting up the configuration in a certain way, we could induce errors at any particular step along the chain of actions the system has to perform to set up the storage, and at that point we could test the various failure scenarios that the cleanup code was supposed to handle. And then eventually, I want to say about six months after we had shipped the first version with operations, we had developed our cleanup code and were able to provide that to our users and our customers.

As part of the experience working on these problems, we learned that one of the other important aspects of keeping the system reliable was to build good, robust diagnostic tools: tools that help the user get to the root of the problem as fast as possible. We wanted to keep the tools simple and evolve them as we worked on cases; we were learning along the way and building tools based on our experiences. A lot of the stuff that did get built into the server is very useful and is in the field now, but some of the tools that we built externally we're still using fairly frequently. One that I use fairly frequently is a tool that allows me to compare the state of the Heketi database with the state of the Gluster system and Kubernetes itself. It will show us any discrepancies, and we can use that to debug, or sometimes even fix, the problem.

Lastly, I want to mention that we have built some metrics into the system. This is useful for admins who are trying to monitor the state of the system, but by building in metrics around the operations themselves, you can get an idea of the overall health of the system: if operations are failing and not getting automatically cleaned up, that's a time for a human to actually intervene, versus the cases where the automatic cleanup will just take care of things in a while.

Okay, so now, as I was kind of joking, at the bottom here is my "do as I say and not as I do." It turns out that there are many things I would love to do with the system, things that are generally a good idea, but we're not fully there yet, so I just want to talk about this briefly. One of the issues with what we're doing in the system is that we have duplicate state: we have state both in Gluster and in the Heketi database itself. Ideally we'd be minimizing that, taking away as much unique state from Heketi as possible. Now, that could also lead to some performance problems, so the other aspect is to use the state in the database more as a cache. And we've done that a little bit by adding a device resync command. Heketi needs to know about the sizes of the devices and the amount of data stored on them in order to make allocation decisions. However, if something has changed on the system, by the admin, or because of a bug or something, we have a tool that allows us to invalidate what's in the DB and replace it with what's live on the system.
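Here is a hypothetical sketch of that resync flow; the types and query function are illustrative, not Heketi's actual schema. The DB entry is treated purely as a cache and replaced wholesale by whatever the live system reports.

```go
package main

import "fmt"

// Cached view of a device, standing in for the record Heketi keeps
// in its database to make allocation decisions. Fields illustrative.
type deviceInfo struct {
	ID        string
	TotalSize uint64 // bytes
	UsedSize  uint64 // bytes
}

// queryLiveDevice is a hypothetical stand-in for asking the storage
// node (e.g. over SSH or via the Kubernetes exec API) what is really
// on the device right now.
func queryLiveDevice(id string) deviceInfo {
	return deviceInfo{ID: id, TotalSize: 500 << 30, UsedSize: 120 << 30}
}

// resyncDevice treats the DB entry as a cache: invalidate whatever we
// believed and replace it with the live state of the system.
func resyncDevice(db map[string]deviceInfo, id string) {
	db[id] = queryLiveDevice(id)
}

func main() {
	db := map[string]deviceInfo{
		"dev-1": {ID: "dev-1", TotalSize: 500 << 30, UsedSize: 80 << 30}, // stale
	}
	resyncDevice(db, "dev-1")
	fmt.Printf("%+v\n", db["dev-1"])
}
```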
So we get the benefit of having fast local data, but with the ability to reconcile it against what's actually on the system. It would be nice if we were able to do that more for some of the other storage objects; that's something we may or may not be able to do in the short term, but it would be nice.

Okay, so here's my summary slide. Long story short: one of the issues we encountered was that the code was written in a way that just tried to naturally take care of itself, but that kind of organic growth doesn't really pay off. Nothing beats design for making software reliable; you've got to build it in as early as you can. And if you have to build it in later, like we did, it's worthwhile putting some thought into it and making it so that, like we did with our cleanup code, you can implement a lot of it, the core data structures, and then do the other parts later on. It's important to be able to track what you need versus what you don't. And learning from our errors was very important.

And that's it for the talk. Thank you, everybody, for coming. It's a small crowd, I know the party's coming up, and I've had mic troubles the whole time, so I'd appreciate it if you have any questions, but that's it for me.

Audience: So I was wondering about your operations layer. That sounds like a kind of journaling-based solution. Did you guys use a replicated state machine for that? Do you care about the high availability of that log? What if the log gets corrupted, basically?

That actually does happen. So one of the issues, again from the evolution of the system, is that the database we have is called BoltDB; it's a native Golang database. Unfortunately, it does have some drawbacks: it's a single-file database, and if that file goes away, you're kind of toast. That goes back to what I was saying about the cache. Ideally we'd be able to derive all that information from the system; unfortunately, there are some unique pieces of data that are only kept in the Heketi DB at this time. We've done a little work in the meantime trying to store more metadata in Gluster itself, but unfortunately Gluster wasn't designed to store arbitrary metadata on all the volumes, so it's a trade-off.

One of the things about the framework that we also tried to do, which I meant to mention and I think I skipped, is that we were also trying to retain backwards compatibility with the existing systems out there in the field. We didn't want to disrupt our users very much. We're a very small team, so we could have gone off for two years and tried to redesign everything to use etcd or whatever. There are times where I wish we had done that, but this paid off for a lot of our users in the short term.

Audience: Okay, very cool. Also glad to hear about BoltDB, that's cool.
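For context on the BoltDB discussion above, here is a minimal sketch of using a single-file Bolt database, via the maintained bbolt fork; the bucket and key names are illustrative, not Heketi's actual schema. The important property is that all state lives in one file, with no server process or replication behind it.

```go
package main

import (
	"fmt"
	"log"

	bolt "go.etcd.io/bbolt"
)

func main() {
	// The entire database is this one file: lose the file, lose all state.
	db, err := bolt.Open("heketi.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Writes happen inside a transaction.
	err = db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("volumes"))
		if err != nil {
			return err
		}
		return b.Put([]byte("vol-1"), []byte(`{"size":10}`))
	})
	if err != nil {
		log.Fatal(err)
	}

	// Reads use a read-only transaction.
	_ = db.View(func(tx *bolt.Tx) error {
		v := tx.Bucket([]byte("volumes")).Get([]byte("vol-1"))
		fmt.Printf("vol-1: %s\n", v)
		return nil
	})
}
```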