And I actually come from Australia, and that's Australia with the kangaroos, not Austria without kangaroos. When I left it was 43 Celsius on the day I left, so you can sort of appreciate what I think of the weather over here. It's like minus 8 at night; that is as cold as it ever gets in Australia. Anyway, Australia is a wonderful country, I'm sure you've heard. Lots of nice wildlife, kangaroos, no problems, nothing dangerous.

Anyway, applications, why we're here. Everyone's seen this picture, and I'm sure everyone's experienced this problem. You've done your application development as a developer, you hand it over to your ops team, they go and install it following your instructions, and it doesn't work. It all just blows up. And of course your manager gets a bit upset about this. He's heard about this thing called DevOps, and he's like, well, maybe we should start to employ that and bring in some of those ideas, and you take on some of that responsibility to put things in place to help the ops person. You as a developer are of course very cynical about this, because okay, it may help you, but you just see it as a shift in blame: you can't blame the ops person anymore.

Anyway, let's have a look at an app. We've got a simple app here, which is the one we managed to blow up. In Australia we've got lots of good wildlife, so imagine we've got an app on your mobile phone for all these tourists who come to Australia and think that Australia is actually a little bit dangerous, for some reason, I don't know why. So we've written an app, and we've got a back end for our mobile phone app which is just going to give out little news bulletins, warning all these tourists about little incidents that might have occurred that they might be concerned about, like someone getting bitten by a funnel-web spider, which is like the fifth most dangerous spider in the world. But we're getting a lot of
demand for our app, and we thought, well, maybe we'll branch out beyond spiders. Up to now we've just had a bit of static code, so every time we want to change the message we have to redeploy the app, and that's not good enough either. So we're going to add a database. We make it easy, we don't really plan it out, we're just going to put in that database and away we go.

Of course we're going to deploy this on OpenShift. Now, we want to make a big change when we go from our first version to our database version, so the way we handled that was by using a blue-green deployment, and that's something that's very simple to do in OpenShift. Down the bottom here we've got our original app, the first version, and it's exposed by a route; people have access to it from our app on the mobile phone. We want to deploy a new one, so our ops person has gone and deployed our Mongo database, loaded up all the data for our application, and started up the new web app as well. It's all running and it all looks fine. The next thing you want to do is use this blue-green deployment to switch our traffic from one to the other. That's very simple: you can just go into the web console for OpenShift, edit the route, and use that drop-down to change it from our current standalone one to our database version. Very simple, and then the traffic goes over there, nice and quick.

So you'd think it should just work, and then, what do you know, it doesn't work. That's our problem. It all goes up in flames and everyone gets upset. So my ops person starts to dig around inside the pod logs and eventually finds this error: it's trying to contact the Mongo database and getting a timeout. So something's wrong. He gets very upset and starts sending me messages: where are you, out surfing again? I wasn't, I was out shopping. So I get to my computer back at home and I start digging around. I check the database, and yes, my database configuration is fine, so what's the problem
there? I dig further and look at the web application side of things, and I go, oh right, look what the ops guys have done: they've missed out some environment variables which tell that web app front end where the database is. And that's how we end up in this situation, with everything falling apart.

So what can we do? The good news is that OpenShift has a feature called health checks. What this allows us to do is run checks on our web application before we allow traffic to start going to it, to help us determine whether that application is in a good enough state to handle traffic and actually be useful. There are two sorts of health checks that it supports. There is a readiness probe, which is what I just described: it's determining whether your web app is actually running properly. And there's another one, a liveness probe. In this talk I'm just going to talk about readiness probes.

So what's a readiness probe, in a bit more detail? I'm going to put up a chart here. We've chosen the rolling deployment for our application, and I'm going to explain readiness probes and what they do in the context of a rolling deployment. We already have an instance of our web application up and running, and we want to start a new deployment which has our code changes. What's going to happen is we're going to run up an instance of our new application image. If we have no readiness checks installed, what would normally happen is that it would immediately be added to what in OpenShift is called a service. The service is the thing that has an IP internal to OpenShift, which all your instances are essentially joined or attached to, and that service acts like a round-robin for distributing traffic. So the new instance is added in there, and as soon as we do that, because the route's attached to our service, it's going to start getting live traffic. We're going to remove the old application instance from the service, stop the old application instance,
and that's the end of the deployment. So a very clean cutover from the old to the new.

When we add a readiness check, what we're going to do instead is wait initially for a little period of time, to give the new application a bit of time to start up. I might be using Python here and that's really quick, but you could be using Java, which might take a lot longer. Once we've waited that initial delay, we're going to ask the application: is it ready? If it is, we'll immediately flow on to adding the application to the service, and it's going to start receiving new traffic. If however it is not ready, we'll wait a bit more time and go back and check again. We'll keep doing that until either it says it's ready or it reaches the deployment timeout. If we give up on it because it's not becoming ready, we will instead simply stop that new application pod and fail the deployment. In the meantime our existing application is still running; it is the one that is associated with the service and the exposed route through which the traffic is coming in. So we have not made that new, failing instance of the application visible at all.

Now, when we talk about readiness probes, we have three different types we can use. The first one uses HTTP requests. To set this up, you tell OpenShift that on my web application I have a particular URL handler which it can make a request to, and it's going to tell it whether the application is ready. Obviously, to support that I need to have a handler in my application which responds. In this case we just add a very simple one which says, hey, yeah, I'm alive. So we make this change, we put it in there and we deploy it, and the application comes up again. It switches the traffic over to that instance, and once again it fails. And the reason for
that is that by including that bit of code there, all I've tested is the web server and whether the web server is responding. If you remember, the problem I had was related to the environment configuration which told the web application how to contact the database. Testing the web server alone is not enough. What you need to do is add in whatever code you need to also contact any of the back-end services which your web application depends on, and do something with them to ensure that they in turn can be contacted and are working. In this case I was using MongoDB, so all I've done is, for the collection in MongoDB which my data is in, I've asked how many items are in it. If it comes back with a value, great, I'll keep going; otherwise I fail. And actually I've added a slightly different check here: I've said the count is not equal to zero, because I want some data in there. I want to ensure that my application has some data to work with. If it's zero, it means my ops person hasn't loaded the data. So it's not just about testing connectivity to databases; you can also use these checks to do a bit of validation of the data in the database, to see whether it is what you expect it to be, and to make sure that the stuff that needs to be there for this particular instance of the application was put in place.

Now, how does that look in OpenShift? We start our new deployment, and initially, because it's going to do this readiness check, you see this little circle in blue. This is the old instance over here, and it's starting up the new instance over here; the light blue indicates that it's doing this readiness check. After a while, because we still haven't properly fixed our configuration, it starts failing and it starts warning us about the failures, and then eventually it gives up and fails totally because our deployment timeout was exceeded. That instance there is still our original instance, and OpenShift has told us it failed. So we've done that now, and
we've put that in place. Obviously, if we now go back and fix our environment variables and do that again, it should work okay.

Now, when it did fail, where do you look for information? If you go to the deployment config for that deployment, you have logs and events, and personally I actually find those to be useless for this. I don't know why, but the logs generally just have a very short message like that: okay, the deployment failed. The events are unfortunately not any better; I don't know why you don't get things in the events here. Where the event actually turns up is over in the general monitoring part, which monitors events across the whole project. In this particular case, because it was not able to contact the database and it was a timeout situation, it didn't actually log anything too useful, unfortunately.

So what is the response format? When you implement that readiness probe, what do you need to do? If that readiness probe returns an HTTP status code from 200 to 399, OpenShift will take it as being good: everything's ready to go. The docs actually say it's supposed to be 200 or 399 rather than a range. They're wrong. We noticed this last week when I started criticizing Jorge in his talk, saying he was wrong because he had a range, and I was saying no, it was one or the other, and I turned out to be wrong. Everything else is regarded as a failure. And importantly, it only looks at the HTTP status code. There is no standard for what the response content itself should look like. To me that's actually a bit disappointing, because it means you've got no way of deriving any additional information from it which would tell you what went wrong.

That doesn't mean you can't do it yourself. Rather than having a very simple check, as you go through all those checks you can accumulate information about the state of all the different things you look at. So you might look at the
database, whether you can connect to it and whether it's got data in it, and also the status of the web server, and anything else like that, such as whether the environment variables you need are even set. Accumulate all that information, and you can return it as a JSON response. Now, OpenShift won't do anything with this, but it can be useful for debugging: when it's not working, you can get in there, do a curl, and get the information that tells you what's going wrong.

Very importantly though, be careful of sensitive information in that response. You've put this on a URL handler of your web application, which is essentially public. If someone knew what that endpoint was, they could get that JSON information back, and you might have things in there you don't want them to see. One way around that is to have the handler accept a special token in one of the request headers, and only pass back this extra information when the token is present. That way, knowing that special secret, when you make requests you can say give me everything, rather than just a simple status.

So, tips for HTTP handlers. They should check more than the web server: be sure they check the back-end services and anything else you need, whether you've even got files present in a persistent volume that's expected to be mounted, or environment variables. Use the status to indicate overall success or failure, and return a JSON response with additional data in there to help you debug, but only do that if you've actually been passed a token to say, yeah, it's okay.

So HTTP requests, that is one way. The next way we can implement a readiness probe is by container command. In this case, along with our source code, we're going to embed inside the image we're deploying a command that we can run. So instead of doing an HTTP request, OpenShift will essentially do a docker exec to run the command inside the running container, and then
that script can do something. So I've got my readiness check script, and I might put something in it like that: I go in there and I might do a curl request back to my web server to check that it's okay, and I might do a separate thing to check the database.

Now, this is actually a horrible implementation of a readiness check; I should get a B-minus for this one. There are a few reasons. I do my curl request, and you may remember I mentioned that in the HTTP case 200 says I'm ready, but I didn't bother checking the response status. So if that thing returns 500 because my web server is failing, I wouldn't pick that up. I'm also throwing away the response from it, and maybe that's not a good idea: I put JSON response data in there and I'm throwing it away. I also check the web server before the back end, and I think that's wrong as well. So I need to clean this up. But what I want to illustrate is all these other things you can do, and the point of showing these mistakes is this: don't take these readiness checks lightly. Don't think, I'll just whack one in that just goes, yeah, return true, it's all working. You can do a lot in here, but be careful about what you do and think it through.

Now, a couple of things I did do right. When you make a curl request, don't use localhost. If you use localhost, you're not actually testing going back out of the container and looking at the exposed port. If you use $HOSTNAME, that is the name of the pod, which is essentially your host name, so it's going to look that up, get the real external IP, and you're going to go outside and back in. That's checking that your web server is correctly exposed and is actually responding.

There's another trick you can also do. If I run this readiness check, all that output is just going to go to the standard output of the shell session that runs it, and that's not being captured anywhere, except back in OpenShift perhaps. But I want to capture
that. You can do a little trick here: I'm going to run the output of this function through tee. I'm still going to let that output go to the shell I'm running it in, but I'm also going to send it to the container log. And I'm going to do a nasty trick to do that. I'm going to dive into the /proc file system and go to process ID 1, because we know that with containers your application is actually the thing running as process ID 1, and I'm going to get the stdout file descriptor for that and route the output through there. So as well as coming out as the output from the script, it also ends up in the container log. For the database I had another check. Essentially you can do anything you want in there; the important thing is to actually do it, and to do as many checks as you can.

So in this case, now when it fails in my container command version of this, all that stuff that was output from that shell script actually ends up in the event. It's unfortunate that it munges it all up, because it tries to reformat it and the output doesn't retain its formatting, so it looks a mess there. It'd be nice if they had a nice way of visualizing it. But I also had it in the log file anyway, because I did that trick before. So I've got it in both places, and I've got a lot more detailed information that helps me debug it.

Now, one last suggestion: avoid complex commands for that readiness check command. This is an example from OpenShift itself; it's the one for MongoDB, and that's what it currently has. Why is that bad? Because this is what it used to have. They changed it at one point along the way. The problem, from my recollection, is that in doing that they removed this mongostat command from the database image as well, and as a consequence all those people who already had that readiness check set up in their existing environment, because it came in from a template, all broke. So be careful about putting complex commands there. What I suggest is: put a
shell script in your container, put everything in there, and then never change the name. That way you can change the internals of that script and you're not going to cause problems. That script is then going to invoke all your checks.

One important thing about it being a shell script: various of the OpenShift S2I builders use SCL packages, Software Collections, and to enable certain packages in those you have to run this scl enable command. Now, those builders have a trick so that if you invoke something in a shell, it will automatically enable them for you. If I were to set the readiness command to be python something, that would use the system Python and not the SCL Python, and it would break. So that's another reason why I use a script. And as I said before, the other reason for using a script rather than a long command is that it minimizes the contract with the deployment config, so if you change things in the container it doesn't really matter.

Other things: use the host name instead of localhost, ensure you're checking the HTTP response status, send secret tokens to enable detailed responses so that you're not giving the public information that you shouldn't, and echo details but also send them to the container log.

How do we debug an application when a deployment fails? The reason we need to do something here is that because your deployment is failing, it's going to keep shutting down your app. It goes away, so you can't just keep debugging it, because it'll keep getting stopped. But in OpenShift you can use this oc debug command. That allows you to run up a separate instance of your app in a container, and it gives you a shell, so you can get in there and do things; you can start up your original app. In another screen you can then use oc rsh to get into that same pod and start debugging: run your readiness check, change things around, and try things out.
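The tips for the container command check (host name instead of localhost, checking the HTTP status, passing a secret token, and echoing details to the container log as well) could be put together roughly as follows. This is only a sketch in Python, meant to be invoked from the wrapper shell script, not the script shown in the talk. The `/ready` path, port 8080, the `X-Readiness-Token` header and the `READINESS_TOKEN` and `PORT` environment variables are all assumptions made for illustration. Note that `urllib` follows redirects, so the status checked is that of the final response.

```python
import json
import os
import urllib.error
import urllib.request


def is_ready_status(status: int) -> bool:
    """Statuses from 200 to 399 count as ready, as described in the talk."""
    return 200 <= status <= 399


def probe(url: str, token: str = "", timeout: float = 5.0) -> tuple[bool, str]:
    """Request the readiness URL and return (ready, detail text)."""
    request = urllib.request.Request(url)
    if token:
        # Hypothetical header name; the handler has to agree on it.
        request.add_header("X-Readiness-Token", token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", "replace")
            return is_ready_status(response.status), body
    except urllib.error.HTTPError as error:
        # 4xx and 5xx arrive as exceptions; keep the body for debugging.
        body = error.read().decode("utf-8", "replace")
        return False, f"HTTP {error.code}: {body}"
    except OSError as error:
        # Connection refused, timeout, name lookup failure and similar.
        return False, f"request failed: {error}"


def report(text: str, log_path: str = "/proc/1/fd/1") -> None:
    """Echo to our own stdout and also to process 1's stdout (container log)."""
    print(text)
    try:
        with open(log_path, "w") as log:
            log.write(text + "\n")
    except OSError:
        pass  # Not in a container, or not permitted; the stdout copy remains.


def main() -> int:
    # HOSTNAME is the pod name, so the request goes out and back in
    # rather than staying on localhost.
    host = os.environ.get("HOSTNAME", "localhost")
    port = os.environ.get("PORT", "8080")  # assumed port variable
    token = os.environ.get("READINESS_TOKEN", "")  # assumed variable name
    ready, detail = probe(f"http://{host}:{port}/ready", token)
    report(json.dumps({"ready": ready, "detail": detail}))
    return 0 if ready else 1
```

The script file would end with `raise SystemExit(main())`, so that a non-zero exit status marks the check as failed. Keeping the logic behind one fixed script name means the deployment config only ever refers to that name.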
The final one you can use is a TCP socket. You might use this if you have a database; databases don't respond to HTTP, and it might be an image that you pulled down off Docker Hub, so you don't have the ability to put a script in there to do the check. Unfortunately, all it does is check whether you can connect. You might think, great, I can connect, but you might have picked the wrong image and it might not be your database; it might be an HTTP server running on that port by accident. You can't tell.

So rather than TCP checks, one thing you can do is use what's called a sidecar container. You create a special image in which you embed a little service, a little mini server, to keep that container running, and you use either an HTTP check or a container command and implement your check inside that separate container. Because it's running in the same pod, it shares the same IP and the same port namespace. So the readiness check running in your sidecar container could be a script which then runs a Python script which contacts MongoDB, does the login, checks whether the data is in there, and anything else you need to do. So that's a trick for that.

So consider using a sidecar container instead of TCP; it gives you a bit more flexibility. You just need to run something in there, like a mini web server, as process ID 1 to keep it alive; that's the little trick. It can handle HTTP checks or a command, and you can implement more complex checks than that straight TCP check allows.
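As a rough illustration of the sidecar idea, the mini web server could look something like the sketch below, using only the Python standard library. The checks are passed in as plain functions, so the MongoDB login and data checks mentioned above would be supplied by whoever builds the sidecar image; the names `make_handler` and `make_server` and the port 8081 are hypothetical and not from the talk.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict

Checks = Dict[str, Callable[[], bool]]


def make_handler(checks: Checks):
    """Build a request handler that runs every check on each GET."""

    class ReadinessHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            results = {}
            for name, check in checks.items():
                try:
                    results[name] = bool(check())
                except Exception:
                    # A check that raises counts as a failed check.
                    results[name] = False
            # 200 when every check passed, 503 otherwise.
            status = 200 if all(results.values()) else 503
            body = json.dumps(results).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # Keep probe requests out of the container log.

    return ReadinessHandler


def make_server(checks: Checks, port: int = 8081, host: str = "0.0.0.0") -> HTTPServer:
    """Create the mini server; the caller runs serve_forever() on it."""
    return HTTPServer((host, port), make_handler(checks))
```

Run as process ID 1 in the sidecar with something like `make_server({"database": check_database}).serve_forever()`, where `check_database` is a function you write yourself. An HTTP readiness probe pointed at that port would then see 200 only when every check passes.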
Last one, very important: if you're on an environment which has resource quotas, such as OpenShift Online, you might not want to use these sidecar containers. The reason is that when you deploy a container in OpenShift Online, the way the resource quotas work is that by default it will assign 512 megabytes of memory to that container, and you can only wind it down as far as 250 megabytes. That's a waste of memory for a little container where all it's doing is a readiness check which is only used during startup. Unfortunately you can't wind it down any further. If you have your own system where you don't have resource quotas, or have better control, then that can still work.

Other things: we have liveness probes, which I'm sure you saw on one of the earlier slides. Readiness probes all come in before the deployment is finished. A liveness probe is another thing you can use, which comes after you've finished the deployment, and it can be used to check whether your application is still alive; if it dies, it'll get detected and restarted. You can do similar things with it in terms of the different types.

Another thing you can use, if you want to look at the whole pipeline right through from build to deployment to running, is a post-commit hook, which is part of the build configuration for an image. Again, it's another command hook. When your build finishes, it'll start up an instance of a container with your new image and run your command, so you can run tests and validate the image even before you deploy it. So that's another thing we can use for checking. One thing to be careful of there: be careful of running unit tests in that hook which use your database. If you muck up your database configuration such that your default is your production database, great, you might just delete all your data.
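One way to guard against that last risk is to make the tests refuse to run unless they have been given their own database explicitly. The sketch below is only an illustration of that idea, not something from the talk; the `TEST_DATABASE_URL` and `DATABASE_URL` variable names, the exception and the function name are all hypothetical.

```python
import os
from typing import Mapping, Optional


class UnsafeTestDatabase(RuntimeError):
    """Raised when tests would otherwise run against the default database."""


def resolve_test_database_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the database URL for tests, never falling back to the default."""
    env = os.environ if env is None else env
    url = env.get("TEST_DATABASE_URL", "")  # assumed variable name
    if not url:
        # No silent fallback: the application default may be production.
        raise UnsafeTestDatabase("TEST_DATABASE_URL is not set")
    if url == env.get("DATABASE_URL"):
        # The test database must differ from the one the application uses.
        raise UnsafeTestDatabase("TEST_DATABASE_URL matches DATABASE_URL")
    return url
```

A test suite run from the post-commit hook would call this once during setup and abort if it raises, so a mistake in the configuration stops the tests instead of touching live data.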
Finally, if you want to look at something grander again, if you're a Java fanatic, then you might look at pipelines; the whole Jenkins pipeline is another way of doing pre-deployment checks and so on.

So that's all I had. We've managed to get our application all working, and we know that whenever we deploy it we're not going to destroy things. Our customers, our American tourists who don't like all our scary animals, are happy, and they know what to do. I'll take questions, and I'll just leave a slide up here: we've got another event coming up which people might be interested in.

How are we using those sidecar containers for more complex things, like monitoring and metrics gathering?

I do know about one. Okay, so the question was: are sidecar containers being used for anything else besides this thing I've suggested? One use case which I know about is Datadog. Datadog, when you use it, needs to have a local daemon in the container for monitoring. Essentially it's a way of collecting metrics for an application, but those metrics need to be sent to a local daemon, and you don't necessarily want to put Datadog inside of your container, because it's this big Java app which is actually the thing that does the collection. Originally you couldn't do this with Datadog, but I had a talk to Datadog and they've fixed it up now. They now provide an image which can be run as a sidecar container, and because it's exposing a port and it's seen by your web app container running in the same pod, you then only need the client bit of Datadog inside of your web app. It will then send the metrics out to the proxy service running in the other container, and then up to Datadog. So that's one example.
So just another example of that: with sidecar containers, they'll have the master, and then in sidecars they'll have Badger, which looks at the log files and gives them a web interface. They'll also put in something that's grabbing data and sending it to performance, so the other data is going to be sitting right inside there. So you'll have a pod with three different containers running inside of it.

So you have a rolling app, you choose rolling all the time, and then your readiness check script sort of grows. Do you have any tips for the maintainability of the script?

Okay, so the question is how do you maintain that readiness script, or the readiness probe handler in the web server. I have to honestly say I've not used them beyond simple stuff, so how you manage them I don't know, but it's probably the same problem as keeping your code in sync for anything else. The important thing in this case is that if you're using a script, for example, that would just be a part of your source code repo, so at least it's in the same place. Whereas in that example of the MongoDB command, that command was off in the deployment config in OpenShift; it's not under source control. So at least by being a script in source control it is closer to the code, and so when you make changes to your app you at least might remember to go in and update it.