Okay, good afternoon everyone. My name is Jin Jie Cai, or you can call me Jack if that's easier. I am the Bluemix runtime architect. For those who do not know Bluemix, it is IBM's cloud application platform. The topic I'm going to talk about today is the 10 common errors that can happen when you push an application to Cloud Foundry. I'm going to categorize these errors into four different types, based on what happens behind the scenes when an application is pushed: client-side errors, fabric-level errors, errors that happen during application staging, and finally errors that happen when the application is actually getting started.

So first, let's take a look at what actually happens when an application is pushed to Cloud Foundry. I guess this diagram is probably already familiar to many of you, but as you can see, there are lots of things going on when you do a cf push. The cloud controller needs to talk to multiple components in order to complete the request. And as you can imagine, any of those steps can fail, thus causing the final failure of the application push.

So this is how I categorize the different types of errors that can happen. The first type is the client-side errors. These happen when your client tries to talk to the cloud controller in the cloud. The second type is fabric-level errors. As I mentioned, there are many cloud components involved in order to stand up the application. There's the cloud controller, there's a database behind it, there's a blob store, there's the DEA, and of course there's the messaging bus between all these components. If anything goes wrong there, you will get an error too, and we'll see how we deal with those. Then the third type is the application staging errors. These usually have something to do with the build packs.
And the final step is when everything is created, your droplet is ready, the DEA is preparing to start the application, and something goes wrong. Those are the application startup errors. So I'm going to go through all these types of errors, talking about the possible causes and the solutions.

All right, so first, let's start with the simple ones: client-side errors. Number one is not really an error, but some best practices that you should follow before you start. First, make sure that you have the right privilege in the space that you are pushing your application to. You need to have the developer role in order to do that. Second, make sure that you have the right version of your client tool, like the cf command line tool, when you work with a remote cloud. The client tool is evolving very fast, so make sure that you have all the defect fixes that you need. Third, make sure that you're pushing from the right folder. Sometimes you are in a hurry to get started and forget to specify the package that you really want to push, and then the command line tool will try to upload your whole current folder to the cloud. Fourth, make sure that you are picking up the right manifest file, because the command line tool will use the manifest file in the current folder. Many times, if you are working with a sample, some other people's application, it comes with a manifest file in its root folder, and that file specifies many options. If you don't intend to use that manifest file, rename it. All right, so we are ready to go.

The second error is also obvious: the route is already in use. As you probably already learned from the last session, you need to provide a unique name for your application to create the URL for your application, right?
So make sure that it's not in use by others. Otherwise you may need to change it to another one using the -n option. Or, of course, if your application does not have to be served from a URL, you can use the no-route option, or if you don't care, you can use the random-route option.

Our next error, also a very simple one, is when you exceed your organization or your space memory limit. As you know, each organization or space in Cloud Foundry has a memory quota. So if you have already pushed, say, 10 applications and they use all the memory that's allocated to your organization, then your next push will fail. Make sure that you have enough memory to work with.

The next error is also obvious. Besides memory, disk is also a constrained resource. The default that you can request is one gigabyte or so, and unless your cloud provider changes that default, you cannot request more than that. So if you specify, with the -k option, something like a 2 gigabyte disk for your application, the push will also fail. So these are the simple ones to start with. Now let's move to some slightly more complex errors.
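The memory quota and disk limit checks just described come down to simple arithmetic, sketched here in Python. This is only an illustration under stated assumptions, not the cloud controller's actual logic: the function name `check_push_limits`, its parameters, and the sample numbers are all hypothetical, and the 1 GB disk limit is the default mentioned in the talk, which a provider may change.

```python
def check_push_limits(quota_mb, used_mb, app_memory_mb, instances=1,
                      disk_mb=1024, max_disk_mb=1024):
    """Return a list of reasons a push would be rejected (empty if none).

    Hypothetical helper for illustration only. Assumes the quota counts
    memory per instance, and that max_disk_mb is the provider's disk limit.
    """
    problems = []

    # Every instance reserves its own memory against the quota.
    requested_mb = app_memory_mb * instances
    remaining_mb = quota_mb - used_mb
    if requested_mb > remaining_mb:
        problems.append(
            f"memory: need {requested_mb} MB, only {remaining_mb} MB left in quota"
        )

    # Requesting more disk than the allowed maximum also fails the push.
    if disk_mb > max_disk_mb:
        problems.append(
            f"disk: requested {disk_mb} MB, limit is {max_disk_mb} MB"
        )
    return problems


if __name__ == "__main__":
    # Quota already used up by earlier applications: memory problem.
    print(check_push_limits(quota_mb=2048, used_mb=2048, app_memory_mb=256))
    # Enough memory, but a 2 GB disk request against a 1 GB limit: disk problem.
    print(check_push_limits(quota_mb=2048, used_mb=512, app_memory_mb=256,
                            disk_mb=2048))
```

Running a check like this before a push only tells you which of the two limits you would hit; the real numbers have to come from your own organization's quota and your provider's settings.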
So if you are working with the command line tool, sometimes you will see this: you do a push, it starts to upload your application files, and after a while it reports an error. This means the application file upload failed. Possible cause one: of course, the network could go wrong, so make sure that you have a fast enough network to work with and that the network connectivity is right.

But there could be other causes, and one of the usual situations is that your application is really, really large. We need to understand that there are two limits here. One is the application upload time: by default there is a 15 minute limit, so if your application takes more than that to upload, it will fail. There is also a size limit, which by default is one gigabyte. If you cannot complete the upload within these two limits, consider these options.

The first one is to exclude unnecessary files from your application. You can use the .cfignore file to specify which files you don't want to push from your current folder. Most of the time you really don't have to include all those files. For example, if you are pushing a Node.js application, you don't have to push all the Node module dependencies with your application, because the build pack will provision those dependencies during staging.

The other option to consider, if you really have to include those dependencies, is to think of another way: instead of putting them into your application, put them into the build pack. You can create a custom build pack that contains those dependencies and installs them into your application during staging. That way you don't have to include them in your application package.

And lastly, a very special trick: you can keep pushing several times. The cf command line is pretty smart, so every time it will try to push a delta. By repeating this task, every time you push a few more files to the cloud, so that way eventually you may succeed in uploading
all your files. If you push two files each time and you have ten large files, then five pushes later you'll have pushed them all. But it's just kind of tricky.

So now let's move to the next type of errors, which I call fabric-level errors. This is where you will see weird messages like 500 or 400, with some error code that you do not understand. Really, if you look at this diagram, any step of this flow can fail. The database might be down, the blob store might be full, we may run out of DEAs, etc. To find out which step really fails, one technique is to turn on the CF_TRACE variable so that you can see all the RESTful communications between your client and the cloud controller and see which step actually fails. That will give you some indication of whether it fails when the application metadata is created, or when the application droplet is being created, etc. Essentially, if you really think this is a fabric-level error, there's not much you can do. You can talk to your cloud provider to see whether they can look into the fabric component logs, like the DEA logs or the cloud controller logs, to really find out what's going on there.

So now let's talk about the third type of errors, which I call application staging errors. The first category has something to do with the build pack. Number seven is when you specify an invalid build pack name or URL when you push with the -b option. Make sure you have the right build pack name and make sure you have the right URL. Sometimes, if you use the wrong type of URL, the message is not very helpful, like the one on the left side. It just says, okay, it's cloning, and then it failed, so you have no clue why it failed. Make sure the cloud can actually reach the build pack URL that you specified, especially when your URL is inside your enterprise firewall. That's not going to work if you are pushing to a public cloud.

If you don't specify the build pack, then the cloud will try to detect the
application type by invoking the detect method of each installed build pack until some build pack raises its hand and says, okay, yes, I'm going to stage it. But sometimes you will see this message saying that no build pack can handle your application. So why does that happen?

First, maybe your application package is simply wrong and no build pack can actually recognize it. One of the common errors here is that many users create application packages with a root folder inside the zip, which is not required. Most build packs expect the application files to exist in the root of the package instead of one level inside another folder, so make sure you are not doing that. Second, again, push from the right directory if you are not specifying the package explicitly. Number three, make sure the required build pack is actually installed in your cloud. You can do cf buildpacks to list all the installed build packs. For example, if you are pushing a PHP application, make sure the PHP build pack is actually installed. Lastly, a tricky one: if there is a bug in a build pack's detect method that modifies the application files, that could cause some unpredictable errors, because it changes the application files and then all the build packs called afterwards are not getting the original application package. That actually happened with my team at some point, so make sure the build pack is doing the right thing when it does the detect.

So, after a build pack is correctly found and detected, the next thing the build pack does is to actually compile your application. Error number nine is a very big error. There are many causes, but the message is simple: what you get from the command line tells you that application staging failed, and that's it. So what do we do? Turn on traces, if the build pack has that support. I listed some build packs that do have that support. The Java and Liberty build packs have the JBP_LOG_LEVEL variable that you can set to debug. And then for our Node.js build pack, if you want to see more npm
messages, you can use the NPM_CONFIG_xyz variables to set various configurations, or you can include an .npmrc file in your application package. You can set the log level to silly to see all those very detailed npm log messages. The new PHP build pack also has the BP_DEBUG variable that you can enable.

Then, once you enable all those log messages, read the logs. You can query the recent logs by using that first command, or you can read the logs continuously. What you can do is have one window to push the application and another window to read the logs, so it's like a tail-the-log kind of experience.

Once you have some logs, we can look into what the cause might be. Cause number one: it's all your fault again, you have the wrong application package. For example, if you are pushing a Node.js application, your package.json needs to have the right syntax. If it's bad, then the build pack will fail to read it.

Cause number two is that the build pack is unable to reach external dependencies during staging. Many build packs will download external dependencies when they try to stage your application. Again, for a Node.js application, the build pack will talk to the npm repository to download the modules declared in your package.json. So make sure that the cloud servers have a connection to those dependencies. The security group settings can impact this because they set up some network rules, so make sure the security group rules are not blocking the connection to those dependencies.

Cause number three is staging timeout. Again, just like when you do the upload, staging also has a timeout, which by default is 15 minutes, and that's the maximum. You can use the -t option to specify this timeout. By default it's 60 seconds, but you can increase it up to three minutes, and that's what you have. So make sure that you are not spending too much time in staging. If you have to, then you need to customize a build
pack to do less time-consuming things during staging.

Cause number four is that staging uses too much memory. I guess some of you know that staging also happens inside a Warden container, just like when the application is running. The Warden container has a memory limit and a disk limit, and you cannot go beyond those. If you go beyond the memory limit, sorry, you will get killed silently and suddenly. So if you experience this, where you push several times and it dies at different points in time without much indication, you may guess, okay, maybe I'm being killed because of memory. Disk is similar, but the nice thing with disk is that you won't get killed. You just won't be able to write to disk anymore. When that happens, the build pack fails with different error messages depending on how it's writing to the disk. So again, make sure that's not happening.

Cause number six is when you are using an immature build pack level. Many of the tutorials or samples out there on the net, when they tell you how to do the push, always push using the master branch of the build pack. Personally, I don't think that's a good idea, because that means you are using build pack code that's still in development status. Instead, what you really want to do is specify a released version of the build pack, like what I'm showing there, like v3. You get the v3 of the Java build pack in this case instead of using its master. So that's another best practice that you probably want to follow.

The next cause is that your application is picked up by the wrong build pack. Remember, if you don't specify the build pack, Cloud Foundry will do the detect thing, and sometimes another aggressive build pack may grab your application and say, okay, I want to stage this, but you don't want to use that build pack. In that case, of course, you can override that through the -b option, but really I think there's something to fix: either the build pack is too aggressive and you need to fix that, or
your application contains some suspicious files that another build pack is interested in. So fix that as the root cause.

The last one is very tricky again, and this is a real problem that some users encountered and reported on Stack Overflow. What happens here is that the DEA invokes the build pack scripts: detect, compile, and release. Imagine what happens if a script does not have the execute bit set in its file attributes. The compile will fail, of course, and again you do not get any helpful messages from the output. It just says cloning the build pack, and then silently the compilation failed. So make sure that you are setting the execute bit on the three scripts in the build pack. Especially when you are forking a build pack and then pushing it again from your Windows machine to a remote repository, you may lose that bit on those three scripts, and you need to set it explicitly.

All right, so those are the compilation errors, and the last big error is startup errors. This is where the droplet is already created. You see that message, uploading droplet, so far so good. Then the DEAs are provisioned to actually start the application, and then you start to see these endless messages: starting, stopped, starting, stopped. Essentially it tells you the application start was unsuccessful or timed out. If you query the application status, it may report crashed or starting. It just cannot stand up. So how do you diagnose this? Again, the log is your friend. You can do these two things again to see the messages that it spits out in the log.

So, the possible causes. Number one is, again, that you are taking too long to start. The limit is again 180 seconds, so if your application cannot start within this limit, you are going to fail. So if that's not really enough, what do you do? Root cause one is that you are doing too much initialization. For example, you are reading lots of data to initialize your application. Try not to do that, or you can do lazy initialization, or you can do asynchronous
initialization. The other option is that you can start your application with the no-route option. That way the DEA will just kick off your application process, and if it starts, the DEA will report back to the health manager saying, okay, it started, and it will not try to ping the port of your application. Then you can wait for your application to complete its initialization and then do a map-route to actually bind it to the URL. So that's one thing that you can do.

The second root cause is when you are listening on the wrong port. The application actually started and everything went well, but it's just not listening on the port that the DEA expects it to listen on. So make sure that you are using that environment variable and using that port to open your application's HTTP listener.

The third root cause is when your application, during start, tries to reach out to some external dependencies, and for some reason that's going really slowly. So make sure that the connection is good. The security group settings could also impact that.

Cause number two is that the application actually failed to start and exited. Applications in Cloud Foundry should never exit by themselves. If you exit, the health manager will think, okay, you are dead and I'm going to start you again. So check your application logic and don't exit on exceptions. Or sometimes it's probably just that you are missing a service binding, so make sure your application has all its services bound when you get it started.

Cause number three is consuming too much memory. Like staging, when you do the push you specify the memory limit for your application container, and you cannot exceed that. If you do, again, you get killed suddenly and silently. So make sure you have enough memory to run your application.

Disk is the same situation as staging. There is a default limit, which is two gigabytes. If you go beyond that, the cloud provider may not allow you to do that, so you need to think about whether maybe you should use external
storage, a blob service or a storage service, to store additional things instead of writing to your local disk. That's an anti-pattern in cloud applications anyway.

So these are some advanced techniques that you might use in addition to what I showed already. The first is for when your application keeps getting killed and you have no way to find out what's going on, and you really want to get into the container to examine the files there, to see whether there are some crash dump files, etc. One trick you can do, if the runtime has a way to run a hook when it exits, is to use that. For example, with the IBM JDK there is an option called -Xdump:tool that you can use to specify a script to run when it exits. When the JVM exits, it will invoke that script. In this case the script is to sleep for one day. That means if the application gets killed at midnight, it will stay around for one day. The next morning you wake up, you go to your application, and you see it's not working, but its body is still there. You see the files to look into, and the other files are still there in the container. To make it more useful, you can combine it with the dump tool so it will generate the dumps: memory dump, heap dump, thread dump. Then you can download those dumps and do more analysis.

If the runtime does not have such a hook, another trick that you can do with all the runtimes and all the build packs is to override the startup command and append that magic command, sleep one day, again. You can find the normal command by pushing a simple application and making sure it gets started successfully. You will see the actual startup command in the cf command line output. Copy that command, append that sleep one day thing, and then push the problem application with it and with no route. Then that application will get pushed successfully, and it will stay for one day for you to examine what's going on there. And of course, after you are done, you can stop it. So that's
a general trick that you can do with crashing applications.

The other technique is to run an agent inside your application container as the main process, so that the health manager will not care about your application, because the agent process will be alive. Then, with the container, you can use tools like the cf ssh command to SSH into your container, start your application there, and work with it interactively to find out what's going on. We also introduced an interesting feature called development mode, with which we also have an agent built in, and you can do things like remote debugging and also get an SSH-like experience. You can have a console window to work with. So those advanced techniques will also help you diagnose application startup problems.

The final tip is for when you keep pushing an application several times. Sometimes some weird things will get cached, and the build pack probably messes things up. So one practice is to just delete the whole application and start over. That will clean all the cache and you'll get a fresh start. Sometimes it's a good tip to have.

All right, so as a summary, I think today I quickly walked through all these four types of errors. Hopefully that's a complete coverage of many of the common errors that you may encounter in Cloud Foundry, the techniques that you can use to diagnose them, and the options that you have to solve them. So with that, I think I probably still have two minutes, probably for one question.

Yes, please. Right now it's kind of proprietary to Bluemix. I have the Liberty build pack and Node.js build pack in Bluemix.

Since time is short, are there any other questions before we close? If not, thanks for coming, and I hope you enjoyed this. Thank you.