So we are going to talk about how to debug applications. I'm Dr. Xu Jiao Gao; you can also call me Dr. Gao or XJ. I'm a client lead and a cloud engineer at Stark & Wayne. Tolstoy has a famous quote: "Happy families are all alike; every unhappy family is unhappy in its own way." I think it fits Cloud Foundry and its apps very well. If everything is working, the deployments probably look similar. If something breaks, good luck trying to figure out what exactly went wrong, right? So in this presentation, I'm going to share some common failures and some tips on how to debug them.

From the operator perspective, I put certs at the top because I have suffered from them before. This goes back a couple of years: I deployed CF, and it just didn't work. The logs will not directly tell you, "Hey, your cert is wrong," so we had to go inside and figure out what exactly the issue was. It turned out to be the mutual TLS inside Cloud Foundry: one of the components was looking for a hostname, something like a .cf.internal name, in the SAN (subject alternative name), but I had the IP there, so it didn't work. I don't remember every detail anymore, but I remember the pain, so I want to share it. Now life is better, because the credential management tooling we use makes it easier. As you can see here, Genesis is a tool we built for deploying CF, and not only CF but also the things surrounding it, such as credential management, backup and restore, and monitoring. Focusing on the credential part (certs, passwords, and so on), what it brings me is that it can check whether my certs are good. It also has nice features: what if a cert is expiring? You can easily renew it. And if you think a credential may not be safe anymore (speaking of safe, it's the Safe CLI that brings these nice features), you can simply say, "Rotate this." So with this tool, my life became better, because I don't have to deal with those issues by hand anymore; they get taken care of.
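The cert problem above (a component expecting a hostname in the certificate's SAN while the cert only listed an IP) is something you can check for before it bites. Below is a minimal Python sketch, assuming you already have the certificate as the dictionary that Python's ssl.SSLSocket.getpeercert() returns. The helper names are hypothetical, the wildcard matching is simplified (a '*' may only stand in for the whole leftmost label), and the 30-day warning window is an arbitrary choice; this is not Genesis's implementation.

```python
import ssl
import time
from typing import List, Optional


def san_dns_names(peercert: dict) -> List[str]:
    """DNS entries from the subjectAltName field of a getpeercert() dict.

    IP SAN entries appear as ('IP Address', ...) and are deliberately skipped,
    so a cert that only lists an IP will not match any hostname.
    """
    return [value for kind, value in peercert.get("subjectAltName", ()) if kind == "DNS"]


def name_matches(pattern: str, hostname: str) -> bool:
    """Simplified wildcard match: '*' may only replace the entire leftmost label."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] not in ("*", host_labels[0]):
        return False
    return pattern_labels[1:] == host_labels[1:]


def check_cert(peercert: dict, hostname: str, warn_days: int = 30,
               now: Optional[float] = None) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    names = san_dns_names(peercert)
    if not any(name_matches(name, hostname) for name in names):
        problems.append(f"{hostname} not in SAN DNS names {names}")
    # notAfter uses OpenSSL's format, e.g. 'Jan  1 00:00:00 2030 GMT'.
    expires_at = ssl.cert_time_to_seconds(peercert["notAfter"])
    current = time.time() if now is None else now
    days_left = (expires_at - current) / 86400
    if days_left < 0:
        problems.append("certificate has expired")
    elif days_left < warn_days:
        problems.append(f"certificate expires in {days_left:.0f} days")
    return problems
```

For a live endpoint, getpeercert() only returns these fields on a verified connection, and a context from ssl.create_default_context() already rejects a hostname mismatch during the handshake, which is itself a useful signal. The sketch is mainly handy for reviewing a cert's SANs and expiry ahead of time.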
Speaking of Genesis, Charlie Baum, who is sitting right there, is giving a talk tomorrow on how to use Genesis to deploy CF and the things around it efficiently, and how it has worked for their business. It's tomorrow at 11:45, right before lunch. If you came to my talk on the runtime, forgive me, I'm going to tell the same joke: that talk is right before lunch, so I believe it's a great appetizer. Please make sure you go to that one.

Next, number two on my list is networking. I guess some of you are already saying, "Yeah, I agree." There are so many parts, and you don't even have full control over all of them, right? There's the proxy setting: quite often you need a proxy, but then configuring no_proxy to skip the proxy for certain hosts can become painful because of its format. Then there are firewalls: are the components allowed to talk, or is the traffic blocked? Any of that can stop your CF or your apps from working. Next are security groups and network ACLs. Take AWS, for example: "Wait, my security group is wide open, why doesn't it work? It should work!" Guess what: check your network ACLs. Maybe something is blocked there. Those are lessons we learned over time about how the two work together to control your networking.

The third one is resources. It's actually very common for issues to be caused by running out of disk or out of memory. Sometimes the logs will not tell you that's the issue; you have to go inside, run df, and see that the disk is 100% used, something like that. Another thing I want to mention is configuration: there are default settings in your deployment, for example a PID limit of 1024. Sometimes you reach that limit and things go wrong, and then you have to tune the parameter to allow more; in our case we actually set it to zero to allow more. Another example I want to give is the database. Take Postgres, for example.
Things should just work, but somehow it had reached the maximum number of connections. Now that we talk about it, it seems simple, but it took a long time to figure out that was the issue. Have you had similar pain so far? I see some people nodding their heads. Okay, I'm not alone.

Next one: human mistakes. Someone is smiling, so I guess, yeah. Actually, I think this story is from Charlie: one day everything works perfectly, then the next morning you come to work and, why doesn't this work anymore? Another team may have changed the datastore name on you, and then of course things will not work; all the disks may be gone, you know? So we have to consider those things when we run Cloud Foundry. Those are the common issues I see from the operator perspective.

Then, from the developer perspective, of course you have to know your app, and you want to follow the twelve-factor rules, right? Your app size also matters when you push, and it has to fit within your quota. Then there's the buildpack: for certain languages you have to match certain versions, and you may have your own customized buildpack.
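Many of the operator-side networking and resource failures described above come down to two quick questions: can A reach B on a given port, and how close is something to its limit? Here is a minimal Python sketch of both checks using only the standard library. The function names are hypothetical helpers, not part of CF or BOSH tooling; can_connect roughly mirrors what a port probe with a tool like netcat tells you, and for the Postgres case you would feed near_limit numbers you collected yourself (for example from SELECT count(*) FROM pg_stat_activity and SHOW max_connections).

```python
import shutil
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    A False result (refused, timed out, unreachable) suggests looking at
    firewalls, security groups, or network ACLs on the path between the hosts.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError and socket.timeout are both OSError subclasses.
        return False


def usage_percent(used: float, limit: float) -> float:
    """Share of a limit in use, as a percentage (disk bytes, PIDs, DB connections)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * used / limit


def near_limit(used: float, limit: float, threshold: float = 90.0) -> bool:
    """True when usage has reached `threshold` percent of the limit."""
    return usage_percent(used, limit) >= threshold


def disk_usage_percent(path: str = "/") -> float:
    """Percent of the filesystem holding `path` that is in use.

    This divides used bytes by total bytes; df divides by used + available,
    so its Use% can read a little higher when blocks are reserved for root.
    """
    usage = shutil.disk_usage(path)
    return usage_percent(usage.used, usage.total)
```

Run on the VM itself, disk_usage_percent("/var/vcap/data") or can_connect("10.0.16.5", 5432) would be typical calls; the path and address there are placeholders, not values from this talk. Keeping the limit check generic means the same helper covers disk, PID, and connection limits.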
Take a Java app, for example. It's not because I'm an expert on it; it's just a good example to talk about. Say there's a not-enough-memory issue. You have to know how Java apps work: if it says insufficient native memory, you'll know that's the memory the JVM itself needs to work, so you check the allocation for that part. But sometimes you may see a high thread count, with the number of threads over the limit. Then you have to size the memory based on the number of threads instead of just the total memory.

From the CF side, there are three things we use quite often: always check the logs, because they tell you what happened; check the events; and turn on CF_TRACE to get more details.

When operators maintain CF and developers use it, the two really need to talk, because the more information is shared and the more perspectives are communicated, the better. For example, a developer may spend a lot of time debugging why something doesn't work, but maybe there is an outage at the system layer, not in the app itself. If they had that information, they wouldn't have to spend the time digging into it. For operators, the more information a developer can provide, the easier it is to figure out the problem. And with all the networking and resource allocation issues we mentioned, you have to work with different teams. Communicating more really helps you figure out what's going wrong, because then you find out, oh, they changed the datastore name.

Next, I'm going to go through some common failures quickly. Has anyone seen this one? I believe most people have seen an app staging failure, right?
Yeah, I see several people nodding. Here's another one: the buildpack couldn't be downloaded, and sometimes that is actually a networking issue. Okay, staging is done, but somehow the app isn't running afterwards: a timeout. I can think of two main reasons for timeouts. Quite often it's networking, right? Maybe something is blocked. The other main reason is the request timeout setting: say you only allow a request to wait for a certain time, but it takes longer to process. Then you may want to adjust your timeout.

This one looks the most painful. Okay, you said we should use the logs and CF_TRACE to see more information. What if all it tells you is "unknown error"? Sorry, no more information. What are we going to do in this case, right? Yeah, that will help you solve the problem for sure. Then there's no app logs: I don't just get no logs, I can't even connect to the log server. You told me to look at the logs, so what am I supposed to do? Cry?

So we have talked generally about what may happen. Come on, tell us how to deal with it, right? So here is a case. Let's see if I can... oh, okay, good. You can see, surprise, I wrote that blog. In this blog, one of the errors I had was that I couldn't connect to the log server. What I did was turn on CF_TRACE, and I saw it mention Doppler. Then I realized I could bosh ssh to the Doppler VM to see what was going on. I ran monit summary, and everything was running, so what was going on? Next I looked at the logs under /var/vcap/sys/log on the Doppler node, and they told me something was out of sync. So it was giving me information; it just took some more time to think about how to chase the error. Let me see if I can come back to my slides. Good. So quite often you use all these techniques together to figure out the problem. This one, as I mentioned earlier: unknown errors give me fear, and I
guess that's a common human thing: we are afraid of the unknown. But it's not that bad. This one is an example of how we dealt with an unknown error. Yes, it's me again. When I first saw this error, I thought: unknown? No, I want to know it. (Sorry, I see someone smiling, so I wanted to join in.) So what can I do if the error is unknown? I can try to understand what that step is doing, right? In this example, it's uploading a buildpack, I think... oh, I cannot scroll it. Okay, it's uploading something, and it says it cannot upload a file. Then I know that in that step it is supposed to upload that thing to the blobstore, but somehow it failed. That made me think I needed to check the things around the blobstore. Of course it took me some time, poking here and poking there, and then I checked my blobstore. I was using AWS S3, so I logged in and, guess what, I didn't have a bucket there, so of course it couldn't upload, right? Then I created the bucket, and things started rolling again. So the whole point I'm making here is: don't be afraid, try to get more context, and understand what's going on. There is always a hint, and there is always a direction you can try. Okay, cool.

Since networking is a big aspect we have to deal with, there are some tools we can take advantage of, like tcptrace and tcpdump, to see all the traffic and examine what exactly went wrong. Then with netcat you can test whether a port is open, whether you can connect from A to B. And I want to point out this HTTP tracing post. You see, I have so many tabs open; by the way, I'm a computer science doctor and also a tab master, that's why I have so many tabs. This post is from one of my former coworkers. Somehow I can't scroll it up and down, but feel free to check out our blog for more information. To summarize what's going on: Cloud Foundry app logs don't show you the
HTTP content of POST or PUT requests. That's okay for DELETE, right, but there are situations where you want to see what content is sent in a POST or PUT request, and the logs don't show it. So our team created a tool called Gotcha that basically lets you see all that traffic, including the content, to help you diagnose what went wrong. Cool.

Another thing I want to mention is, of course, take advantage of tools. For example, if you run into an out-of-memory issue or something related to Diego cells, you can use cfdot to interact with your Diego cells, for instance to check memory usage. There's more than that, and I won't list every one. Then for UAA, you can use the UAA CLI to interact with your UAA and figure out any authentication-related issue.

So far we have mostly talked about the technology aspect, but I personally think the human side is very important when we deal with these issues. Why do I say that? Maybe because I'm a lady, I believe we take care of emotions first, and then we can take care of the other problems better. So: a calm mind. What I mean by that is, once things go wrong, we may keep saying, "Why doesn't this work? It should work. Everything looks good. Why doesn't it work? It should work, it should work, why doesn't it work?" Right? Then we are stuck in this loop, having a conversation with ourselves, instead of jumping out to focus on figuring out the problem. What I do is ask different questions. Because sometimes I keep thinking it should work, I tell myself instead: it's not working, so something must be wrong. What is wrong? Then I take a step back: okay, A, B, C, D, these are the things involved in this problem, and I'll check each of them for what may be wrong. That's the approach I take: step back, accept that something is wrong, and check each piece. Next: get a second eye. Come on, I've already got a first eye and a second eye, why do I need a second eye?
That's a joke; I'll take a courtesy laugh. I love this one, because back when I was a junior, just starting out (I was still in school, doing an internship), I hesitated to ask for help because I was afraid people would think I was stupid, right? They won't; that was just in my mind. I also know some introverts would rather figure out the problem themselves and don't want to ask for help. But I found that asking is super helpful, for multiple reasons. First: I was debugging an issue for a couple of hours and got stuck. I went to my coworker and said, "Hi, do you have some time? I'm having an issue." He said, "Oh yeah, I solved that issue last week; it took me five hours to figure out." Then he told me the answer and, guess what, that saved me three hours, because other people may have dealt with the same problem, right? Then I can use those three hours to get a cup of tea or go shopping. So there's definitely a benefit to getting a second eye. Second: I had stared at this for hours. I looked at everything, and it all looked the same, it all looked right. When you get a fresh second eye, they easily spot the difference. Sometimes it's even a small one, like something that should be capitalized but isn't. Then, oh yeah, figured out. The third one is, for me, quite amazing. I have a problem, I go to my coworker, and of course the first step is to describe the problem, right? So I say, "Oh, this is this, it didn't work, and da, da, da." And guess what? Before they even answered, I realized while describing it, "Oh yeah, I know, I should try this," and I solved the problem. Then I said, "Thank you so much for helping me." They didn't do anything, but it helps to describe things to others.

So, how do you find out more? I found the Cloud Foundry docs are actually pretty good; quite often they're really worth reading. And even better, we have blogs, like the ones I showed you.
When we run into problems while running things, we always take extra time to write a blog so other people can benefit from it. One time I wrote a blog about an issue I had, which turned out to be a bug in the release. Guess what? Someone in Europe had the same issue and came across my blog. Later he told me, "Hey, there's a lady from Stark & Wayne who wrote a blog and solved this issue," and as he started to describe it, I said, "Are you talking about me?" That's how we got to know each other, and that's actually what motivated me to write more blogs. Thank you all for being here. I know this is the last session, so I was considerate and kept it short. Okay, next, Q&A. If you don't have a question, I'll assume you have so many questions you don't know what to ask. Yeah, if there are no questions, happy dinner and happy evening.