Okay, so hi everyone. I hope you're having a good time here. I know this can be a hard time of the day, and I'm the only thing between you and lunch, and it's been two days of talks. Last night was fun; we had an amazing time at Ruby karaoke. So I'll try to make this entertaining for you.

This is actually my first time at RubyConf. It's also my first time in New Orleans, and I have really enjoyed my time here. But as Aaron, I think, just pointed out, so far the conference theme seems to have been about death. That's a bit sad. I really don't think Ruby is dead, but just so we are all clear, let's check the internet, because that's where you check when you want to find a source of truth for something. So let's see. Yeah, Ruby's not dead yet. Thanks for building the website, Jason. Yeah, we're safe.

Okay, so now that we've got that out of the way, we can continue. Let me introduce myself. My name is Sebastian. I work at a company called Cookpad. I like when people tweet at me during my talks, so that's my Twitter handle. I came all the way from Colombia, and if Narcos is everything you know about Colombia, I think you should definitely go and visit it and get to know the real Colombia. In fact, I have the perfect excuse for you to go there: we have a Ruby conference there. The next one is going to be in September 2018. We'll be announcing the exact date really soon, so you can follow the conference on Twitter. It's gonna be fun.

Lately I've been trying to get into stuff different from programming, so I've been taking some cooking lessons to improve my culinary skills. When I was accepted to give this talk here, I decided that I wanted to prepare some jambalaya, because it's something I've always wanted to try. So I went online and looked for a good recipe.
Oh, okay, it looks weird there. It looked nice, so I went ahead and bought all the ingredients, prepared everything, followed the steps, and this was the final result. As you can see, it was an epic fail. I definitely have a long way to go.

But before we continue, I really want to ask everyone a favor. I'm really happy to be here, so I would really love to take a selfie with you. So if you could all raise your arms and make some noise, I'll do this real quick, I promise. Yeah, thanks. You're great. Awesome.

So now let's get a bit more serious. The name of the talk is "The Overnight Failure", and this talk is based on a true story, something that happened to me at a previous job I had. So why do I want to talk about this? Why do I want to tell everyone about a time I really fucked up? Well, there are a few reasons, and I'll share the most important with you.

First of all, we all have broken the internet at some point. We all have screwed up really bad. We all have created bugs that made it into production at some point, and if it hasn't happened to you yet, it will happen. Believe me, it's just a matter of time.

Also, because we don't normally talk about this in public. At conferences and meetups or whatever, we see people talking about what they learned, how they built a cool thing, how they made a project successful. Basically, we mostly hear about success stories, and I think people should really talk more about this in public, because it will make talking about screw-ups easier for everyone. We normally just talk about this with people that are really close to us. We do it in private, with our families, friends, or maybe our colleagues, because they're in the same boat with us. And also because, by not speaking about our failures in public, I think we boost impostor syndrome, which is something terrible that a lot of us suffer from.

Although sometimes companies talk about this publicly, for example,
like in January, when GitHub, sorry, GitLab was down for several hours, or the mistake that took AWS, and a big chunk of the internet with it, down for also a few hours. And I can imagine how stressful the situation was. I even remember how their status website showed that everything was all right, because the red icon that should have been shown was also down because of S3. So it's something that's easy to laugh about. It's fun when we look at other people's experience, but if you really think about it, you can probably feel how stressful that situation was for the people having to deal with it, and how hard it was to keep customers' trust and regain it.

So although we don't all work at companies that have such a big scale as GitLab or Amazon, if we create bugs, they will affect the lives of our users. It doesn't matter if we have millions or thousands or hundreds of users; it will always be stressful.

That takes me to the most important reason why I wanted to give this talk and talk about my fuckups: we learn more from failure than from success. When we fail, when everything goes wrong, or something that we didn't expect to go wrong goes wrong, that's where we learn the most. So I also want to share a little bit about what I learned.

Before we move on and I tell you my story, I want you to take a few seconds, maybe close your eyes if you want, and think: what's the worst thing that could happen to you at work? Imagine what could be the apocalypse for you. Okay, keep that in mind, and I'll tell you what my overnight failure story was. How did it happen? How did the overnight failure become a thing?
So I used to work at a company that had a product that allowed people to carpool. It was a mobile app, and it was for people that used their car to go to work every day. So, for example, let's say this is Mary. Mary had a car. She used it to go to work and back every day. The app I helped build allowed Mary to find Anna, who didn't have a car and also had a similar commute. So they had a similar route, and the app allowed them to exchange money with one another when they shared the car to go to work. The app kept a record of the trips they took together and even allowed them to be matched with someone else, for example John, who also had a similar commute. So they could ride share, or share their car, every day to go to work, and the app would keep track of every trip they took.

How the money exchange part worked is that once a week there was a process that ran that basically charged the passengers and then got the money and paid the driver, paid whoever drove. So, for example, in this case, Anna.

So, going a little bit deeper, let me explain how this worked technically, how it was implemented. The process that ran once a week looked kind of like this. Every week a job was triggered, and the job went to the database and checked every trip that users took during the previous week. Then, for each driver-passenger combination, it put a job on a queue, we could call it that. So it created a lot of jobs each week, and each one of these jobs contained, as I mentioned, a passenger-driver combination and the total for the trips that the passenger should pay the driver. After that, each job was processed: the passenger was charged using a payment gateway, and then the driver was paid. What's there in the circle was basically managed by that payment gateway.
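As a rough illustration of the weekly process just described, here is a minimal Ruby sketch. It is not the code from the real system, which the talk never shows. Every name in it (`Trip`, `FakeGateway`, `WeeklyBilling`) is hypothetical, the queue is a plain in-memory array, and it assumes the driver receives exactly the total the passenger is charged.

```ruby
# Hypothetical sketch of the weekly billing run described in the talk.
# None of these names come from the real system.

Trip = Struct.new(:driver, :passenger, :amount_cents)

# Stand-in for the external payment gateway, which handled the actual
# charge and payout in the system described.
class FakeGateway
  attr_reader :charges, :payouts

  def initialize
    @charges = []
    @payouts = []
  end

  def charge(passenger, amount_cents)
    @charges << [passenger, amount_cents]
  end

  def pay(driver, amount_cents)
    @payouts << [driver, amount_cents]
  end
end

class WeeklyBilling
  def initialize(gateway)
    @gateway = gateway
    @queue = []
  end

  # Step 1: total last week's trips per driver/passenger pair and
  # put one job per pair on the queue.
  def enqueue_jobs(trips)
    totals = Hash.new(0)
    trips.each { |t| totals[[t.driver, t.passenger]] += t.amount_cents }
    totals.each do |(driver, passenger), total|
      @queue << { driver: driver, passenger: passenger, total_cents: total }
    end
  end

  # Step 2: process each job: charge the passenger, then pay the driver.
  def process_jobs
    while (job = @queue.shift)
      @gateway.charge(job[:passenger], job[:total_cents])
      @gateway.pay(job[:driver], job[:total_cents])
    end
  end
end

# Made-up trips for one week: Mary drives, Anna and John ride.
trips = [
  Trip.new("Mary", "Anna", 500),
  Trip.new("Mary", "Anna", 500),
  Trip.new("Mary", "John", 700)
]

gateway = FakeGateway.new
billing = WeeklyBilling.new(gateway)
billing.enqueue_jobs(trips)
billing.process_jobs
```

With these made-up trips, Anna is charged once for her two rides combined, John is charged once, and Mary receives one payout per passenger.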
That part wasn't managed by our system; the payment gateway took care of all of that. I don't know if you're still with me, so let's do a quick recap. Basically, we had an app that users used to carpool every day and that took care of the payment process: each week, passengers were charged and drivers were paid for their trips.

Now that you kind of understand how it worked, I'm going to tell you exactly what happened. It was any given Saturday. It was 6 a.m. when the process was triggered and started as usual. The process run marks the start of the day that I like to call Black Saturday, and it has nothing to do with big discounts and shopping sprees and a really fun day. It has to do with a really, really stressful day.

So I'll tell you what happened next. Let's say you have a user who went really early to the store to buy some eggs and bread for breakfast for her family, and when she tried to pay, her card was declined. She was like, "Well, this is weird, this shouldn't happen." This happened around 6:25. So she went online and checked what was going on with her credit card, and she saw a lot of charges that were made by the carpooling company I used to work for on that same morning. That's the second step of our Black Saturday.

That was bad. But then things started to look even worse, because more users were affected by this, and it was not just a few users. It was a bunch of users. So the customer care team started to get a huge influx of complaints. It was bad. Let's say that at 6:35 a lot of users, in a small amount of time, reported a lot of bugs, I mean, a lot of problems, a lot of charges. That's where things definitely started to look bad. And so it was 6:34. I was sleeping.
I don't know about you, but I'm always sleeping on a Saturday at that time. So I was happily sleeping in my bed, and then the phone rang. It was a call from my boss, and this is roughly how the conversation went. My boss told me, "Hey, sorry to wake you up this early, but there's an issue in production. There are a lot of customers complaining about it." And I was like, "Okay, sure, I'll take a look right away." I was really stressed. I was trying to act cool. Then I thought we had ended the call, and I was like, "Fuck. Fuck." And then my boss said, "Hey, I'm still on the line." It wasn't looking promising.

So, okay, this is how Black Saturday was looking. It was not even 7 a.m. and the day was already crappy. I started to look into what was going on. I checked the payment gateway, and I definitely saw a lot of duplicate charges. That was the first thing I noticed, and it didn't make any sense. When I looked at our system, our billing system, I saw that the queue was still full of jobs. So the first thing I thought of doing was to kill that part of the process: basically, stop processing jobs, and that would at least not create more charges. But then I looked at the queue again, and the queue grew. This was bad. So then I decided to basically stop the whole process, so that no more jobs were put into the queue.

Then the first thing I thought about was, okay, let's refund all the charges that we shouldn't have made. This sounded like a really good idea at the moment, but you'll see that later.
Maybe it wasn't. Then another thing I noticed is that a lot of the charges we had already made had created a lot of transfers to the drivers, and those were also still happening. So I also had to go and reverse all those transfers and stop the ones that were remaining. We could say that at around 7:28 the problem was kind of contained.

So yeah, the day was still not looking very good. The queue was full of jobs; it had thousands of jobs to be processed. When I started looking at them, I started noticing a pattern. There were a lot of duplicated jobs, as you might have expected. A lot of them contained the same user as passenger, the same user as driver, and the same amount to be charged.

To me it was clear that the problem was in two points of the system. First, on this part of the system, it seemed like we were putting every passenger-driver pair on the queue thousands of times for some reason. Second, for some reason, when we processed the jobs, we didn't check if we had already charged that passenger-driver combination. We still charged them, no matter what.

So after debugging, finding what was going on, and finding the best way to solve this, I wrote some tests that failed, wrote the code to make them pass, and deployed it. And it was all kind of fixed, or fixed. But that took a long time. It was a long day. I felt really dumb, really frustrated. When I was about to deploy the code I wrote, I did the programmer's prayer that Aaron taught us at RailsConf this year: "Please work. Please work. Please work." Fortunately, it did.

So at, sorry, at 10:55 p.m., I started doing the obvious thing: I started looking for a new job. Well, that's not true, but it was really frustrating.
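The two weak points identified above were the same passenger-driver pair being enqueued many times, and jobs charging without checking for an earlier charge. The talk does not show the actual fix, so the following Ruby sketch is only one possible shape for such guards. The names `BillingQueue`, `ChargeLedger`, and the `week` key are hypothetical, and the in-memory `Set` stands in for what would more likely be a database uniqueness constraint or an idempotency key at the payment gateway.

```ruby
require "set"

# Hypothetical sketch of the two guards the talk says were missing.
# This is not the speaker's fix; all names are assumptions.

# Guard 1: refuse to enqueue the same pair twice for the same week.
class BillingQueue
  def initialize
    @jobs = []
    @seen_keys = Set.new
  end

  # Returns true if the job was enqueued, false if it was a duplicate.
  def enqueue(week:, driver:, passenger:, total_cents:)
    key = [week, driver, passenger]
    # Set#add? returns nil when the key is already present.
    return false unless @seen_keys.add?(key)

    @jobs << { key: key, driver: driver, passenger: passenger, total_cents: total_cents }
    true
  end

  def shift
    @jobs.shift
  end
end

# Guard 2: record every charge and skip a job whose key was already charged.
class ChargeLedger
  attr_reader :charges

  def initialize
    @charged_keys = Set.new
    @charges = []
  end

  def charge_once(job)
    return :skipped unless @charged_keys.add?(job[:key])

    @charges << [job[:passenger], job[:total_cents]]
    :charged
  end
end

# The same pair is enqueued three times; only the first attempt is accepted.
queue = BillingQueue.new
results = 3.times.map do
  queue.enqueue(week: "example-week", driver: "Mary", passenger: "Anna", total_cents: 5000)
end

# Even if the same job is somehow processed twice, it is charged once.
ledger = ChargeLedger.new
job = queue.shift
first_attempt = ledger.charge_once(job)
second_attempt = ledger.charge_once(job)
```

Either guard alone would have limited the damage in this scenario. Having both means a failure in the enqueueing step does not automatically turn into duplicate charges.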
This is how I felt. Okay, so this is what had happened. There were thousands of users affected by the bug. All users were charged a different amount, the worst case being one who was charged over $5,000 when we should have only charged them $50. Some users were charged up to 500 times. The worst part was that we maxed out a lot of credit cards and also emptied a lot of savings accounts. As I had mentioned, I created a refund for every extra charge we made, and the problem was that refunds take a long time, and people didn't have any money in their bank accounts. I'll tell you in a few minutes how we dealt with that.

Something I was really glad about is that we were still not [unclear], because so many jobs were processed in such a short amount of time that I can't imagine what would have happened if we had been using that. The problem would have been much, much worse.

As I was mentioning, refunds take up to five business days, and that was unacceptable for a lot of our users. So what we had to do was reach out to all of them and offer them an expedited way to be reimbursed. Some of the options were just sending them checks or, I don't know, via PayPal. We did whatever we could, and then figured out what to do with the refund. But this was really stressful. I remember that I spent at least the next week just gathering information about how badly we had fucked up. Yeah, it was really bad.

So why did I want to tell you all of this? Why is this important?
I want to go back to the why. Embarrassing, embarrassing Tuesdays will happen. It will eventually happen to all of us, as I mentioned before. You might think that tests will save you from this, but they won't; we had tests. That code review might save you from this, but it won't; we had code review. That maybe having a QA team will save you from this, but it won't; we had that as well. Software is built by humans, and as Olivier mentioned in his talk, I think on Wednesday, we're all humans. We make mistakes. So bad things will happen eventually.

We need to make admitting mistakes easy for everyone. It's really important to create a culture where admitting mistakes is fine and you don't feel like you have to blame others or make up excuses for a mistake you made. We need to trust that we won't be judged, because the safer people feel about admitting their mistakes, the more they will learn from them, the more their colleagues will learn from them, and the more we as a community will learn from them.

When you're dealing with this sort of situation, my first recommendation is that, before doing anything, you make sure you really understand what happened. It's really easy to jump to quick conclusions and do whatever you think is the easiest or quickest fix for something, and sometimes we end up making the problem worse, as happened with the refunds. It was a headache to deal with the people we had sent money to in a different form and for whom we had already done a refund. It was crazy. We had to go through a lot of trouble and talk to the payment gateway a lot. By the way, I don't know how the payment gateway allowed us to do this. I guess they should have noticed that we were doing something weird, but anyway.

My second recommendation would be: move really slowly. Don't rush the problem.
I mean, you've already screwed up. The problem is already there. People have already noticed the problem. Fixing it quicker doesn't mean that people won't notice; that's not the case. So just make sure that you take the time to think about what you're really doing, which goes back to the first thing I just mentioned.

Something that I also really learned, and that I think a lot of companies do a very, very good job of, is: document the problem. Document what happened. Document what the root cause was. Document how you fixed it. Document what you will do to prevent this from happening again, so also document the fix. And, this is the most important thing, I think, document the lesson learned, because that's the only way you will really prevent stuff from happening again.

And also, don't do this: don't git blame. It's not about who wrote the code where the bug was. That's not important. The important thing is that the process failed, the company failed, the team failed. It's not only the fault of the person who wrote the code; that's just part of it. So doing git blame just doesn't create a culture where people will feel safe and trusted to admit their mistakes.

And, really importantly, you are not your failures. Don't think that because you failed, because you had a big screw-up, you're the worst developer ever, because you're not. Everyone has done it. Every developer you have admired has screwed up at some point, believe me. And I think Chad Fowler can put it better than me: no one will care about the bugs you create when you die. So keep in mind that it's all temporary. As bad as you might feel in the moment that you're dealing with a really bad situation, it will pass, and you will look back at it and laugh about it, and give talks about it. Well, we're here.
I mean, I really want to celebrate failures, because they also make us who we are. So you can come and talk to me and tell me your failures. I've heard a lot, and it's a lot of fun. It's kind of therapeutic. And you can also tweet using the hashtag of the conference. It would be great if we shared our war stories, our scars, because as a community we can help each other get through these really stressful moments in our careers. We can make it together, and I'll actually let John and [unclear] say it better. Oh. Too bad you can't hear that.

So anyway, I think that's all I wanted to say for today. I work for this... oh wait, what happened? Okay, we're back. Okay, I work for this company called Cookpad. Go and check it out if you like cooking or sharing your recipes. We're hiring, as everyone else is. So if you want to work at a company where you can build an application that's used by millions of people around the world, come and talk to me. And thanks a lot.