Thank you very much for attending my session. I'm from ByteArk. Okay, deep breath.

Today I will talk about how we adopted open source and how it helped us through a bit of a crisis we faced last year, when we had to scale roughly four times. Actually, after I finished the slides I found out that by the end of the event we had scaled a little more than four times within two months (not five times, sorry), although compared with the capacity we have right now, it is about five times what we had before that event.

Because we have non-disclosure agreements with so many partners and customers, I'm not even sure what we can reveal, so in the slides most of the names will be hidden, or modified into something not too easy to guess. So don't try too hard; you may see the name of an ISP in the form of some kind of animal or something like that.

So, what are we doing at ByteArk? My company provides CDN (content delivery network) infrastructure. It is a kind of cloud service for people who want to distribute content, I mean put something up for their customers to load from the web. The most prominent use case of a CDN is video streaming, because a one-gigabit-per-second server can usually only handle about 400 viewers of HD video (not full HD, just HD), which is a very small figure compared with what most of our customers expect. Most of our customers tell us they expect one million concurrent viewers. Actually, there has never been a single event that achieved one million; combined across events, maybe, but not in a single event.

So in most cases, if you want to do video streaming in this country, you need some kind of CDN: either buy the service from us or from our competitors, or build a CDN yourself. You have no choice. You cannot use a single server to serve video streaming at large scale, or even medium scale. It's just not possible anymore.

Just to give some background on what a CDN actually does: we have edge servers running inside most of the internet service providers in Thailand. When a client (I mean a client of our customer) requests something, our edge server fetches that content from the origin server and caches it. On the next request, if a new client asks for the same thing, the request does not go through to the origin server, so the bandwidth used by the origin is much lower, especially for something like live streaming where everyone is viewing the same content. That is a lot of saving on the origin side.
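To put those figures in perspective, here is a rough back-of-the-envelope calculation in Python. The per-viewer bitrate is just an assumed round number for illustration (it matches the figure of about 400 HD viewers per one-gigabit server), not our real encoding settings.

    # Back-of-the-envelope: why a single server cannot serve large-scale streaming.
    # HD_BITRATE_MBPS is an assumed value, consistent with ~400 viewers per 1 Gbps.
    HD_BITRATE_MBPS = 2.5
    SERVER_CAPACITY_GBPS = 1.0

    viewers_per_server = SERVER_CAPACITY_GBPS * 1000 / HD_BITRATE_MBPS
    print(f"Viewers per 1 Gbps server: {viewers_per_server:.0f}")                 # ~400

    target_viewers = 1_000_000            # what customers often ask for
    required_gbps = target_viewers * HD_BITRATE_MBPS / 1000
    print(f"Bandwidth for 1M concurrent HD viewers: {required_gbps:.0f} Gbps")    # ~2,500
    print(f"1 Gbps servers needed: {target_viewers / viewers_per_server:.0f}")    # ~2,500

In other words, a single origin, or even a small cluster, is nowhere near enough, which is why the traffic has to be spread across a CDN.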
This is our tech stack, just to give you some idea that we use a lot of open source; actually, this is the current tech stack. And this is our distribution of CDN nodes, which is what I will focus on today. This is the capacity we have now: a little more than a thousand gigabits per second to serve video streaming in Thailand. That is now, but what I would really like to focus on today is the event that happened around February 2018, when we had about 200 gigabits per second of capacity.

So what happened in just one year? We had a major customer with a TV program that gained a lot of viral attention on the internet, and the customer contacted us, via the LINE application, to warn us that this program could gain a lot of traction, a lot of viewers watching the video stream. For your information, it is an episodic program of roughly ten episodes, so it is not a one-time event; it continues for about two months. And this is not the first time we have had this kind of warning from a customer. Okay, we had 200 gigabits per second at the time, so we thought we were fine.

For the first episode of this TV program (I have to hide the timeline, because if you saw the date and time you would know exactly which program this is) we got about 50 gigabits per second, which translates into about 50,000 concurrent viewers. That is quite large, but not that large. So at the time we still felt relaxed, thinking this was not too much to handle. It was quite large, it took about 25% of our capacity, but it was not too much to handle. I think it was a Wednesday, and our team was still making plans to hang out on Friday night.

Then in the second episode we saw the bandwidth go up to about a hundred. Do you see the dip in the chart here? That happened because of our customer's infrastructure: we provide the CDN, but our customer provides the application and the website that the stream is embedded in, and their application and website went down because so many viewers were coming in and hitting their site. Frankly speaking, that was actually favorable for us, because the traffic was much more than we had anticipated at that time.
The maximum bandwidth that episode was about 120 gigabits per second, which is about 120,000 concurrent viewers, maybe 120,000 to 150,000, something like that. At that point we started to experience a few hiccups in our servers. We cache the content, right, but there is always some content that cannot be cached because it has only just been generated: in live streaming, new content is generated about every five or six seconds, so some load always goes through and hits the origin server. Usually that is about 1%, but 1% of 120 gigabits per second is more than 1 gigabit per second, and that is too much for the origin server to handle. As I mentioned, luckily the app and website of our customer also had technical problems and went down, so the bandwidth did not keep going up after that.

We quickly engineered a way to make the caching happen even faster, so the load going through to the origin server dropped to about 0.01%, I think. And we cancelled all our plans for that weekend.

This is also when we started to think about increasing capacity. We made plans, we contacted hardware vendors and so on, and we tried to negotiate a way to quickly expand, a kind of temporary infrastructure, to accommodate the load. Some of you might wonder why we did not just use a public cloud, something like Amazon CloudFront, to provide the service. I would say that if we used that, it could cost maybe a million baht per episode, which is too much for our customer and for us too. So if possible we really did not want to use that in the first place, and we tried to optimize our own infrastructure first.

Then came the third episode, a week after the second. In the third episode we saw the traffic double again. Each episode it doubled: double, double, double, so we were now at around quadruple the concurrent viewers, and this was only the third episode. And do not forget that this is a TV program of about ten episodes, so it was not even halfway yet. We saw 212.18 gigabits per second in our system, and yes, we really panicked, because at the time we only had about 220 gigabits per second of capacity.

For the record, this was the first time in our customer's history, too, that a TV program had gained this much traction in such a short time. Sometimes these kinds of TV programs do gain a lot of traction, but usually only near the end of the run,
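Just to make the panic concrete, the projection we were staring at looks like this in Python. The peak figures are the rounded numbers from the episodes above, and the doubling is a naive extrapolation of the pattern, not a real forecast.

    # Peak bandwidth per episode so far (Gbps), rounded.
    peaks = {1: 50, 2: 120, 3: 212}
    capacity_gbps = 220          # our total capacity around episode three

    # Naive "it doubles every episode" extrapolation.
    projected = peaks[3]
    for episode in (4, 5):
        projected *= 2
        status = "over" if projected > capacity_gbps else "within"
        print(f"Episode {episode}: projected ~{projected} Gbps ({status} our {capacity_gbps} Gbps)")

Even the very next episode would blow past everything we had, which is why we went straight to the data centers.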
maybe at episode seven or eight, not episode three.

So we contacted all the internet data centers we know of, where we are already a customer, to get a kind of temporary rack space and bandwidth, and of course to negotiate the price, and we got about 25 percent more rack space and bandwidth where we could put servers and scale immediately. The next thing we did was buy a lot of servers. That is the nice thing about open source: we do not need to run compliance tests or anything like that, we can buy anything that is an x86-64 server, and we even looked for second-hand servers, anything that can run our software, because our whole stack is based on open source.

We also found that the origin server was getting loaded heavily again, so we engineered a way to broadcast the live video segments to all the edge servers at the same time, so that the load on the origin server became very, very small (I will show a small sketch of this idea in a moment). At around the same time, our customer experienced technical problems again and their site went down again, even though they were using one of the very famous public clouds.

Then the fourth episode, which aired the day after the third, set a new record again. We saw 250 gigabits per second, which is about 250,000 to 260,000 concurrent viewers. This was the first time we saw the traffic approach, not reach yet, just approach, 300 gigabits per second for a single event. It was also the first day that our customer's infrastructure did not go down, because they scaled out, I think to hundreds of servers in their public cloud, to cover the load. And we finished the 25 percent capacity increase we had negotiated with our IDCs the day before. We expanded it within a single day, because we use Ansible and that kind of DevOps tooling for the expansion, so we can increase capacity very fast.
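As promised, here is a small sketch of the idea of pushing each new live segment out to every edge before viewers ask for it. This is only the shape of the idea, written in Python; the hostnames, the segment path, and the warm-up-by-plain-GET trick are assumptions, not our actual implementation.

    # Sketch: pre-warm every edge with the newest live segment so that real viewers
    # never trigger a cache miss back to the origin.
    # All names below (hosts, path) are hypothetical.
    import concurrent.futures
    import urllib.request

    EDGE_SERVERS = ["edge-01.example.net", "edge-02.example.net"]   # hypothetical edges
    NEW_SEGMENT = "/live/channel-1/segment-00123.ts"                # hypothetical path

    def warm(host: str) -> str:
        # A plain GET makes this edge fetch the segment from the origin once
        # and keep it in its cache before any viewer asks for it.
        with urllib.request.urlopen(f"http://{host}{NEW_SEGMENT}", timeout=5) as resp:
            return f"{host}: HTTP {resp.status}"

    with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
        for line in pool.map(warm, EDGE_SERVERS):
            print(line)

In practice you would trigger something like this from whatever produces the segments, and with many edges you would fan it out in a tree instead of from one machine. The point is simply that the origin serves each segment a handful of times instead of once per cache miss.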
This was also the first day that we had to remove one of the resolutions. In live streaming we usually ship multiple resolutions; it is called adaptive bitrate. At that time we had HD, the 720p resolution, which was the highest we provided for this TV program, and SD, the 480p resolution. So we negotiated with our customer to remove the highest one to save some bandwidth (I will put a rough number on that in a moment). Actually, bandwidth was not the most serious problem at that point. The most serious problem was the processing overhead from HTTPS, because at that time HTTPS was being widely adopted thanks to Let's Encrypt certificates and the enforcement policies of browsers like Chrome and Firefox, so a lot of people were using HTTPS.

After the fourth episode we had about four or five days, because we cannot scale out at the last second; we have to do it and we have to test it. So we had only about four or five days to scale the system to cover the load, and if it doubles every episode, that means by the next week we would need at least 400 to 500 gigabits per second of capacity to accommodate the concurrent viewers.

This time we did not have to contact anyone, because all the major internet service providers in Thailand contacted us. The internet is really a web of peering connections between networks, and at that time I think we had already exhausted the peering of all the internet service providers, so they were really panicking too. They contacted us to have our servers placed inside their IDCs, so the traffic would not leave their networks. For example, if you use True, the traffic stays only inside the True network; if you use DTAC, the traffic stays only inside their local network.

So we got a lot of rack space and a lot of bandwidth at that time, but we did not have enough servers. So instead, we took over servers from friends of ours who had just gone out of business. They had been doing a public-cloud kind of business, things did not go well, and they shut down, so we took over their servers, I think about 30 or 50 of them, to use as CDN servers.

After that batch we had capacity of up to about 500 gigabits per second, and in the next, fifth, episode we saw 450 gigabits per second. So this program set a new record every episode, and at the sixth episode we saw 600 gigabits per second. You can see the graph is not capped, so it was not hitting the top limit yet, but the numbers were very surprising, because they kept doubling every week. After the sixth episode, expanding became a kind of recurring task, because by now we had almost unlimited rack space with almost every internet service provider in Thailand.
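To put a rough number on the rendition we removed: both bitrates below are assumptions for illustration only, not our real encoding settings.

    # Rough per-viewer saving when the 720p rendition is removed and viewers
    # fall back to 480p. Bitrates are assumed values, not real figures.
    BITRATE_720P_MBPS = 2.5
    BITRATE_480P_MBPS = 1.2

    saving = 1 - BITRATE_480P_MBPS / BITRATE_720P_MBPS
    print(f"A viewer moved from 720p to 480p needs about {saving:.0%} less bandwidth")
    # Under these assumptions, each viewer moved down frees roughly 1.3 Mbps.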
So we expanded every week, until we ran out of money or something like that. We bought a lot of servers every week and put them into every ISP, as much as we could, so after the sixth episode we were getting to about 700 gigabits per second of capacity to cover this event.

Because of the time constraint of this presentation, I would like to fast-forward to the last episode, because after that point the capacity stayed about the same. The thing is, I think we had already covered just about every possible viewer in Thailand; the people who would watch this TV program had already come to watch it. The maximum capacity we had at the end was about 880 gigabits per second, I think, but the real maximum load that actually happened was about 710 gigabits per second, which is about 750,000 concurrent viewers at the same time on the same program. If this were a marketing session I would say almost 800,000 viewers at the same time watching the same thing, which is really a whole lot of people. And the thing is, we survived. Our system was not down for long; I will admit that in some episodes we went down for a bit, but we could come back up quickly.

So how did we survive this tsunami of viewers? The first thing is that we do a lot of monitoring. The picture shows the number of metrics we monitor on each server: about a thousand per server. Currently we have almost 200,000 metrics being monitored all the time, usually every 60 seconds. We did a lot of monitoring even before this event, so when something happens we know very quickly where the bottleneck in the system is, and we can get in and fix it quickly.

The second thing is that because we use open source, most of the open-source software can expose some kind of metrics to an outside monitoring system, and we wrote a lot of custom scripts to convert all of that information and put it into our monitoring system. We also monitor the health of each service, not just things like CPU, memory, and network (those are all monitored too), but the health of each service on every edge server in our system. So if one server goes down we can quickly fail over to the others, zero in on that server, pull it out, and replace it. That mattered because at the time we had to use some second-hand servers, and they went up and down a lot.
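Just to illustrate the kind of custom script I mean, this is only a rough sketch, not our actual code. It assumes Nginx's stub_status page is enabled at the URL below and simply flattens the counters into name-value pairs; where they go from there depends on your monitoring backend.

    # Minimal collector: scrape Nginx stub_status and flatten it into metrics.
    # The URL and the plain-text output are assumptions, not a real setup.
    import time
    import urllib.request

    STATUS_URL = "http://127.0.0.1/nginx_status"   # assumes stub_status is enabled here

    def collect() -> dict:
        raw = urllib.request.urlopen(STATUS_URL, timeout=2).read().decode()
        lines = raw.splitlines()
        active = int(lines[0].split(":")[1])                     # "Active connections: N"
        accepts, handled, requests = map(int, lines[2].split())  # lifetime counters
        reading, writing, waiting = (int(x) for x in lines[3].split()[1::2])
        return {
            "nginx.active": active,
            "nginx.accepts": accepts,
            "nginx.handled": handled,
            "nginx.requests": requests,
            "nginx.reading": reading,
            "nginx.writing": writing,
            "nginx.waiting": waiting,
        }

    if __name__ == "__main__":
        ts = int(time.time())
        for name, value in collect().items():
            # Replace this print with a push to whatever time-series store you use.
            print(f"{name} {value} {ts}")

The real setup collects on the order of a thousand metrics per server every 60 seconds, from the operating system, from Nginx, and from our own services, so a script like this is only one small piece of it.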
The next thing is that because we use open source, we do not have to care much about hardware compliance. We do not need a list of hardware certified for the software we run, because we build it ourselves and it can run on anything that is x86-64. At that time we used all kinds of vendors, Supermicro and whatever else we could get, and even no-brand machines were okay. The only thing we really had to care about was the brand of the network card, because it really affects performance; at that time I think only Intel cards could achieve a very high packet rate in a single server. And we mainly used the Debian distro, I think Debian 9 at that time, so we just had to make sure the hardware was compatible with Debian, and actually most hardware is already compatible with Linux, any Linux, not just Debian.

Although we bought many kinds of servers into the same pool, we separated them into groups, and we did not make too many groups, so if something went wrong within one group we could quickly pull it out and replace it with something else. So even though you can buy anything, do not buy too many different kinds, only as many as you can manage.

The next thing is that we know our platform, because we build everything on open source. We even build things like Nginx, which we use as our main HTTP server to serve the content, ourselves: we build our own Nginx binaries, because we patch a lot of things into Nginx, and we have our own repositories. So we hack a lot on the things we think are really important.

And we do the tuning before we scale out. In my opinion, if we just frantically scale out, it costs a lot. Instead we try to tune one server first, to squeeze as much performance out of it as we can, before scaling out. In the end we only needed about 30 to 50 more servers, I think the exact number was about 40, to accommodate the load.

The last thing is that you need to focus on the majority of the users first. At the time, our first priority was that viewers could at least see the content, even if not in HD; that was good enough for them. The second priority was that they could watch the stream smoothly; that was enough. Meanwhile, in the back end, all the analytics and accounting were broken during the TV program, and we did not worry much about that, because we knew the log files and everything were still intact and we could rerun the batch jobs afterwards. So the thing is, you need to focus on the important things first and let the other things fail during the crisis; you can clean them up after the event.

That is the end of my presentation. If there are any questions, you can come and ask me; I think I have already used up all of the time. Thank you very much.