What a lovely crowd, and thank you, Victor, for the talk. I'm going to build on that fantastic presentation, and on a story that I'm sure is familiar to many of us. Just a quick show of hands: how many of you would get a PagerDuty call if your server went down right now, or are on a rotation where you'd get that call at some point? Okay, quite a few of you. Another quick question: how many of you are on something like a VPS provider, or bare metal, or just using compute from a cloud provider without really autoscaling? Right, a few of you do that. This talk builds on what Victor presented, but it's for the person who's trying to make these decisions. We're right in the midst of it ourselves, so it's going to get a bit into meta-engineering: what do you need to think about when you're trying to do this? It's going to be partly our story, but mostly I'm going to present what you need to think through to do it. So let's dive in. Before we do, a little bit about me: I'm Omkiran, I've been at Viki since 2013. Right now I'm the director of engineering, and my job is to figure out how to scale the team. I'm not on social media, so you can reach me at that email if you want to. And Viki, for those who don't know, is a VOD service that specializes in fan subtitling. We take Asian content to people around the world: you can watch a Korean drama because it's been subtitled into English by our fans. Why am I presenting this talk? Because we're right in the middle of doing this transition as we speak, and I'm one of the key engineers doing it. So this is a story of what we had to think about while doing it. Okay, so the good: I'm going to present the bottom line up front.
This is what we feel is generally the conclusion, and then I'm going to show you how we reached it. We're a small team; the backend team right now is about five people. Historically, much like Victor's story, there was only me, then another engineer, and then it became three, then four, then five. For a small team like ours, we feel moving to the cloud is good because you don't have to manage everything. It frees us up to build smarter applications rather than manage infrastructure, and that's the primary intent. It also couples well with a change in engineering thinking: when you make this transition, you want to change the way your engineers work, the way you develop, the way you deploy, and everything that goes with it. The bad, or what can be perceived as bad, is that you have to give up control at some layers, and you should feel comfortable with that; I actually think it's a good thing that is only perceived as bad. And you definitely need to think about cost. If you're a startup, you do; and even if you're not, I think every engineer should, no matter how large or small the organization, because money is the lifeline that keeps the organization flowing, and in the cloud it's very easy to get carried away and just throw a hundred servers at a problem. So you have to think about how you want to manage it. What I would call the ugly: if you're not mentally aligned to make these changes, and it's cross-team, it can really hurt. So it's also about getting everybody on board with wanting to do this, not just doing it because it's cool technology, because people have to change their habits. When you change providers, when you change technology, you also have to change habits that are built into the engineering thought process. So: the story of Viki's infrastructure. We started off very small.
We started on Heroku, and because Heroku was very expensive, we moved to AWS. We very quickly moved from there onto bare metal, on OVH and SoftLayer; there was actually a third provider whose name I can't even remember right now, because we only had a few services there. By 2014 we were on SoftLayer, and right now we're moving to GCP. So we've gone through a few cycles, as you can see. The team learned a few lessons along the way, and we feel that at this stage, moving to something like GCP is definitely the right answer for a few years to come; in fact, I don't see why it shouldn't last us a very long time. If you think about it, why did we do this in the first place? Why, through so many transitions, did we not go to the cloud? Well, the first answer is that it worked for us, and everything fit in a box. If everything fits in a box, you probably don't need to build something distributed. Think about it: not everybody is a competent distributed-systems engineer. We have software engineers, but distributed engineering is a different skill, and you need to start building it up if you're going to build something distributed. As you move to the cloud, you need to think a little bit about that. But if your data is not that big and it fits on one server, do you really need to distribute it? Depending on your scale, you might not. Our traffic was predictable: we knew what was happening, so we could plan and work with it, and that worked for us. We were on bare metal for a very simple reason: as we were growing, it was way cheaper. From 2010 to now, cloud costs have fallen, but back then, purely on price point, bare metal was way cheaper. And this is very critical: the small team that was willing to do this felt they could do it. They felt competent enough.
So we said yes, let's go ahead and do this: it's cheaper, it's saving us money, and we can deliver it. But we did have issues, and you'll see similar stories at many organizations that do this. You don't have autoscaling; you're slightly overprovisioned in some cases and underprovisioned in others, and you have to rush for changes. Keeping things repeatable had become a challenge, because we were on bare metal, and this was before Docker became the rage. This is a challenge because you're up against people's habits again: if you give people the option to SSH into a box and change something, at some point they're going to take it. When it's your fiftieth page in ten days and it's two o'clock at night, you're just going to SSH into the box, fix whatever you want to fix, and go to sleep. It's going to happen, no matter what you tell yourself. And you have to become an expert in everything: you need to understand how the load balancer works, how the database works, how Redis works, and if you have to squeeze the last bit of performance out of the existing box, you need to understand that too. You can't possibly be an expert at everything, but you had to be, because that was your only choice. And, as Victor pointed out, it's not giving you competitive advantage. What's the competitive advantage to my organization if I learn, say, Postgres very well, unless my job revolves around huge databases and I really need to optimize them? Let somebody else be the expert on Postgres; I'm just going to use the database. Here's an interesting thing to think about if any of you are software managers or team leads: software gets designed around what is available to it. Engineers will start thinking in those terms.
If your infrastructure works a certain way, you start thinking a certain way, because you don't have some of the choices. You start making compromises and building around them. It should be the other way around: the infrastructure should support what your software needs. But if you can't move fast on infrastructure, that's how you start thinking, and over time that thought process builds up within the team. However, as you've seen from the history, we had a very small team and we needed to figure it out, so we made ourselves go through the pain, and there's a list of things we did. We were one of the very early adopters of Docker: we were running Docker in production while Docker was still in beta, and it helped us a lot. I don't recommend using beta software in production, but at that point we really didn't have a choice. We used Ansible for provisioning, which made things better, because remember, we were still on bare metal, so we had to provision servers ourselves. We figured out that HAProxy is a lovely load balancer and used it, and it helped us in many ways. We improved our monitoring and alerting, and things got better. And, a very underrated skill: we wrote better software. That's a very good way to reduce your pages and sleep better at night. We decided to write software that could handle some of the failures itself, and that also improved things over time. We got to a point where we're at 99.9-something percent availability, depending on what timeline you're looking at; we serve at 50 to 100 milliseconds response time across the board; reliability is up there; pages are down.
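To make the "write better software" point concrete, here is a minimal sketch of the kind of failure handling I mean: retrying transient network errors with exponential backoff and jitter. The function names and numbers are illustrative, not our actual code.

```python
import random
import time

def with_retries(fn, attempts=3, base_delay=0.1):
    """Call fn(); on a transient failure, retry with exponential backoff.

    A trivial pattern, but it turns many short-lived network blips
    (the bulk of the pages we used to get) into non-events.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except IOError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error
            # Sleep base_delay * 2^attempt, plus jitter so many callers
            # don't hammer a recovering backend in lockstep.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))

# A stand-in for a flaky network call: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient failure")
    return "ok"

print(with_retries(flaky))  # prints "ok" after two retries
```

The design choice here is to retry only errors you believe are transient; retrying on everything just hides real bugs behind latency.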
So what gives? Why do we want to move out of this? We've actually figured out how to work with bare metal, and we're working well with it. I'm not going to go deep into why we chose Google Cloud; there are many reasons why you could choose any cloud, and this story could apply to many of them, but we did choose Google Cloud. If I could put it in a simple, slightly meta way: most clouds feel like they're written for the developer's manager; Google Cloud feels like it's written for the developer. If you just look at how the services are grouped, the way it's presented, the way you have to handle it, that was the feel. I evaluated a few of them, and we decided to go for it. So I'm not going to argue that this one is better or that one is better, but ultimately we made that choice, and that's why this is a story about why we're moving to Google Cloud. So as we were doing that, what did we uncover? What were the lessons? The first thing you want to think about is the network. I think John mentioned that they have one of the best networks, and I kind of agree: that's probably the strength. When you're writing apps, think about it: everything is the network. You're sending monitoring data to an external service as UDP packets? That's the network. You're sending logs to an external service? The network. Take whatever you want from that list: API calls to other backends, access to a DB, it's all network. You don't generally think about it when you're quickly writing a POST, GET, or DELETE route in a controller and deploying it, but you're setting up another entry point into the network in one way or another.
And when you have this network on a bare-metal or VPS service that's optimized around certain things, you start realizing your network is a little bit mediocre. It's not necessarily the best, and when you need to do a migration, you need to think about every aspect of that network and how you're going to deal with it. And when you're about to pay for a good network, you stare at that whole set of operations very closely and analyze: what are we doing here? While preparing the GCP migration, we found we were being inefficient in certain directions. Oh, we're sending too much data here, and that's going to hurt us on a better network like GCP because of the pricing; but if we can easily optimize it, we get the quality up and the price down. So that was one of the things we had to learn: every part of the system talks to the network, and it gets better if you're on something like GCP. What comes after the network? Compute: CPU. One way to think about it is that cores get more expensive over time on bare metal. You've bought this box and it's there, but better cores keep coming out while you're still on that box. The price you're paying per unit of compute is actually rising, because cheaper cores are available, or cores at the same price with better performance, and you don't have the time to migrate over. You're stuck with a box that's four years old when, at the same price, you could have gotten much better cores. So effectively you're paying more for each core as time goes by. Another way to look at it: your utilization might be high, but you've not necessarily thought about throughput in your applications, and you don't have the setup to think about it in that form.
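The aging-core point is really just arithmetic. Here's a back-of-the-envelope sketch, with every number invented for illustration: a leased box's monthly price stays flat while the market's price/performance improves, so the effective price you pay per unit of performance drifts further and further above market.

```python
# All numbers are invented for illustration only.
monthly_lease = 400.0   # flat monthly price of the bare-metal box
cores = 16
box_perf = 1.0          # relative per-core performance at purchase time

# What you pay per unit of performance -- and keep paying, unchanged.
effective_cost = (monthly_lease / cores) / box_perf

for year in range(1, 5):
    # Assume ~20% better price/performance per year on current hardware.
    market_perf = 1.2 ** year
    market_cost = (monthly_lease / cores) / market_perf
    print(f"year {year}: you pay ${effective_cost:.2f} per perf-unit, "
          f"the market pays ${market_cost:.2f}")
```

By year four, under this assumed 20%/year improvement, the market price per unit of performance is roughly half of what the old box costs you.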
So again, you need to start looking at your applications and your setup with these things in mind: these are the benefits we might get if we migrate to something like GCP. One of the key points I put up there: if a new server takes time to provision, you might choose to optimize instead, and say, who's going to stand up a new server? I'll just optimize this away. You might choose to take shortcuts, and it's all due to the pressure of time. Somebody is waiting for that next feature, so software developers start taking shortcuts and, like I said before, start treating the infrastructure as the constraint and working around it, rather than the other way around, where the software drives the infrastructure and you just keep building more. Related to that point: not everything stays scalable, even if you planned it that way originally. You may think your application is scalable, but you never actually put it on scalable infrastructure. If you never ran it on autoscaling infrastructure, then when you finally do, it might not work, because you never tested it in that form, and slowly but surely some coupling crept into your application that made it no longer horizontally scalable. So when you're planning a migration, you need to ask: is everything really horizontally scalable? It's not always obvious, but when you try it, you will start hitting bottlenecks. And I've already mentioned that software gets designed around the infrastructure. Testing is another interesting point. Again, you have this setup, in the form that it is.
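The way that hidden coupling usually shows up is per-process state. A hypothetical sketch (the class and the counter are invented, not our code): the same service counts correctly on one box and silently miscounts the moment a round-robin balancer splits traffic across two instances.

```python
class AppInstance:
    """A hypothetical service that keeps a view counter in process memory."""
    def __init__(self):
        self.views = {}  # hidden local state: this is what breaks scaling

    def record_view(self, video_id):
        self.views[video_id] = self.views.get(video_id, 0) + 1
        return self.views[video_id]

# One box: counts are consistent.
single = AppInstance()
for _ in range(4):
    single.record_view("drama-101")
print(single.record_view("drama-101"))  # prints 5

# Two autoscaled instances behind a round-robin balancer: each sees
# only half the traffic, so neither holds the true total.
a, b = AppInstance(), AppInstance()
for i in range(4):
    (a if i % 2 == 0 else b).record_view("drama-101")
print(a.views["drama-101"], b.views["drama-101"])  # prints 2 2, not 4
```

The fix is to move that state into shared storage (a database, Redis, and so on) so that any instance can serve any request; that is exactly the kind of change you only discover you need when you first run on autoscaling infrastructure.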
If you're doing performance tests, or any kind of tests, they're again built around the infrastructure you have. And engineers want to ship, so they start thinking: it always runs on that box, so I'll test with just that box in mind. That might hurt you later, when you try to move off that box or make the application actually horizontally scalable. So there are a few meta points you really need to dig into across your applications and your setup when you want to migrate. One of them I wanted to phrase the other way around: what do you need to hide from a GCP-like platform? Well, you don't really need to hide anything, but you do need to think this through. You can do a lot, lot more on GCP than you could on bare metal, so your operations team needs to be really, really disciplined to make sure they don't run away with it in directions you can't bring back under control. Why? You do not want to reach a stage where you're using something just because you have it. Make sure the story is cohesive and there are reasons for the things you're using. On cost control, the biggest thing is: yes, it's nice to use something new and shiny on a platform that autoscales and works well, but have you learned it well enough to understand how your software will fail in that environment, and how it will perform there? Any new tool or technology you bring in, you need to spend time practicing and learning before you decide to migrate, because everything you learned with your old setup is, in a way, voided. You need to relearn those things, because while the platform eliminates a lot of problems for you, it will introduce something new. So if you think, hey, we no longer have server reboots: great, that's good.
But there might be something else to think about, because the performance characteristics might be different. On bare metal, you might just open up all the ports on the internal network and let services talk to each other; here, inside a VPC, you need to configure that explicitly, and suddenly some port you didn't configure means something doesn't work correctly. So again it's about habits: how you think about your deployment pipeline and everything that goes with it. It's all new. Make sure you understand what you're doing, and practice. That's what I mean by the operations team needing to be disciplined, to get it rolling and make it smooth for everyone. A critical component here: make sure staging and production are isolated, especially for cost control and tracking. Like I mentioned earlier, it's very easy to get carried away: with the click of a button, you can roll out a complete parallel production. It's possible, but you need to think about it; it's ultimately your money on the line too. So, is GCP the answer? I like to say yes and no. It depends on the state you're in. If you're not thinking through these questions, wait a while, get the answers, and then it becomes the answer. It's not about, hey, we're on bare metal and everyone says GCP is the next cool thing, let's do it. Think through these meta-engineering questions, prepare yourself, and then the answer is yes. Make sure your team understands this and can make the transition. And, as I said earlier, GCP just feels a little more developer-friendly to me than other cloud services. One of the key things I really liked about GCP is that it gives you multiple layers of abstraction.
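As an example of the "port you didn't configure" surprise mentioned a moment ago: a GCP VPC blocks ingress between VMs unless a firewall rule allows it, so service-to-service traffic has to be opened explicitly. A hypothetical rule (the network name, tags, and the Redis port are all illustrative):

```shell
# Allow app-tier VMs (tagged "app") to reach Redis VMs (tagged "redis")
# on port 6379 inside a VPC named "prod". Anything not explicitly
# allowed stays blocked -- unlike a bare-metal LAN where every port
# between boxes may simply be open.
gcloud compute firewall-rules create allow-app-to-redis \
    --network=prod \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:6379 \
    --source-tags=app \
    --target-tags=redis
```

Tag-based rules like this are also what makes staging/production isolation enforceable: instances in the other environment simply don't carry the tags.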
If any of you have used it, Google App Engine is a brilliant piece of engineering. If you're just putting up something like a basic Rails app that has to talk to a database and serve some responses, without too much intelligence, because you're just starting out, there's no reason not to use App Engine and just roll it out. And then you have the ability to peel back that abstraction, going through the container engine, or, if you want finer-grained control, all the way down to VMs. That's your call; you could start all the way up at App Engine, and if you've not tried it, I recommend you do, once, to understand what I'm talking about. At this point we're right in the middle of what we call a lift and shift: take our existing setup and just move it over to Google Cloud, treating Google Cloud VMs as bare-metal servers, so that we don't introduce anything new except the underlying cloud and its access, and everybody gets used to working with it, setting up their development machines and everything that goes with it. We've started a few minor projects on the side using App Engine and found it very useful. And as we speak, we're planning a migration to GKE, to take our entire containerized system onto Kubernetes and take it to the next level. I think that's pretty much it; that's all I had to share. This was the basic thought process of what you need to think about if you're planning a migration to GCP. Questions are more than welcome. Do we have any questions? A question from the audience: "Hi, I'm Akur. I wanted to know how this works, because the network is very important to you, with streaming data and all of that. How do you handle that entire amount of data going out of your servers? No matter what cloud servers you use, you'll still be pushing ten gigabits or something." Yeah, so for streaming we actually use a CDN. We don't serve it from our own servers.
A follow-up from the audience: "Do you do some adaptive bitrate on it?" Yeah, so there's a lot of video work we do, but we ultimately use a CDN to deliver the content. And Google's network is almost like a CDN as well, so at some point we'll start looking at that too. But no, we don't serve it from our own servers; most video providers don't. Next question: "How do you manage that?" Yeah, that's a good question. Most video providers will use multiple CDNs, and so do we. Costs also get managed through video-encoding algorithms: we have people working on our encoding algorithms themselves, continuously decreasing our streaming bitrate while keeping the quality the same. We also work on keeping the cache-hit ratio on the CDN servers as high as possible. Ultimately, if you have to stream 20 hours of video, you have to stream 20 hours of video; there's no way out of it. So the only thing you can do is keep optimizing the encoding to get the bitrate as low as possible for the quality you want to deliver. And from a network perspective, we just use multiple CDN providers. At some point, after we're done with our lift and shift, we'll also look at using Google, because some CDN providers are very good in certain areas of the world: somebody may be very strong in North America while their South American network is weak, and vice versa. You don't want to tie yourself to one; from a video-streaming perspective, you want to use what's best for each user. And then, of course, there's a cost-to-ROI ratio that keeps moving, so you need to keep doing that math and keep switching all the time. Great. Any other questions? Okay, we'll be around to answer your questions. Thank you. Thank you, Omkiran.
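The bitrate arithmetic behind that last answer can be sketched like this. The hours, bitrate, and per-GB price below are invented for illustration, but the shape of the math is the point: egress cost scales linearly with bitrate, so a 20% bitrate saving at the same quality is a 20% delivery-cost saving.

```python
def monthly_cdn_cost(hours_streamed, bitrate_mbps, price_per_gb):
    """Rough CDN egress cost for a month of streaming.

    hours * 3600 seconds * Mbit/s gives megabits; /8 -> megabytes,
    /1000 -> gigabytes (decimal units, as bandwidth is usually billed).
    """
    gigabytes = hours_streamed * 3600 * bitrate_mbps / 8 / 1000
    return gigabytes * price_per_gb

# Illustrative: 1M hours/month at an average 3 Mbps, $0.04/GB egress.
before = monthly_cdn_cost(1_000_000, 3.0, 0.04)
# The same traffic after a 20% bitrate saving from better encoding:
after = monthly_cdn_cost(1_000_000, 2.4, 0.04)
print(f"${before:,.0f} -> ${after:,.0f} per month")  # $54,000 -> $43,200
```

This is why encoding work pays for itself: unlike almost every other optimization, it cuts a bill that otherwise grows with every hour watched.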