All right, so this is 30 million requests in an hour in the cloud. My name is Terry Ryan. I'm a developer advocate for Google. This talk is pretty much a case study in me doing incredibly dumb things and learning from them, which is the best thing you can do when you do really dumb things. I'll also warn you it's very Google Cloud-centric. The lessons I learned from it can, I think, be brought outside of that environment, but I want to be up-front that I'll be talking a lot about the platform I work on.

So that's me. I want to try to do this with the lights, but I want to know who you guys are. You're mostly PHP developers, I assume, so we can get that out of the way. Anybody here not a developer? Anybody here a full-time systems person? No? Okay. Anybody here play a DevOps role, where you do both? All right. Do you do it because you love DevOps, or because your company hates you and won't pay for enough people? All right. Let's see how many people here have ever touched App Engine or any part of Google Cloud. App Engine first, then any part of Google Cloud Platform. Okay, good. So I'll be explaining some things. Oh, and anybody here using Firebase? All right, cool. So I will have to explain those pieces a little more in depth.

Let me introduce this by talking about what I was trying to do. I joined Google about two years ago, and when I started, I started playing around with all our technology. One of the pieces is this thing called App Engine, and I was trying to figure out: why would I use this versus VMs? App Engine is what we call a platform as a service. You just take code, put it up, and it runs. So why would I use this versus a VM? The answer is, well, it scales infinitely and immediately. Really? That sounds like marketing talk to me. What does it really do? No, no, it scales really, really quickly, and it does it as much as you need. So I started building a demo to test this out: let me see how fast it actually goes.

So I'm going to skip the visuals and actually run the demo. Here is the demo. This is it working — this is me after I fixed it, after I broke stuff. Don't hit any of those URLs yet, because the whole point is that I can do this without prewarming it; if you hit the URL, it'll prewarm it. Basically I'm going to run a command — I'm using makefiles for this, because me typing all of it would be crazy — and I'm going to launch 50 VMs, 50 instances of Compute Engine, our VM technology. We like to say they spin up in tens of seconds, meaning: when you're in a hotel room testing to make sure everything works, it spins up in 10 seconds. If you're in front of people, it spins up in 40. I don't know why. I don't know how it knows. But it does.

So I'm spinning up some VMs, and they're going to pop onto the screen soon. I basically just ran a command that says: start these VMs that already exist. What's on the VMs is Apache Bench. Anybody here use Apache Bench? You use it for load testing: you send a lot of load at a particular URL and see how it's performing, along with a whole bunch of other information. So all these VMs that are going to launch in a second, and start popping up on the screen, have Apache Bench on them.
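The make targets wrap something roughly like this — the commands are real, but the instance names, zone, and URL here are placeholders, not the demo's actual ones:

```sh
# Start some preexisting Compute Engine VMs (names and zone are placeholders).
gcloud compute instances start bench-01 bench-02 bench-03 --zone us-central1-b

# What each VM then runs: Apache Bench. -c is concurrent connections,
# -n is total requests from this node. With the first run's numbers,
# 50 nodes x 200 requests each = 10,000 requests total.
ab -c 20 -n 200 http://your-app.appspot.com/
```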
What I'm going to do is use Apache Bench to send a whole bunch of load at App Engine, and we'll see how fast App Engine spins up. And we're starting to see our VMs launch. When they launch, they send a little note to this app to say they're up and running. We've got one more to go. Come on, last one. I said 50. Which one are we waiting on? It knows. I'll just refresh. We are waiting on 44. Let me just make sure nothing erred. Everything should be running. All right, come on. Oh, there we go. 44 loaded.

So now I have 50 VMs, and I'm going to use them to send a whole bunch of load at App Engine. We saw how fast the VMs loaded up — reasonably fast for spinning up a VM. But the claim with App Engine is immediate, right? So I'm going to type a command here: make load c=20 n=200. What is this? Apache Bench takes two criteria: how many concurrent connections to run — each node is going to run 20 concurrent connections — and how many requests to throw at App Engine, which is 200 per node. So all told, it's 10,000 requests, and we should see how fast App Engine spins up.

Now, it takes a little bit of time for the commands to get out to Compute Engine, but when the VMs start firing load at App Engine, they'll start bouncing blue, and then we'll see just how fast App Engine spins up. In the meantime, I'm going to show you this page right here, where there is nothing — there's nothing in the storage area. I'll show you why. And now we see that load is coming in. We see how quickly App Engine is able to respond to that load: it's spinning up instances, and once an instance is handling more than it can handle — it's full — App Engine spins up another instance to spread the load out among other App Engine instances. So we're spinning up, we're headed toward 10,000, nodes are starting to drop off, and we should hit 10,000 and stop.

No, of course not. Because why would you work? Live demos in the cloud are like acting with children or animals: you never know what's going to happen. So I'm going to run this one more time, just for my own clarity. And what are these other things? We're doing two things here: App Engine is writing to Firebase to tell it that it got a request, and it's also writing files to storage. The file storage takes a little longer, and that's why I showed you that blank page where there was nothing — I wanted to show you that I hadn't spun any of this up beforehand, obviously, because it's acting flaky in front of you. But basically we're doing two things, writing to Firebase and writing to storage, so that I have a record of every single request that's come in. So we're going to go to 10,000 this time; it's going to hit 10,000 and stop perfectly. Oh, come on, son of a bitch. All right. Oh, there we go. Woo! God, I'm going crazy.
So, just to show there's nothing up my sleeve: in addition to writing to Firebase to say I had those requests, I also wrote files. If I refresh here, we'll see that I have all of these folders. You're not missing much — they're just really long hex strings. But each one of those folders is an instance of App Engine, and each one of these files is an individual request. So I have an actual file record of every single request that came through. We could count them off: it would be 10,000, plus the 9,900-odd from the first run that didn't finish.

So that's the demo, and I think it does a good job of showing that, wow, as soon as requests come in, App Engine really does start firing up instances, and it keeps doing that. So let me explain how this was constructed, and then tell you where I went wrong the first time I ran it, back when I was building it.

App Engine does two things. It writes to Firebase so we can see the requests come in, and it writes to storage so we have an artifact of each request beyond just trusting Firebase. App Engine also serves the visualizer, but that's just HTML, JavaScript, and CSS — it's not doing a lot of work, it's just serving static files. Compute Engine runs the load generator — Apache Bench plus remote command execution, so I can drive it all from one terminal command. Firebase handles the real-time display, and it's also the database, the repository of the data for all the requests that come in. Cloud Storage acts as proof of activity: it backs up Firebase, so I have two independent ways of determining whether a request was handled. The other thing Cloud Storage does is object change notification. When a file is written, it fires off a request — ultimately to Firebase — to show that a file came in, and if we look at the visualizer, that's what all this stuff down here is. That's really important; it's going to play a big role. Object change notification: we got a new file. App Engine takes those notifications from Cloud Storage, and a separate app handles writing them to Firebase so we can see them.

So this is my planned architecture. We have the main app, loaddemo.appspot.com. Requests come in, App Engine writes to Firebase and writes to Cloud Storage. Cloud Storage notifies a second app, /storage, which writes to Firebase, and all of this gets passed back to the visualizer in real time. (I'll sketch the main handler in a second.) When I was writing this demo, working on it and struggling with it, I made a mistake: I forgot a step, and I accidentally had the notifications go back to the main app. Follow the logic: a request comes in, it writes to Cloud Storage, Cloud Storage sends a notification back to the original app's URL, and round and round we go. Every request spawned another request. It went on that way for a while, and I couldn't figure out why the counter was still going up. It didn't make any sense — the load was done. The requests were running out of control, so I thought, okay, when in doubt, delete everything and start over. But when you delete from Cloud Storage, it also sends an object change notification. So I deleted, which created a request, which created a file, which was then eligible for deletion, which then got deleted. I had created a runaway positive feedback loop, and I had no idea what the hell was going on.
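To make that concrete, a single request to the main app does roughly this — a sketch with placeholder names, where firebase_put() is a hypothetical helper (its guts come up later) and I'm assuming App Engine's Modules API for the instance ID:

```php
<?php
use google\appengine\api\modules\ModulesService;

// Sketch of the per-request work in the main app (names are placeholders).
$instance  = ModulesService::getCurrentInstanceId(); // which instance served us
$requestId = uniqid();

// 1. Tell Firebase we handled a request (a REST call; the guts come later).
firebase_put(
    "https://load-demo.firebaseio.com/appengine/$instance/$requestId.json",
    ['time' => time()]
);

// 2. Drop a proof-of-activity file in Cloud Storage. App Engine PHP's gs://
//    stream wrapper makes this an ordinary file call. Object change
//    notification then pings the /storage app about the new file -- my bug
//    was pointing that notification back at THIS app, so every request
//    wrote a file, which fired a notification, which became another request.
file_put_contents("gs://load-demo-bucket/$instance/$requestId", '1');
```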
It wasn't until my quota kicked in that it stopped. And I worked for Google, so my quota was kind of high — at that point, I believe, $100 an hour across all the services. When the dust settled, I counted up all the files we'd created, and it was 30 million. Keep in mind, that's 30 million successful requests; there were probably a lot more that hit App Engine but bombed while sending to Firebase or Cloud Storage. But I had successfully served 30 million requests in an hour. For scope: what do you call a website that gets 30 million requests an hour? Wikipedia. I had sent a Wikipedia's worth of traffic at my app in an hour.

When the dust settled from this, I thought, oh god, I'm new to Google, someone's going to come looking for me. Some of my coworkers have this badge — on our internal profiles, where you look up other people who work at Google, there are little badges: been here five years, was here when this happened, those sorts of things. One of them is "paged by SRE." SREs are the people who keep Google running — they're what you'd call operators most other places. And if you get a "paged by SRE" badge, it's kind of a double-edged sword. You did something that threatened Google, and that's bad. But you also did something that threatened Google — that's pretty cool; that's hard to do. So I thought, oh, I'm going to get the badge. And then nobody ever found me. Nobody cared. So at that point — no one cares? Well then, I should try to do this deliberately and actually mean it. So what I'm going to talk through now is taking this demo from "I accidentally sent 30 million requests" to "how could I make this app elegantly and efficiently handle 30 million requests an hour on a regular basis?" I'll talk about why this app falls down when we get to high levels of traffic, where we can fix it, and how we ultimately got it to handle that.

So let's do the math on 30 million requests per hour: divide by 60, divide by 60 again, and we get about 8,300 QPS — queries per second. QPS is a metric we use pretty often at Google, and it's getting used more and more elsewhere; it's the rate metric we use to describe how much traffic something is handling. To put this in perspective, we say publicly that we handle over 2.5 billion searches a month. Right? No — a day. 2.5 billion searches a day. It's much less impressive per month. Divide that by 24, then by 60 and 60, and you get Google Search at about 29,000 queries per second — and I think that's an old, low number by now. App Engine itself, the whole product, we say handles 100 billion requests per month. Do the math on that — roughly 30 days a month, by 24, by 60, by 60 — and you get about 38,000 QPS. (The arithmetic is worked out below.) Now it becomes really clear why nobody came and yelled at me. In the grand scheme of things, 8,300 QPS is just over a quarter of that Search number. I know how we spec things out and how we handle capacity, and this was probably nothing that threatened App Engine as a whole. If it had continued for a long time, I probably would have gotten talked to, but a spike like that doesn't really register. All right. So it's pretty clear I can do this and no one's going to come looking for me, so let's pursue it.
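The arithmetic, written out:

```php
<?php
// The QPS arithmetic from the talk, worked out.
printf("accident:   %d QPS\n", 30000000 / (60 * 60));               // ~8,333
printf("search:     %d QPS\n", 2500000000 / (24 * 60 * 60));        // ~28,935
printf("app engine: %d QPS\n", 100000000000 / (30 * 24 * 60 * 60)); // ~38,580
```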
Let's ramp the demo up to 8,300 QPS, and let's watch it break intentionally. So: make load, at roughly 8,300 QPS. I know that with 50 servers, 8,500 QPS — just a little higher — means a concurrency of 170 per node. I'm sorry if this is hard to see; I'll bump it up a little. And if I only did 200 requests per node, it would just flash by, so I need a higher number: I'm going to go to 1,000 per node. So now we should have 50,000 requests at about 8,500 QPS.

I'm going to warn you right now: it's going to bomb, in one of two ways, and we'll see which. Let me reset this so we can see. Either the counter is going to go over, or it's going to come up short. Short is probably what's going to happen: as I hit certain quota limits, it just stops serving requests, requests fail, and you don't see them. Under rare circumstances it goes over, and the reason is that to deal with some of those failed requests I have retry logic — if it failed, just try again. Sometimes a request gets recorded as a failure even though it actually succeeded, and then we get double hits. That's most likely not going to happen. To try to get back some of my credibility here, I'll say it's going to get to around 27,000 and just bomb. Just die. So let me get a drink and stall while we wait. Somewhere between 25,000 and 30,000 is where I think it's going to bomb. But what's cool is that you can see App Engine just rolling through these requests, handling them until it breaks. All of our instances are handling more than 100 requests at a time; it's doing well. Let's see, when does this start dying? Wow — when you want it to break, it doesn't, right? It just keeps going. All right, we're getting to where I think it's going to go.

While I'm waiting for this — any questions about anything I've said so far? Pricing? I'll talk about pricing a little at the end. I don't like to talk about price because I'm not a sales guy, but I will talk a little about how much it costs to run this. Oh, come on. So now I'm a liar, right? It's going past 30,000. OK, come on. So yes — like acting with children or animals. I'm going to check back on this in a minute when I start talking about some of the other things. Let me just see if it's bombed out yet. No. No. I hate you, App Engine. Why do you do this to me? All right, the Compute Engine nodes are starting to drop off. There we go. So maybe it'll hit 40,000. OK, good — they're starting to drop, and they're dropping well shy of the target. There's one guy still running, but we're at 36,000 requests. So something has changed, and it's better, but it's still not the 50,000 we were expecting. All right, that one guy isn't going to finish.

So if we look at our logs — this is the console where you look at App Engine logs; actually, all Google Cloud logs go through one central logging system — we'll see these errors in my requests, repeatedly: call to URL Fetch failed, call to URL Fetch failed. That is clearly the source of our problem. So what is URL Fetch?
App Engine, because of its architecture, is container-like under the covers — but it's not Docker, don't think that at all. It is like containers, though, and so its networking gets kind of complex. We actually have a service that handles delivering URL calls between App Engine and any external resource, and it's called URL Fetch. Most of your service calls are going to go through it.

If we look at my source code for this app: basically I construct some JSON, and then I send it using a patch() and a post(). So I do a PATCH, a POST, and then a file_put_contents — a PATCH and a POST to Firebase, and a file_put_contents to Cloud Storage. If I look at the source for patch() and post(), it's just cURL under the covers — nothing too exotic. (I'll sketch it below.) And if we look at the App Engine documentation on cURL, one of the things it points out is that the basic cURL implementation is something called cURL lite, and it uses URL Fetch under the covers. So my calls are definitely going through URL Fetch, even though I never wrote "URL Fetch" anywhere. And if I look at the SDK for Cloud Storage, under the covers that's using URL Fetch as well. So: three calls to URL Fetch per request.

If I look at the URL Fetch limits, I'm limited to 660,000 calls per minute, which seems like a lot, right? Do the math and that comes out to 11,000 QPS, so I should be fine up to 11,000 QPS — except for what I was missing, which is that I'm making three URL Fetch calls with each request. So really, that limits me to about 3,600 QPS. Right there I have a hard limit: with this architecture, I cannot take this app above 3,600 QPS — the platform itself will reject it. I actually contacted our internal support and asked, hey, is there any way to get this limit raised? And they sort of laughed at me. No. We're not raising that limit for your demo. Call us when you're Snapchat. So I'm not going to get the limit raised; that's the total block I'm up against.

But there's another block in there too, which is Firebase. Comparing the Firebase numbers to the file storage numbers, I noticed that if I get the QPS high enough, I'll have 10,000 files written but fewer than 10,000 records in Firebase. Through experimentation, I figured out that Firebase gets flaky around 3,500 QPS. And since I make two Firebase calls per request, I really can't go much above 1,750 QPS on the front end — 1,750 requests per second on the front end spawns 3,500 requests to Firebase, and that's the limit. Now, this isn't publicly documented, and I'm not making public claims about Firebase; this is just my experience driving it through the REST API. I'm not knocking Firebase.
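Here's roughly what those helpers boil down to — this is the firebase_put() stand-in from the earlier sketch; the demo's patch() and post() have the same shape:

```php
<?php
// Sketch of a patch()/post()-style helper: plain cURL, which under cURL lite
// on App Engine standard is one URL Fetch call. Two of these to Firebase
// plus the Cloud Storage SDK's own call = three URL Fetch calls per request.
// Quota math: 660,000/min ÷ 60 = 11,000/s; ÷ 3 ≈ 3,600 requests per second.
function firebase_put($url, array $data) {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_CUSTOMREQUEST  => 'PUT',   // 'PATCH' or 'POST' for the others
        CURLOPT_POSTFIELDS     => json_encode($data),
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
```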
So I approached a fix. I'm not going to go too deep into this first one, because it turned out to be a total waste of time. One of the things you can do is switch cURL in PHP on App Engine from using URL Fetch to using sockets, and then you're under a different set of quotas, which are much higher. The problem is that the whole reason you go to sockets is to have more control over the conversation, and you don't get that control through the cURL implementation on App Engine. So where the URL Fetch implementation would always work up to a certain QPS and then bomb, with sockets it would always get to about 80% of what I needed. It was predictable, but still not really useful. Then I had the thought: well, maybe I could just rewrite the HTTP implementation on raw sockets myself. And then I thought about a self-appendectomy, because that would be more enjoyable than rewriting HTTP on sockets by myself. So let's put that on hold and go a different route.

So let's talk about fix number two: switching to Memcache. One of the assumptions I had built into this was that it needed to be real-time. But does it really need to be real-time in order to track that I'm handling all these requests? Honestly, it doesn't. All I need to be able to prove is that I handled all these requests and did the work I wanted to do. So, looking at Memcache on App Engine and its quotas, you'll see the maximum is 5,000 set operations per second when the item size is under 1 KB — and notice that the maximum goes down as the item size goes up. So I thought: if all I'm doing is incrementing — and a Memcache increment is really, really small over the wire — maybe I can get above that 5,000 set-operations number. I tested it, and I can.

My first idea was to store a request count per instance: each instance writes its count to Memcache the way it wrote to Firebase, nothing crazy. Then to visualize it, I'd just get a list of all the keys and display the request counts. Pretty easy, pretty basic. This did not work. Why? Well, if we look at the documentation for Memcache on App Engine, under stubbed functions in the Memcache API, getAllKeys is right there. Why is it stubbed? The best answer I could find internally was: you're not supposed to use Memcache like a database, and you're using Memcache like a database when you call getAllKeys. So we just figured, no. The other reason, which I think is more realistic, is that Memcache on App Engine runs in two modes. One is shared, where you share a Memcache instance with a whole lot of other App Engine apps. You can also say, I want my own private one — and you should — but that costs money, whereas the shared one is free and always available. We do block all that off, so there's no real hole, but I imagine getAllKeys on a shared Memcache was a security concern waiting to happen. You shouldn't do it, so why implement it?

So I came up with a different solution. I take the instance — the name of the App Engine instance — and I increment that key. What's kind of cool is that if you increment a key that doesn't exist, it returns 1, because it initializes it. It doesn't bomb; you don't have to initialize it first and then set it to a higher value. So if I get 1 back, I know it's the first time that instance has been called, and I append the instance to a list of instances. If that's the first time the list of instances has been touched, I initialize the list too. And then file_put_contents, just like before.
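Sketched, with the same placeholder names as before — note that the increment-initializes-to-1 behavior is as described here, on App Engine, not classic memcached's behavior:

```php
<?php
use google\appengine\api\modules\ModulesService;

// Sketch of the Memcache counting scheme on App Engine.
$memcache = new Memcache();  // App Engine's built-in Memcache service
$instance = ModulesService::getCurrentInstanceId();

// Per the behavior described in the talk, incrementing a missing key
// initializes it and returns 1, so a 1 means "first request this
// instance has ever handled."
if ($memcache->increment($instance) === 1) {
    $list = $memcache->get('instances');
    // This read-modify-write is racy under load -- one way instances
    // can get lost, as happens in a moment.
    $memcache->set('instances', $list === false
        ? [$instance]
        : array_merge($list, [$instance]));
}

// And the proof-of-activity file, same as before.
file_put_contents("gs://load-demo-bucket/$instance/" . uniqid(), '1');
```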
It was close, but it didn't quite work — I was losing instances. And when I looked through all of it: remember those really long strings in storage? Those are the instance IDs. When you start generating a thousand of them, all those strings add up to more than the limit on how much you can store in one Memcache key, which by default is 1 MB. So I had to split it up. I changed things just a little: I added a whichInstanceList step that divides the list into 16 — the IDs are hexadecimal, so I shard on the last digit of the instance ID and keep 16 lists. (There's a sketch of this in a moment.)

I'm not going to demo it again — actually, no, I am going to demo it, sorry; otherwise this would be kind of pointless. So I'm going to switch over to this new way of doing it. This is really small, and I'll pump it up in a second, but this is how I'm going to visualize it now — the real-time visualizer will not work. Now it's huge. Let me reset this. Is that one still running? OK, good, it's down. Here's how we're going to visualize it now: we're going to count all the requests that come through. Same load command, but load.cache — I just changed to the cache version instead of the Firebase version. It's going to run, and these nodes will still start up and bounce — now they're huge, so we'll definitely see them — but the real-time visualization of App Engine isn't going to work. OK, they're starting up and bouncing. If I look at Visualize, we start seeing that I have instances, and I can see the request counts for all of them, and we can scroll all the way down. We're in the middle of the run, so this shouldn't be done yet, and those counts will eventually even out. One of them's done already. And the great thing about not having to send all that traffic to Firebase is that this usually finishes a lot faster. There we go — we should be done. Hit refresh, go down to the bottom, there you go.

So: the total count is 50,000, and the calculated count is 49,908. You'll notice the correct total of instances is 1,127, and we lost an instance in tallying it all up, which is why it shows 1,126. Long story short, I'm still dropping some things in Memcache. The total count comes from a single running counter that catches every request that goes through; the calculated count comes from adding up the per-instance counts as they get logged — that's where these numbers come from, and that's where the discrepancy comes from. But I actually did get all 50,000 requests, and I saw it in there. So good. This works.
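The 16-way split, sketched, continuing the placeholder names from before:

```php
<?php
// Shard the instance list on the last hex digit of the instance ID, so no
// single Memcache value approaches the 1 MB cap.
function whichInstanceList($instance) {
    return 'instances-' . substr($instance, -1);  // 16 keys: '0'..'f'
}

// First time we see an instance, append it to its shard, not one big list.
$key  = whichInstanceList($instance);
$list = $memcache->get($key);
$memcache->set($key, $list === false
    ? [$instance]
    : array_merge($list, [$instance]));
```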
Conclusion: I was able to throw 50,000 requests at App Engine, it handled it, I was able to count them all up, and I have a record of all the requests. Good, we're done, right? It's finished. I'm really going to give up real time? That easy? You're going to let me? No. So let's bring back the real time. This version works — it handles all 50,000 requests, it handles that scale and that speed — but it is not real-time. So how do we bring that back? Here's how it currently works: the visualizer is still getting the Cloud Storage notifications, but it's not getting anything from Memcache. So I need to get the data from Memcache into Firebase somehow. I wrote a thing called realtime.php that reads all the data out of Memcache and writes it to Firebase — probably the easiest way to do it.

The first-pass code was: get the list of instances, get the request count for each instance, and then send the data for each request to Firebase individually. Which is a terrible idea, because I'm still sending 50,000 requests to Firebase, and at that QPS it's still going to bomb. So I need a different approach, and for that you have to understand how Firebase works under the covers. Firebase is basically a hierarchical JSON store of keys. Here I have appengine, then instance one with all its requests under it, instance two with all its requests, instance three with all its requests — and this is how it looks as JSON. One of the nice things is that to grab any part of my database, all I have to do is take the path — say, /appengine/instance-one/request-two.json — and I can GET, POST, or PATCH it: update it, put a record there, pull the record out. If I want all of instance one, I just hit /appengine/instance-one.json, and I can GET, POST, or PATCH that. And if I want everything, I can GET, POST, or PATCH the whole thing.

So since I can not only read the whole tree but write the whole tree, what I can do is build the JSON in PHP and shove the whole thing into Firebase in one shot. Instead of writing individual keys for each request, I analyze everything, create the JSON, and send it over in one write. And that actually works, which is great. Here's the trace for doing that — this is our trace visualization — and just doing that, unoptimized, took about five seconds: 5,000 milliseconds. You can see I have a crap-ton of Memcache requests followed by one giant URL Fetch to send it all to Firebase.

So that's unoptimized; let's make it better. The first improvement is to batch the Memcache calls using getMulti: build the entire set of keys I need, send it to Memcache, and get back one giant response. The second is the data structure I was sending. For some reason, the initial way I wrote this, under each instance name — that really long string — I would write request one, request two, request three, and each record also included the instance string again, which I already had. I was sending gigantic amounts of redundant JSON to Firebase. So I rewrote it to send just the instance ID and the number of requests it handled, and that's all. That really shrank the JSON down. Here's what realtime.php looks like now: it was running at five seconds before; now it takes about a second — two quick Memcache calls to get the keys out, and then a URL Fetch taking around 600 milliseconds. That's a lot better, but it's still not fast enough to drive any sort of simulated real time. I've really got to get that URL Fetch down lower.
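Put together, the optimized realtime.php is roughly this — getMulti for the reads, one PATCH of the whole tree for the write, reusing the cURL helper idea from earlier:

```php
<?php
// Sketch of the optimized realtime.php: batched reads, one fat write.
$mc = new Memcached();  // App Engine also exposes the Memcached API

// One round trip for all 16 shard lists...
$shardKeys = array_map(function ($d) { return 'instances-' . $d; },
                       str_split('0123456789abcdef'));
$instances = [];
foreach ($mc->getMulti($shardKeys) ?: [] as $list) {
    $instances = array_merge($instances, $list);
}

// ...and one more for every instance's counter. Keys are instance IDs,
// values are request counts -- the slimmed-down payload, with no long
// instance strings repeated inside the records.
$counts = $mc->getMulti($instances);

// One PATCH of the whole tree: the same cURL helper as before, with
// CURLOPT_CUSTOMREQUEST set to 'PATCH'.
firebase_patch('https://load-demo.firebaseio.com/appengine.json', $counts);
```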
Looking at the documentation for Firebase's REST API: a successful request is indicated by a 200, and the response will contain the data written. So I do all this work, I generate all this JSON, I send it to Firebase, and Firebase says, here, have it back — which is really inefficient. I just sent it to you; why do I need it back? I'm sure there are reasons for it, but you can also turn it off. If I send print=silent on the request, it won't send a response back at all — just take it and do it. When I do that, the URL Fetch time drops to 200 milliseconds, which means the whole request takes around 300 milliseconds. Now this is something I can run multiple times a second to drive a real-time-like interface.

So I use our task system — which I'll fire up in a second to show you. I basically say: at any given time, you can run three of these operations at once, and no more than 10 a second. That keeps my QPS to Firebase really low — it will never get above 10 queries per second — but I can run it several times a second, so updates happen faster than we can perceive. Doing this is really easy: I just push realtime.php onto a task queue. The one thing I had to watch for is that if I enqueued it on every single request, there'd be 50,000 tasks waiting in the queue, and throttled like that, they'd keep running long past the last request. So I built a simple throttle: the further we get into the run, the less often a request puts a task on the queue. (There's a sketch of the queue setup below.) What this ends up doing is replacing a lot of really fat requests to Firebase with far fewer, much lighter requests.

So I'm going to fire this up and see how it goes. Here's my task queue, and I had it paused — you can see that all 50,000 of those earlier requests only generated 1,100 tasks. Quite a reduction. I'm going to delete all of these, because I'm not going to run them in real time. There we go, all gone. I'll resume the queue, so now it will process anything that comes through. We'll go back here, reset, I'll make this small again, and run the cached version again. Fingers crossed that the demo gods have been appeased by all the other problems, and hopefully we'll see this run in real time. Ah, there we go. It's a lot stutterier than the real-time version, because it's really a pull model now, but we can track it, and in theory this will go all the way to 50,000. It's clearly not as smooth as the previous version, but we're running through — 30,000 requests. What did we get to last time, 37,000? We're now past that. 40,000, 42,000. It should get to 50,000 and stop. Come on. Boom! So we successfully visualized it a much different way, and it worked, and it worked perfectly. Oh, thank you. I was sweating that one.
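The throttling is mostly declarative. The queue configuration caps the rate and concurrency, and the app just enqueues — the queue name, handler path, and throttle condition here are placeholders, not the demo's exact values:

```yaml
# queue.yaml -- cap how fast the realtime updates can run.
queue:
- name: realtime
  rate: 10/s                  # at most 10 Firebase pushes per second
  max_concurrent_requests: 3  # at most 3 in flight at once
```

```php
<?php
use google\appengine\api\taskqueue\PushTask;

// Enqueue a realtime.php run -- but not on every request, or 50,000
// throttled tasks would still be draining long after the load stopped.
// Crude throttle: enqueue less and less often as the per-instance
// count ($count, from the increment sketch earlier) climbs.
if ($count < 100 || $count % 100 === 0) {
    (new PushTask('/realtime.php'))->add('realtime');
}
```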
So now the real conclusion. Safety quotas are not something to just get around. That's one of the first things I learned: there's a reason those QPS limits were set where they were, and a reason my request to raise them was rejected. You can cause other resources pain. In this case I was sending a tremendous amount of load at Firebase and causing them problems. I started this right after we had acquired Firebase, while we were still merging together, so I had to contact their support to get my Firebase instance unscrewed, because I was able to delete everything on the cloud side, but on the Firebase side, deleting just wasn't working. "How many keys did you write?" About 30 million — actually, probably 60 million. "We're just going to delete your whole thing and start over. Is that okay? Because there's no other way to do what you want to do." So I was causing stress on Firebase. When you're dealing with that much throughput on the front end, you're thinking of yourself — how can I handle this? — but you need to make sure you're not also causing problems for the services you're calling.

Next: obviously, bundle up data calls. It's much more efficient to bundle these calls up — much bigger pieces of data, over many fewer connections. And be prepared to make compromises. At some point I had to give up on the idea of the real-time display working perfectly every single time, but once I gave that up, I was able to find solutions that made a lot more sense and were a lot more scalable.

The other lesson is to swap out pieces. I rewrote just the part that handles the load. I didn't rewrite the visualizer; I didn't rewrite the Firebase architecture; and — you'll see in a moment — when I rewrote the load piece yet again in another way, I still didn't rewrite the real-time piece, the part that grabs data out of the cache and writes it to the screen. I never rewrote that, even as I swapped other pieces out for load. So don't think, ah, this whole thing is crap. It wasn't even crap — one part of it just reached the limit of what it could do the way it was written. Swap out the pieces, not the whole thing.

Which brings up the question: was the language the wrong choice? And I'm at a PHP conference, so — no, no, hold on, hear me out. App Engine standard, which is what lets me scale this fast, supports PHP 5.5 for now. Only 5.5, for now. That's a whole other mess, and if anyone wants to buy me drinks, I'm happy to explain why and what we're doing about it. We do support PHP 7.1 on this thing called App Engine flexible environment, which is definitely cool, but it doesn't scale as fast as standard. So I had the thought: this is one of those cases where I'm having performance problems in PHP, where normally it might make sense to take it down to C. Well, you can't take it down to C on App Engine. But one of the other language choices available on App Engine is Go. Go is compiled, you're running much smaller binaries — maybe that would be faster, maybe more performant.

I want to point out that everything you saw up to now was PHP only, and it's fine: it works, it does exactly what it's supposed to do. But I rewrote it in Go. I'm not going to demo that version, because it looks exactly the same. What I found was that the PHP version averaged about 700 instances — we saw 1,100 just now; sometimes it was much higher, sometimes much lower, but the average is around 700. And when you spin up an App Engine instance, you're charged a minimum of 15 minutes. The instance will eventually spin down, but every instance you spin up costs you at least 15 minutes. Each instance costs five cents per hour, so when you do the math, running this whole thing averages about $8.75.
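Worked out:

```php
<?php
// The PHP version's billing math: each instance bills a 15-minute minimum.
$instances = 700;      // average instances spun up per run
$ratePerHr = 0.05;     // five cents per instance-hour
$minBilled = 15 / 60;  // the 15-minute minimum, in hours
echo $instances * $ratePerHr * $minBilled;  // 8.75 -> about $8.75 per run
```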
Now, that doesn't mean you'll spend $8.75 every time you run something on App Engine — that's just what the charge comes to when you spin up this many instances. Most people only run one or two instances, so the cost of doing something is a lot lower. But in this case, where I'm deliberately spinning up a lot of instances, it runs about $8.75 an hour.

So I rewrote just the load piece — not the rest of it, just the piece that takes the requests and generates all those instances — in Go, and I had some interesting results. With Go, I averaged about a hundred instances, and this makes sense if you think about it. The reason App Engine spins up an instance is that it sees all these requests and says: I can't possibly handle these, I'm still starting up — so spin up more instances and spread the load out. With Go you're running a very small binary, it's very fast to spin up, so each instance starts processing a lot sooner. If we had PHP 7, I think this would be more competitive, but Go, being compiled, is only carrying this one small slice of functionality, whereas the PHP runtime is bigger than that. So I only spun up about a hundred instances. The same 15-minute minimum applied, but the per-instance charge was higher, because of some issue where I needed a bigger resource footprint for my Go instances. Even with it being twice as resource-hungry per instance, though, I ran one-seventh the number of instances, so when you do all the math, my charge came out to a little more than a quarter — less than a third — of what it was for PHP. I'm bad at doing math live, clearly. So this was one of those cases where I could lower my costs by going to a different language, and it was really helpful.

With that, I'm going to say: thank you guys very much. If you have any questions about the talk, feel free to hit me on Twitter; I'm pretty quick to respond. I'm going to open up for questions in a minute, but there's the talk if you want to download it on Bitly, and there's my joind.in ID for the talk, so please feel free to rate me and criticize me. So with that: any questions? I'm sorry, I'm in the way — have you all taken a picture of the slide? Any questions? Or are you all thinking, I want lunch, stop talking? All right. Well, I will be around — you can usually catch me outside smoking. Please feel free to ask me any questions about this, and thank you guys very much for your time.