So, are we ready, guys, or not yet? We're just trying to make the network stable. Sorry about the delay. One more minute, maybe. Okay, guys, so there are two speakers right now, and the topic is realistic container platform simulations.

Hi, thanks very much for joining us this morning. My name is Jeremy Eder. I work on the performance and scale engineering team at Red Hat. With me is Sebastian Jug, a coworker of mine who works on developing some of the test harnesses that drive the results for scale testing OpenShift. Our group works on many things; you may have seen some of our talks in the past on different areas of Red Hat's product stack. Today there will be two sections. First is the why: why realistic testing is important. The second is the how: there'll be a demo, assuming the network doesn't freak out again, and I would like to lead you through some of the value that we hope to glean from the test harnesses.

Who here works in quality engineering? I'm here specifically to talk with you because this is the type of stuff we're trying to instill in our own OpenShift QE team, passing our test harnesses along to them so that what they run in CI represents, hopefully, what our customers are doing with the product, or hoping to do with it. So today I'm going to go through a rehash of a LinuxCon talk called Workload Classification, which is a way to decide what platform or infrastructure to use based on your workload; then some test utilities; a demo; and then what I call gold mining, or where we make the money for Red Hat.

So, I'm too far from the laptop, maybe. There we go. This is our workflow, and, well, it's not "kind of" our workflow, it is. We have a bunch of inputs from folks like Cependu and product management, or, you know, what it was in the old days as of three months ago. Engineering people, of course. Our marketing team, which we're connected with pretty closely, because some of our exhaust ends up being used in marketing and pre-sales engagements. And, of course, customers, which are the lifeblood. We're trying to be out in front of them. We need to understand what they're trying to do now, and what they're trying to do a year or three years from now, because there's that much lead time in getting stuff fixed or implemented upstream. So we need that amount of time; we need to be as far out front as possible.

So we collect all of those requirements and really try to use feedback from customers and folks like yourself who are using the product in very interesting ways that we haven't thought of before. I don't have our logical structure here, but it doesn't mirror the Red Hat org chart at all. On our Scrum team we have folks from QE, folks who do upstream development on Kubernetes, and our performance team, which is currently six or seven or eight engineers; I think it's eight. And then on the bottom here, I'm not sure if you're aware, but all of OpenShift works on public Trello boards. If you go to Trello.com slash Atomic OpenShift, you will see everybody's work. If you go to this short URL, you'll see our scalability board. You can look at what Sebastian's got on his plate. I have nothing on my plate because I just talk a lot. And then there's, you know, the other guys as well.
So, you'll see things on there like what we're planning on doing in the next sprint, or our backlog, which says: what do we want to do this year? Which could be "we need to grow by an order of magnitude in scale," or, you know, basically whatever this group on the left is telling us needs to be done.

Okay, so back to the realistic part of it. First of all, people are asking: how many pods can I put on a node? How many nodes can I have in my cluster? And I really hope to bring the focus back to what's realistic. It's not so much about how dense you can get with containers versus VMs; it's: what is your workload? That's really the first question I ask when folks start asking me questions that I believe are trying to lead me toward answers and commit me to technical decisions before I have any understanding of what the problem is. So I say things like "what is your workload?", maybe three or four times, until they finally tell me, and then I give them the answer that actually helps them.

A statement like "I want to run 20,000 pods" is useful, because we need to know what the platform does at that scale or more. But running 20,000 pods in production with real workloads is an entirely different deployment than running 20,000 pods for synthetic purposes. So we start with synthetic, for sure, to confirm there's a pulse on the patient, and we slowly ramp up to the point where the system is crushed under load. We do failure scenarios like that. And we're trying to incorporate as many different workloads as we can. We've got the example workloads that are packaged with OpenShift, the ones you would see if you go to a deployment in the web UI and get that nice menu of PHP or Node or whatever else. We use all of those. Those include stacks, so a front end and a back end, and some of them are actually horizontally scalable, believe it or not. What we've done is taken those and added persistent volumes to them, because we're testing end to end. We're going to make these scenarios as complicated as we possibly can, because no one's going to run PHP and MySQL on an ephemeral disk. Persistent storage is an absolute prerequisite; therefore, it is there from day one.

So, about the workloads themselves. This is where I'm going when I ask folks what their workload is. I don't necessarily need to know that it's MySQL. I need to know it's a database that has credit card data in it, or something like that, which tells me I need security and isolation, I'm on this side of the spectrum. Or it's some Monte Carlo simulation which, if it fails, who cares? I pick it up at a checkpoint and start it again; it's a batch job. (We converted these slides to a different aspect ratio about an hour ago, so they may not look perfect. Mr. OCD over here is probably freaking out right now.) Or: does it need locality to its storage, is it storage-latency sensitive? Those sorts of things help me tell the customers what their ideal deployment scenario is. Is this making sense? Head nodding. Now, once we have an app, I propose that you can listen to the application: it will tell you how it will run best. Here are the signals. If you've seen me present before, you've seen the food group analogy slide.
The point of this slide is that every workload can be broken down into four food groups of computing: CPU, memory, disk I/O, and network I/O. If any of those are out of balance, like having too much sugar in your diet, you're going to be unhealthy. It's important to keep these in balance. If you're bounded on CPU, you're not at optimal performance; you need to throw more cores at it or optimize your application. One of my coworkers did a presentation on Rev; his major problem was I/O, and we did a ton of work just to overcome the fact that we didn't have the right storage for the workload. You don't want to be in that position if you're a customer. In the lab, it's fine. What we learned from that experience was how much I/O we need to do certain workloads, and so we have these capacity planning formulas now that are beginning to show up in OpenShift's documentation, which are all based on the simulation software that we're about to show you.

Now, here's where the business guys come in and rain on everybody's parade, right? So far, we're all having fun, it's all technical stuff, but guess what: the business guys are the budget holders. So if you want to get work done, it's important to listen to the budget holders. Stability, performance: I put performance on here, as you can imagine, but maybe it's not your primary concern. For sure, though, security, stability, and most likely support are factors that weigh on technology decisions in an enterprise.

Here's a worked example: the OpenShift Build Service, which we worked on last year. Is Amanda in the room? Amanda Carter is the team lead for the OpenShift Build Service on the technical side. Beyond RHEL, there are now Fedora versions, and I believe CentOS versions, of the same build system. We helped them qualify a new horizontally scalable build system; the first system we had was only vertically scalable, and it was a nightmare. The situation from a business standpoint was: if a CVE drops, we need to have all of the containers that Red Hat builds patched and available within one hour. So in order to do that, we had to optimize the system we had currently, which meant replacing all the disks with NVMe, first of all, and throwing a bunch of memory at it, but at the same time building a system that's horizontally scalable.

What are some of the performance concerns for a build farm or a render farm? These are subjective; it's different based on what you're trying to do within that build farm. But I can say that for the most part, a build farm is CPU and memory intensive, and generally disk-throughput bound, but maybe not network-latency or disk-latency sensitive, because it's a build farm and it's just spitting out binary artifacts at the end, in our case containers. And deployment speed: you're churning through a lot of containers. Docker, as you know, during docker build spins up roughly as many containers as you have lines in a Dockerfile, so deployment speed becomes key. In this scenario, we would love to use OverlayFS, because it's faster to start, there's page cache sharing, et cetera. When we started doing this, we identified a bug in, well, not a bug, but a compliance gap in Overlay, which we worked around with a plugin to yum. A dirty hack, but it worked. And now we finally have it fixed in the upstream OverlayFS modules, so in RHEL 7.4 we won't need that plugin anymore. The point is, those are the types of optimizations we were trying to do on this build system.
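For context, both halves of that workaround amount to small configuration changes on the node. Here is a rough sketch of what they might look like on RHEL 7; the file path is the standard docker-storage-setup mechanism, and the plugin name is our assumption about which yum plugin is meant, not something stated in the talk.

```
# Sketch: switch Docker's storage driver to overlay on a RHEL 7 node
# /etc/sysconfig/docker-storage (normally managed by docker-storage-setup)
DOCKER_STORAGE_OPTIONS="--storage-driver overlay"

# Assumed workaround for the yum/rpmdb issue on overlay until the kernel fix lands:
yum install -y yum-plugin-ovl
```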
We wanted to build an extremely high throughput system, and it's actually running on AWS at this point. Does this make sense? Does anyone have a workload in mind that we could go through together, if you're responsible for a production system? Anyway, the LinuxCon talk has five examples, including Gluster and, what else did I do, grid computing; there are five examples, and it's on video somewhere on YouTube.

So, as a performance and scale team, it turns out that basically every subsection of the product has some implication on performance and scale. This is not my slide; it's from the corporate Red Hat deck for OpenShift. But what I've done here is try to call out which sections we are involved with, and it turns out it's more often than not. We do work on logging and metrics, and I'll share some performance and scale data from the most recent runs towards the end of the presentation. We also do build parallelism, build and push performance for the registry, along with persistent storage like I mentioned earlier. Security doesn't necessarily turn up in our studies ourselves. Oh, and we also do some network testing. That's why we have eight different people working on performance and scale: we divide the whole OpenShift problem space up amongst those engineers.

Who here has installed OpenShift before? Who's installed OpenShift at scale, and is not on my team? It's different at scale. You have to handle high availability, you have to handle load balancing of API requests, you have to handle nodes, you have to have excess capacity for failover, you have to handle registry scale-out, and there are other implications when you start doing scale that are invisible at smaller scale. What we want to do is make sure that we are ahead of the customers, by using our simulation tools, so that we encounter the bugs before the customers do. An earlier presentation mentioned an iptables issue; it's by far OpenShift's biggest scaling limit right now. And it's a Kubernetes issue, not OpenShift. If and until we fix this iptables issue in Kubernetes, we are artificially limiting ourselves in terms of scale. I'll go into some detailed numbers in a minute.

I'll turn it over to Sebastian at this point. I think these slides are yours? Not yet. You can go through these. I mentioned our workflow earlier. Our job is to be out in front of as many people as we can. Basically, the developers put code in, and if it's high value and has some impact on performance and scale, we get our hands on it. Then what we do is develop test harnesses and pass them to our counterparts in QE, who are responsible for the ongoing regression testing. We've got a repository up on GitHub with all of our code and test harnesses in it. Here it is. In there, you'll find some networking and reliability testing, which may not be up your alley, but there are two pieces I would like to point out: cluster loader, which Sebastian is going to show you, and a component for workload generation. Right now, that supports HTTP. The last link is our attempt to upstream it: first of all, convert it to Golang (it's in Python now), and get everything into upstream Kubernetes. Upstream Kubernetes has its own test harness, called e2e, and it's possible that the code could live there; we'll see over the coming weeks. But right now, that's where it is.
And other folks who are interested in performance and scale on Kubernetes are contributing different tests in that repository.

One of the interesting things we learned in our adventures at scale was that in order to do anything interesting, we had to build everything we possibly could up front, pushing all the expensive operations into an up-front build procedure. If you've ever done traditional virtualization, you know that you build template images and clone everything from there. It's the same thing. We take the RHEL base images, the cloud images for the base product, and we wrote some Ansible to turn them into something useful for OpenShift. At the end, you have a RHEL image that will let you do very large installs reliably; that's the important part. Ultimately what ends up happening, and this was mentioned an hour ago, is that you end up contending on Docker registry resources, on RPM installs, which is I/O, and on RPM downloads, which is network I/O. The whole idea is that this is a repetitive task and there's no reason not to bake it into the image. So this is automation that spits out qcow2 images or Amazon AMIs.

Just one second. This gentleman? [Question from the audience.] Because Atomic doesn't have system containers yet. When it does, we will use Atomic. Sorry, the question was, is this available anywhere? The answer is yes. Inside the SVT repo that I mentioned earlier is a subdirectory called image provisioner, and this is just Ansible. We've shared it with the OpenShift product teams and we hope they will productize it, because any customer who really wants to do anything with OpenShift at a decent scale is going to invent this anyway, so we might as well productize it. Are there any other questions?

So, what this thing does is... okay, there's a trivia question for you first. I don't have a prize, but if anyone can answer this, I would love to know. On Amazon, we can have a secondary disk attached in a template, and it works. On OpenStack, we cannot figure out how to do the same thing: have a base image with an attached disk and be able to clone it. Does anyone know how to do that? I can't find a single person at Red Hat who can tell me how to do that. Because we have that deficiency in OpenStack, we do this first step, which does a bunch of partition resizing, really gross qemu-img commands and guestfish and everything else, to move stuff around and create slack space at the end of the partition that we can shove device mapper's thin pool into. The right way to do this is the way we do it on Amazon, with a secondary disk. And ultimately we're just going to use overlay anyway, so I'm not too hot to fix it, but it's really annoying when we use OpenStack.

The next piece is that we install all of the OpenShift RPMs, so all of that I/O is out of the way, all the network stuff is out of the way, the RPM transfers. Then we do Docker image pulls of everything we're planning on testing, which includes all the S2I images and all the base images it takes to turn OpenShift on. We also have some images we use for testing, called the pause pods, and I think there's another one, workload stress or something; we have a synthetic workload pod. We clone some Git repos. We install collectd, as well as our other monitoring and metrics utilities, and something called pbench that we use as a team.
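Putting those provisioning steps together, here is a minimal sketch of what the Ansible tasks might look like. The package names, image names, and registry are illustrative placeholders, not the actual contents of the SVT playbooks.

```yaml
# Illustrative tasks for baking a RHEL image for large OpenShift installs
- name: Pre-install OpenShift and support RPMs so install time is not spent in yum
  yum:
    name: "{{ item }}"
    state: present
  with_items:
    - atomic-openshift-node     # example package names only
    - docker
    - collectd

- name: Pre-pull every image the tests would otherwise download at run time
  command: "docker pull {{ item }}"
  with_items:
    - registry.example.com/openshift3/ose-pod          # placeholder registry/images
    - registry.example.com/openshift3/ose-sti-builder
    - registry.example.com/svt/workload-stress

- name: Clone the test repos and monitoring tooling (pbench and friends)
  git:
    repo: https://github.com/openshift/svt
    dest: /root/svt
```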
pbench is a utility that allows us to graph system statistics. We also use it to instrument OpenShift itself, by attaching the pprof profiler to it, which gives us something like a perf top output for all of the Go programs that are running. So we can identify hot functions in OpenShift during any of the tests that we run. Okay, so you're up.

Can everyone hear me? Hello. My name is Sebastian Jug. I work with Jeremy Eder on the performance and scale team, and I'm going to talk to you today about cluster loader. As you can imagine, working with OpenShift, you might want to come up with some automation to drive the cluster, right? We have all sorts of components to test, like Jeremy went over before: the networking component, the API server, many different individual components that make up OpenShift and Kubernetes as one unit. So if you want to deploy an environment with thousands of deployment configs, with services and replication controllers and whatever else you want, thousands of routes, pods, persistent volumes, you may want to come up with a way to automate that. Otherwise, have fun with a million tabs or a bash script or something like that.

So we have this tool called cluster loader. Initially it was in Python, as Jeremy mentioned, and this is the basic logical diagram that explains how cluster loader works. Essentially, you have one single user configuration file, and that file is just a squashed-together, appended configuration object. Cluster loader decides what sub-objects exist and creates a namespace for each of these projects. Then, within that namespace, you can create whatever it is you want, whatever OpenShift or Kubernetes objects exist: a quota, a template, a service, a pod, et cetera. Then you iterate, create however many you need, and move on to the next config object. It's a pretty fundamental generation pattern. However, working with Python was a little troublesome, as we were really just using subprocess shell commands, so you're working with text as opposed to actual objects, and it's really not ideal.

So we moved from Python to Golang. It's much more robust, first of all. Like I said, you can work with the native API, so it gives you infinitely more insight into the system. Additionally, it lets us go upstream: since everything we have, OpenShift and Kubernetes, is all in Golang, they're not taking any Python. Finally, to make it easier to get into customers' hands, so that whoever uses OpenShift can play with their own systems, see how they scale, and maybe find some issues, we want to package it all up with OpenShift and ship it as well. So essentially the new Golang cluster loader integrates directly with the API and will spin up whatever you need based on that.

All right, so we're going to do the demo now. First off, since OpenShift is based heavily on Kubernetes, we decided to base the cluster loader tool on the Kubernetes end-to-end framework, which is a great place to start. It's the accepted standard for testing right now. It lowers the barrier to entry for any new tests or any new things to be written, since you get so much for free, and it's highly extensible: you can just add on more tests, and everything is already clearly defined. Right now we only support pods and templates.
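To give a sense of how little boilerplate that leaves, here is a rough sketch of how a test like cluster loader can hang off the Kubernetes e2e framework. The identifiers follow the upstream framework of roughly that era (KubeDescribe, NewDefaultFramework), but treat the structure as illustrative rather than the actual cluster loader source.

```go
package e2e

import (
	. "github.com/onsi/ginkgo"

	"k8s.io/kubernetes/test/e2e/framework"
)

// Illustrative skeleton: the framework provides a namespaced client,
// setup/teardown, and reporting for free.
var _ = framework.KubeDescribe("Cluster Loader [Feature:Performance]", func() {
	f := framework.NewDefaultFramework("cluster-loader")

	It("should deploy the objects described in the user config file", func() {
		// Parse the single user config file, then for each "project" entry:
		// create a namespace, create its quotas/templates/pods, and walk
		// through the tuning set (step sizes, pauses, rate limits).
		_ = f.ClientSet // direct, typed access to the Kubernetes API
	})
})
```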
Templates in Kubernetes are a much different concept than in OpenShift. OpenShift has submitted a proposal to extend templates upstream, but right now what we do on the Kubernetes side is a similar concept to the single user configuration file: just an appended YAML file with all the objects in one file, which gets iterated through and deployed. OpenShift templates are a bit more robust, with parameter substitution and a more logical deployment style. Sorry, yes, to be clear: OpenShift's are the more robust ones and the Kubernetes ones are more rudimentary.

And also pods: in the real world, you don't necessarily want to deploy bare pods on their own, because you lose a lot of the functionality of OpenShift and Kubernetes. You don't get to define the state of the world and have Kubernetes manage it for you; it's just a solo container doing its thing out there without any management or supervision. But for testing, you often don't want your test to run to completion and then have the replication controller restart it; that's not something you really want. So pods are much more useful in test scenarios, and that's why we've implemented them as well.

So we'll start with the demo. All right, to get started, you need to set up your GOROOT and GOPATH and all that, and clone the Kubernetes repo. Essentially, to bring up a local cluster, this is the command you want to use. Since we're on RHEL and SELinux is enforcing, you need a couple of extra settings to allow the security context to be set. So you bring up the cluster and this runs a whole bunch of stuff: it brings up the API server, it generates certs for the API server so everything is secured, even though for a test cluster you still have the option not to run against HTTPS. So there we go, we have the cluster up. We're running version 1.6 alpha; it's nice and dirty.

This is the first configuration file that we'll take a look at. It's just a regular YAML file. Above the cluster loader definition, we have embedded e2e test configuration parameters, like whether you want to delete the namespace or the project once the test is completed, or keep it; we're not going to delete it, because we want to take a look at it afterwards. The provider: since we're running a local cluster, we have a local provider. That could also be GCE or whatever other cloud providers, though we don't use GCE for the most part. Then, like I said, each project is the definition of one parsable unit, and within that you have the different configurations to be deployed. The basename just gives the namespace its name. The tuning defines the tuning set it's linked to: if you want a ramp-up, you can define a stepping size, the pause between steps, the timeout between creations, or a rate limit. If you're working with really large clusters and large deployments, the API server gets really, really overwhelmed, and you want to slow down the deployment rate; you have that option as well. I realize my mic's off. With OpenShift 2 we had this exact same problem; a parallelism of 25 was what was stable. It's actually slightly less with Kubernetes than it was with OpenShift 2. But that's what that's for.
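To make that concrete, here is a minimal sketch of the kind of configuration file being described. The field names follow the structure explained above (projects, basenames, tuning sets with stepping and rate limits), but the exact schema in the repository may differ.

```yaml
# Illustrative cluster loader config: one squashed-together file, one entry per project
provider: local          # local cluster; could be a cloud provider instead
cleanup: false           # keep the namespaces so we can inspect them afterwards
projects:
  - num: 2               # how many copies of this namespace to create
    basename: clusterproject
    tuning: default
    templates:
      - num: 5
        file: ./content/quickstart-template.yaml   # placeholder path
    pods:
      - num: 10
        basename: pausepods
        file: ./content/pause-pod.yaml             # placeholder path
tuningsets:
  - name: default
    pods:
      stepping:
        stepsize: 5      # create 5, then pause and let things settle
        pause: 10s
      ratelimit:
        delay: 250ms     # slow down creation so the API server is not overwhelmed
```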
Also, we can ramp up a certain amount, let it settle, watch the resource utilization, look for memory leaks, continue with the next step, and go all the way up to however many we want. This is a really easy way to identify our breaking points: 10,000 pods are fine, and then when we try to go to the 11,000th it just falls over. Those aren't specific numbers, but that's part of the reasoning behind the steps. I think we only have a few minutes left. You want to? Okay, so I'm going to get through here.

The files defined within the templates or within the pods are just the standard pod or template YAML files that you would deploy regularly with Kubernetes or with OpenShift; that's the configuration part. So, that was a beautiful command line. It essentially specifies the target, which is a local target, what kind of test to focus on, and, specific to the cluster loader run, the configuration file that I just demonstrated. And then it's already created all the namespaces up here, and the templates, and then here it created the pods. We wait for them all to be running, and since we didn't delete anything, they're still there. So we have one namespace with all these objects deployed, as we had in the config, obviously. This is one of those scenarios where it comes in handy to have all those images pre-pulled, because as you know, when you do docker run it would otherwise go and grab them. Having all those things locally means we can actually focus on the Kubernetes side and not worry about constantly stressing the network.

Yeah. Oh, yeah, so that's already running. [Question from the audience.] The question is, is this automated? The answer is that we're working with our QE team to get them to use the Golang version; they use the Python version now. It's squirrelly, we can talk later, but right now the Golang version isn't ready to do the Jenkins part yet. We're also working with the internal teams on how big we can afford to go in a CI type of test.

So, this is the second configuration file. Think of it this way: we just deployed whatever your environment might be, the back end, the front end, the database, however many you needed. This second file represents the test that you're going to run now. We have a resource consumer pod, and we're passing parameters into it: what application to run inside of it, a router IP, a target host, a duration, megabytes. In this demo, we're going to show two features which are arguably the most important for running a realistic container simulation. That's the ability to, A, give a static container dynamic data, so you can run something different in each of them, and you define what it is. And secondly, now that we have the API, not only can we inject this data on the fly using config maps, rather than some hacky way of doing it, we can also discover the environment. We just deployed our applications, we have all these things running; are you going to grep through all their IPs, find their host names? Whatever dynamic data you might need, we now have access to it through the API. So even though there are some random IPs filled in here, in the demo we'll see that they get auto-populated with whatever is actually running on the cluster right now. So we'll go ahead and do that.
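Here is a rough sketch of that pattern in plain Kubernetes terms: the per-run values land in a ConfigMap, and a generic, never-changing workload image reads them from its environment. The names, keys, and image are illustrative, not the actual resource consumer spec.

```yaml
# Illustrative: dynamic per-run data injected into a static workload image
apiVersion: v1
kind: ConfigMap
metadata:
  name: workload-config
data:
  ROUTER_IP: "127.0.0.1"          # filled in at run time from the live cluster
  TARGET_HOST: "app-0.example.svc"
  RUN_DURATION: "120s"
---
apiVersion: v1
kind: Pod
metadata:
  name: resource-consumer
spec:
  restartPolicy: Never             # a test pod should finish, not be restarted
  containers:
    - name: consumer
      image: example.com/svt/workload:latest   # placeholder image
      envFrom:
        - configMapRef:
            name: workload-config
```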
The idea here is to build these generic workload pods. This one is a stress pod for compute, just malloc, right? There are other pods that do HTTP work, other pods that do fio-type work, or network testing. We pass the environment in through Kubernetes-native config maps. We have these images that never change, and we customize them per run via this YAML file. So you can see here, I have full verbosity enabled. For each of these endpoints, we have various IPs and ports, or whatever they're running on, and then we find the configuration values from the config file and replace them with the ones that were discovered automatically. So now we have a secondary namespace with one resource consumer pod; that's all we deployed. And if you take a look at it, our router IP is localhost, and the target hosts correspond to what we just discovered, as opposed to what we had in the config file.

Yeah, so config maps: who's heard of config maps or has used them in OpenShift? Just a small number of people. It's the intended pattern for shoving custom data into an image, so it makes sense to use it for the testing, too. Docker has its own way of doing it, OpenShift has its own way of injecting variables, but this is the Kubernetes way, and it's the correct way. We could just manipulate the actual map structure directly and substitute it in, but this is the correct way to test the system, versus just hacking it up altogether.

There are a variety of other features we have that we won't demonstrate here. There's pod state synchronization: since we know what pods are running and how many, we can wait for them to be running, to be terminated, to be in whatever state we need. Additionally, there's synchronization for starting tests, so we can wait for all pods to be running, for example, and then start up a web service to trigger a starting-gun kind of functionality for tests to start. There are a couple of other features we'll skip over. Up next, we're also going to port this to the OpenShift API. While it will work on an OpenShift cluster, because OpenShift has the Kubernetes API running underneath it, it's not testing the OpenShift API, so we will add OpenShift API compatibility. Additionally, we will add the templates and configuration files for a range of other tests, which I mentioned before, including our networking tests, our DNS tests, and the workload generation. So we'll have all these tests integrated into one tool for easy deployment. Oh, and we're also looking to add tests at any point. If you have example templates for OpenShift, you can wire them right in here, or send them to us, if you can. We're trying to get a gigantic collection of these things going. Every customer I talk to gets asked if they can share. They never do. But at least they have the opportunity.

Okay, so did you want to mention the contributing part? Yeah, sure. As you might see, it's 100% in Golang; if you don't write Golang, you should. The GitHub repo is under Kubernetes. As it's upstream, it may get merged into the base Kubernetes repo itself at some point, since these are mostly libraries. We have a mailing list if you write emails, Slack if you use Slack, and on Freenode you can find us in the OpenShift dev channel. So if you're interested in contributing, please give us a call. Cool. Okay.
We'd love to have external contributors, or at least, if you can look at it, tell us what's missing. It seems to cover what we're trying to do so far, but we're always looking to make this better and more representative of reality.

So, since I have a few more minutes left with my hostages here, I was going to go through some of our key takeaways from the last round of testing that we did. We do this pretty big round for every minor release of OpenShift; we completed the 3.4.1 one, like, a month ago, or a little bit less than that. In the agenda slide I call this gold mining, because there's a tremendous amount of effort for those little nuggets of truth. Those are the ones that we feed back to the development teams, saying: that doesn't work at scale, completely refactor, redesign your architecture to make it scale, which has happened. Or we tell the upstream Kubernetes guys that a design falls on its face at scale, whatever. We did identify several of them, and I just wanted to share them with you. All of these were teased out with the harness that Sebastian just detailed.

I think I called out iptables earlier. The way that OpenShift and Kubernetes distribute updates about endpoints is a broadcast: every time you make a change to an object, it's synchronized cluster-wide, and that causes, believe it or not, a full iptables save and restore on every node for every change to an object. In steady state, that still happens. This chart is steady state, and it's burning CPU; the y-axis here is CPU percent, so 300 is three cores. This is just one node, and we have 1,000 of these nodes, so 3,000 cores of the test cluster are just churning through iptables. You can see how at scale this is not going to work. I front-load this because it's the most important thing on our plate. Oh, by the way, I should say that all of the major issues here already have significant traction upstream or inside Red Hat. We don't know when fixes are going to come in, because some of them do involve complete redesigns, but the iptables one is not one of those, so we should have a handle on this by 3.5.

One of our other engineers, Yuri Menchak, is based in Brno and is involved with testing the routing layer. I don't know if Yuri's in the audience; he didn't make it today, which is a shame. One of the things we found is that the default for HAProxy is only 2,000 connections. That's okay for a small-scale environment, but when you really want to start scaling up, of course you're going to want to increase those connections. So we're changing the default from 2,000 to 20,000, we've got some great documentation on bumping it up all the way, and we're going to re-validate with 1 million connections, et cetera. Another thing we found, and this is squirrelly, and I'm pretty sure this is a bug in HAProxy, but it kind of fell on deaf ears upstream: when you fill those 2,000 slots, the stats endpoint, which is what the health check uses in OpenShift, is part of that same group of 2,000. It doesn't have a special channel on the side, so when you max out your router, OpenShift will not be able to keep its heartbeat going, the health check that runs every x seconds, and it will freaking restart the router pod. So that's part of the problem we faced in earlier versions of OpenShift. As I mentioned, it will be gone in 3.4.1 or thereabouts.
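For reference, both of those knobs live in the haproxy.cfg that the router generates. Here is a simplified sketch; the values and section layout are illustrative rather than the actual OpenShift router template.

```
# Illustrative haproxy.cfg fragment for a router
global
    maxconn 20000        # the old default of 2000 is what fills up under load

defaults
    maxconn 20000

# The stats endpoint, which the health check hits, shares the same connection
# budget, so a saturated router can fail its own health check.
listen stats
    bind :1936
    mode http
    stats enable
    stats uri /
```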
But when you put significant load on the system, you've got 2,000 connections that are working fine; I'm hitting refresh, everything's working fine. Then the health check cycle comes around, the router pod doesn't respond to its heartbeat check, and after three consecutive failures that pod gets shot in the head and restarted. All of your working connections are now gone. Pretty sad. My thought was that HAProxy should handle that; we haven't come to an agreement on how to solve it just yet. So what we have for now is increasing the number significantly. In fact, the guidance has been that we can probably make this a million and not worry about it, but we only want to put numbers in the docs and in the defaults that have actually been tested. So we're building workload generators; we actually had to redesign some of the workload generators to be able to push that much traffic. We're kind of in the process of making a test that can actually break it now; it was pretty easy to break in the first place. Some of the other things we're looking at: the Linux ARP cache has a default limit of about 1,000 entries, so if you go above 1,000 nodes, you need to deal with the ARP cache and expand the size of the ARP cache table. That's a simple sysctl, but we added documentation for it. And then there are things we're going to do in the future, like maybe multiple router pods for more scaling, TCP fast open and busy poll, or some other optimizations that we might consider looking at in the next rounds.

There's a talk on logging and metrics tomorrow afternoon, so I'm not going to spoil the party here, but there are some issues in that area that we're working on. I encourage you to attend the logging and metrics talk if you're here tomorrow, I believe at 1 p.m. And this is another slide that got horribly botched in the aspect ratio conversion. But essentially, and this was also mentioned this morning, it is super, super sensitive to write latencies. So we actually had to move it from Ceph to a local disk in this scale-out config. By default, in the future, we're definitely not going to deploy it on Ceph unless Ceph is backed by NVMe or something like that. We had it backed with rotational disks; it was a bad move. So we learned that one.

And this was actually the architecture; there's a separate presentation dissecting this architecture, OpenShift on OpenStack. We first started off with Heat: super cool, didn't scale, so we fell back to our original deployment methodology, and we can get to, you know, 1,000 nodes pretty easily these days. Our next goals are obviously to double that, and we have some stretch goals which are pretty lofty as well. On DNS, we'd love to have OpenStack Designate. Route 53 works fine on Amazon, and we have some good tech in our own product stack to allow us to do dynamic DNS, but for now we've invented our own, which is not cool from a customer standpoint. We use boot-from-volume with Ceph, which was super cool; it let us have a single disk image spread across all of the hypervisors in OpenStack. We learned how to size our Ceph pools. There's actually a tool called PGCalc; if you haven't seen it and you're working with Ceph, take a look. We did not size ours up front, so we had to undergo a very expensive rebalancing act afterwards that took, like, half a day, because we were dealing with 20 terabytes. That sucked.
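For anyone sizing Ceph pools for a similar setup, the rule of thumb that PGCalc automates is roughly the following; the worked numbers are a made-up example, not our cluster.

```
# Rough placement-group sizing rule (what PGCalc automates):
#   total_pgs ~ (number_of_OSDs * 100) / replica_count,
#   rounded up to a power of two, then split across the pools sharing those OSDs.
#
# Example: 40 OSDs with 3x replication -> (40 * 100) / 3 ~ 1333 -> 2048 PGs total.
```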
And, oh, we have a new technology in 3.4 where we can use taints and tolerations to steer builds to certain nodes, because docker build runs as root, fun fact. So if you're a company that's allowing arbitrary code execution, you probably don't want those builder nodes to be co-located with production. We came up with this pod steering technique that allows us to declare nodes as build nodes, so Docker builds run on those nodes and then push to the registry, and when a user goes to start that pod, it starts on one of the production nodes where no builds run. That's a really nice technique for security purposes.

In summary, with a couple of minutes left: don't forget, workloads are all that matter. All the tech stuff I just showed you because we're at a technical conference, but the realistic part is all that matters. If you're a QE person, I beg you to try our cluster loader and feed us workloads if you have them for OpenShift. Questions for Sebastian or myself?

Our first candidate for implementation. Oh, sorry, he's asking if we can use it against OpenShift Dedicated. The reality is that only the older version of cluster loader, the Python one, works against OpenShift at all right now. That one works just fine, and our QE team does use it against OpenShift right now; that's how we get all of these numbers I just showed you. But the Golang one is where we're going to do all of our development going forward; the older version is kind of in maintenance mode. The Golang one is the one we're going to use, and we're going to figure out how to make that work on OpenShift.

Sorry? The CI aspect. The reason we're pushing this upstream in the first place is to get the Kube guys in SIG Scale to buy into it and put it into their CI. We're going to do everything upstream; this way upstream has all of it, and they run interference for us. They already do a lot of scale testing now. We talked about it at KubeCon, they're on board with adopting this, and they're reviewing all of our stuff. Eventually, we will have it in OpenShift CI as well. Like I mentioned earlier, I've got some budget challenges to do it, because it's expensive, man.

Yeah, no, we don't. The question was, is the Golang version any faster or slower? The answer is that Kube's parallelism is low enough that the test implementation details don't matter. We can pretty much get, like, 10, right? Was it 10? 8? Well, what's the other tool we can crush with? Right, right, and we don't want to do that. So we have the steps, the tuning stuff at the bottom. I mean, yeah, we started off with trying to crush it. We know where Docker breaks, we know where Kube breaks. But at a certain point the developers ask, is this realistic or not? And it really isn't. Handling floods like that is of some value, but again, the argument is how much value it really has. So we're kind of getting away from the total crushing-weight failure scenario to something that customers may actually do, which is a little bit more of a drip feed. The thing we have to weigh is that we do want to start things up quickly, because when you're going to 20,000 pods, it's going to take a while. So we want to push it right to the edge, but not to failure. So, yeah.

Just one more question, then. I'm not sure I understand. So, OpenShift does not have automatic node admission yet. It is absolutely on our roadmap, our short-term roadmap. You currently have to...
So the question was, can you handle the dynamic sizing of the node pool? The answer is that OpenShift as a product doesn't support that yet. If you wire it to OpenStack with Heat, it actually will at least create that second node, but the node will not join the Kubernetes cluster automatically. You still have to run... there's a scale-up playbook in OpenShift; you've got to run that guy. You can run it before or after extending it, it doesn't matter. Yeah. So I think I'm out of time, then. Thanks, everybody, for coming.