Aloha, everyone. So, kind of just to get started here: I'm Redbeard. Some of you know me. By the end of this, a bunch of you will know more about me than you want to, and we will move on from there. So, who am I? That's me. That's how you can harass me. I am a bit of a weirdo who does a lot of stuff at home and gets real weird at CoreOS. But more importantly, we have got some broken things that we need to deal with, and the broken things cover a bit of an area that I'm going to ramble about for a while. There are three main things we're going to talk about: number one is hardware, number two is software, and number three is the human element. And for our purposes, I find this fascinating because all of it actually touches user space. I don't just mean us as users, but user space in terms of systems as a whole.

So, to continue with the idea, let's start with section one: the problem with hardware. I figure we'll talk through each of these in order from the easiest to fix to the hardest to fix. Ironically, hardware may be, in the strictest corporeal sense, the hardest of things, but it's actually going to be the least amount of work, whereas fixing us humans is going to be the most. Now, Moore's Law has allowed us to cheat for a very, very long time. Moore's Law being the idea that the number of transistors on a chip doubles roughly every two years; it was actually David House from Intel who said 18 months, and he was referring to computing performance, not transistor count. But earlier this year, IBM demonstrated a proof of concept of a five-nanometer manufacturing process. That being said, a single silicon atom is still about 0.2 nanometers, so you're quickly running out of usable space there, and soon we're going to need other materials. And yes, I realize that at the scale of 14 nanometers we're not talking about wire bonding. But we've been toying with computers as this thing that we replace, in reality, to fix problems with software: if we just wait a little bit longer, we'll be able to fix the amount of memory this thing needs, or, oh well, we're running Java, so we're just going to keep waiting until we can get more RAM and replace things that way. But we are hitting a bit of a point, because where does all of the actual silicon go with all of this? I do promise you, there is a thesis that links this together.

What we do with all of it is we stuff it in the cloud. I used to run infrastructure at CoreOS, and one of the guys who worked for me brought up a very interesting point: before working at CoreOS, he had never touched a physical server. This is on the infrastructure team, and he had never touched a physical server before. That made me realize, and then verify through interviewing lots and lots of candidates who applied to work on the team, that we have a growing number of infrastructure administrators and experts who have never touched physical machines. In that sense, the cloud is to blame. Through all this he kind of coined the term "cloud kids," numbering himself among them.
But the reality for anybody who has touched these physical machines gets into the Free Software Foundation Europe idea that the cloud is still run by somebody. And that someone is us. So when you think about the physical systems that we're running, we're hitting a point where, and this is the controversial thing, computers are fast enough. Really, they're fast enough, despite the best attempts of JavaScript developers to slow down everything on your laptop. The way computers are made faster at this point is by putting more cores on: more parallelism. And if you're really thinking about parallelism, you also have to think about how well developers understand threading, or how they understand the development of true microservice-type applications. And if it actually is a containerized microservice application, it doesn't really matter if it's one machine or ten machines, because it's probably a network call. Through that, our networks are now catching up to where our CPUs got quite some time ago, which means the IPC between machines gets much faster as well.

The other side of this is that the overall lifetime of physical machines is growing too. We are hopefully hitting the point where, as a friend of mine, bunnie, says, we can have the idea of heirloom laptops. At this point, unless my laptop is stolen or gets run over by something, I don't need a new laptop. And a bunch of my colleagues are the same way: they don't want a new laptop. They've gotten everything tuned and tweaked, and Crawford's X250 is exactly the system he wants to run; he has no urge to replace it.

So, yes, people like my colleagues are scared of physical machines, which is very, very weird to me. I started my career working for ISPs and managed ATM fabrics for DSL. One of the best benefits I ever had working for those ISPs, one in particular, was: you get gratis colocation. We will give you 4U of space, a couple of amps of power, and an Ethernet port. And the basic rule was: don't fuck it up. And of course, I did things that fucked that up and found ways of breaking it. It was such an important part of that experience that it's one of the benefits we give to employees at CoreOS: you can put a physical server in the, albeit tiny, data center we have in San Francisco and learn through all of this.

Now, on the opposite end of the scale, Facebook has done us a solid with the Open Compute Project. The standardization of hardware for building out commodified systems is greatly increasing, but the problem is that it's not, honestly, useful. That is to say, mainstream users who are trying to deal with fewer than a thousand servers are probably going to have them spread across a bunch of different environments, and you may not control most of those environments. You're going to use facilities like Level 3 and Equinix and things like that, and that means you probably aren't going to be able to ask them to just install Open Compute racks and everything else.
And so, historically, this meant that net-new hardware was really expensive to build. Fortunately, the era of KiCad has made this much, much simpler, cheaper, and easier, because we have things moving up the stack that allow users who are not absolute experts in hardware to start building out those pieces. We're in an era where, for a non-Intel system, you can do a git clone, render Gerbers, and email those Gerbers, along with a bill of materials, off to a manufacturer, and you will get a usable system back. We are now in an era where you can have software-defined systems and, depending on your level of knowledge, extend that into the realm of FPGAs and the like. In terms of Open Compute, this is kind of the state of the art: when you grab the materials they publish, they tell you the exact size to build your mezzanine cards or your system-on-module type cards to make sure you comply with the overall chassis specifications. Whereas if you just go to Olimex, the developers in Bulgaria, they give you the full schematics and the actual design files and everything, so you can build out a full four-core ARM64 system; I have friends who have gone out, built these systems, and made them useful. So the idea is that you are able to stand on the shoulders of these other projects.

We're hitting a convergence, because the second part of this is that the resulting design paradigm we have for writing software is now 50,000 monkeys with 50,000 typewriters. Developers love writing software. Developers hate maintaining software. And in this sense: I grew up in punk bands. No, that's not me in the center of that photo, or with the shaved head; those are other friends of mine. But my friend Tony here had a band called Daybreak, and they make lots of jokes, especially about themselves. One of their songs is actually titled "It's Been Done a Thousand Times Before, But Not by Us." It's that idea of: fuck it, who cares if this thing already exists? I'm going to redo it myself. And sure, it's always more interesting to build your own HTTP framework. It's fun to build yet another flag library for command-line utilities, or your own flavor of Markdown. But in general, I have two rules for this: if there are more than ten other alternatives, or you're doing it just to learn what you don't know, it's a pet project, and you're probably not contributing to the state of our art here. And that's an important thing to really comprehend. When you put these together, you end up with the worst of both worlds, because anybody can start a project. If you want to do something truly useful, finish one. If you want to build a net-new project, pick something that is impactful, that truly improves upon the existing set of features we have today, that's designed using modern paradigms, and, most importantly, leave your ego at the door. Because the biggest piece of open source development is the "we." It's the team that all of us are. And yes, it's easy to get bogged down in how much individuals are hated or loved, but that's not why any of this exists. The average Fedora user probably doesn't know who Lennart is. The average Debian user does, because they still hate him for systemd being injected into it.
But as we approach this world of Linux on everything, it's more feasible for all of us to fix and build systems, as well as extend everything. And to learn what we don't know, all of us can start by fixing some of the broken pieces. To get into some of the enumeration of broken pieces, let's look at the intersection of this idiocy. We've talked about hardware that is, as I'll show you, unmaintained; we've talked about unmaintained software; and we've talked about net-new software, which isn't, per se, implemented correctly. So let's talk about Quanta Cloud Technologies. Oh, how I love to loathe you. I've been in a multi-year software war with them, and it all stems from this device right here, the QuantaMesh T3048-LY8. It is an awesome switch. At this point, you can get these things for less than $1,000, and it is a 48-port 10-gig switch that has six QSFP ports on it. You need to do 40-gig networking? Cool, it can do it. You have other devices where you can do LACP across those 40-gig links? It can do it. Again, this is under $1,000. Basically, under the hood, it's like a less expensive version of an Accton switch, but at the same time it's not quite an Accton switch.

So, talking about this, the relationship has been a bit involved. I've caught them in GPL kernel violations, which they don't want to do anything about. I contacted the Free Software Foundation, who went, "we're, like, busy, man; can't you just talk to somebody else?" So I talked to the Software Freedom Conservancy, and they helped me out a bit. But again, the most important piece was learning throughout all of this, and seeing that they deny that certain hardware SKUs even exist, despite the fact that they made them: a willful ignorance about the things they're building.

As I mentioned, I learn best by breaking things, so let's actually dig into some of these. Man, this is tiny to see, but hopefully you can make some sense out of it. I managed to get a Linux shell on the device. And the architecture is really, really fun, because all of the actual network code runs in user space, as we'll see in a moment; in the spirit of a true SS7-type architecture, a distributed-systems architecture, everything is pretty cleanly isolated overall. We start out here by seeing that we have a 64-bit x86 Rangeley chip from Intel. Not really super powerful in terms of a switch, but as long as you've got some other type of specialized networking hardware, you're going to be able to optimize a lot of that with drivers. And of course, if we've got that, then we're running a kernel. So we take a look and see: oh, it's a 3.9 kernel built in 2015. And it's compiled 32-bit, for that 64-bit chip. Carry the one. Okay, well, there are only two gigs of RAM in the machine, so that's not necessarily egregious. But obviously they didn't have a full grasp on it, because if that were me, I might think: hey, the hardware supports it. It's been hard to buy a 32-bit Intel chip that you would do anything useful with for many years now, since they made EM64T the baseline in about 2007. This hardware was built in 2014. So why did you start at 32-bit?
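If you want to do the same kind of poking around on a mystery box, here's a minimal sketch of that fingerprinting. It assumes you can get a Python interpreter onto the device (which, on this switch, you mostly can't, so treat it as the shell equivalent written down); these aren't the literal commands from the talk, just the same checks against /proc:

```python
#!/usr/bin/env python3
"""Rough fingerprint of an unknown Linux box: kernel build, CPU bitness,
userland bitness, and RAM. Minimal sketch of the poking-around described
above, not the exact commands from the talk."""
import struct

# Kernel version and build date, e.g. "Linux version 3.9.x ... 2015".
with open("/proc/version") as f:
    print(f.read().strip())

# Does the CPU support 64-bit (the "lm" / long-mode flag)?
with open("/proc/cpuinfo") as f:
    cpuinfo = f.read()
print("CPU is 64-bit capable:", " lm " in cpuinfo or " lm\n" in cpuinfo)

# What bitness is this interpreter (a stand-in for the userland) built for?
print("userland pointer size:", struct.calcsize("P") * 8, "bits")

# Total RAM -- two gigs on this switch, which is why 32-bit isn't egregious.
with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith("MemTotal:"):
            print(line.strip())
            break
```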
But anyway, speaking of that: it's always easiest to stand on the shoulders of somebody. Let's pick on Canonical here for a minute. Without Debian, Canonical wouldn't exist. Without Debian, a lot of distros wouldn't exist. At CoreOS, we kind of use Gentoo as an SDK to then build everything; we push ebuilds back upstream and maintain a few of them. And it's cool, because they stood on the shoulders of somebody too. So, here we've got Fedora 14 being used in 2015. That doesn't really line up. Let's take a straw poll here, since we've got a number of Red Hatters in the room: how out of date, on a scale of one to what-the-fuck, is Fedora 14? Yeah, it's much closer to what-the-fuck. Fedora 14 was released in 2010. This hardware, again, wasn't made until years later, and this build comes even further after that. But it all starts to align with the idea that as you start to open the hood and look further in, it becomes a box of pain where the more you see, the less you can unsee.

So the next piece is starting to take a look at the listeners on here. Like I said, we've got everything separated into user space and kernel space. So I'm obviously looking here, and the first thing I see is: oh, we've got 22 and 23 listening, in a weird network namespace, but everything is in the same PID namespace. And port 22 is not being served by sshd. Well, that's good. Why use OpenSSH when you can write your own? Am I right? Now, I can't really blame them for the utelnetd piece, because that's where I actually came in, but there's nothing like having one binary that serves both Telnet and SSH. And as you dig in a little further, it does everything. It's magic. It's what they've built some of the pieces of Quagga into, plus a whole bunch of other bits. So it effectively runs as one single binary when you go in and look at the process table.

Digging in further, it started to make sense: they compiled almost every module they would need into the kernel, except for the Broadcom FASTPATH bits, which are the first two kernel modules up there. Those are, as you can see, proprietary and tainted; that's the flags at the end of the line. And as they pull those in, those are coming from OpenNSL, I believe, not SAI or the other one. That becomes the linchpin here, and the dirtiest piece. We keep going and we see: okay, so they've got a correspondingly ancient glibc. And of course, despite using Fedora, they didn't actually pull in real RPM; they just used an old version of BusyBox and the rpm applet that's built into it. So this is just a Frankenstein of a system.
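That module inspection, by the way, is easy to reproduce on any Linux box. A minimal sketch, again assuming a Python interpreter on the device; the module name in the comment is illustrative, not the switch's actual module name:

```python
#!/usr/bin/env python3
"""Spot proprietary, tainting kernel modules (like those Broadcom FASTPATH
bits) from userspace. Minimal sketch; adapt to shell if Python isn't there."""

# /proc/sys/kernel/tainted is a bitmask; bit 0 ("P") is set once a
# proprietary-licensed module has been loaded.
with open("/proc/sys/kernel/tainted") as f:
    taint_mask = int(f.read())
print("kernel taint mask:", taint_mask)
print("proprietary module loaded:", bool(taint_mask & 1))

# Tainted modules carry their taint letters in parentheses at the end of
# their /proc/modules line, e.g. "some_fastpath_blob 123456 ... (PO)".
with open("/proc/modules") as f:
    for line in f:
        fields = line.split()
        if fields[-1].startswith("("):
            print("tainted module:", fields[0], fields[-1])
```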
Then you go and pull the spec sheet, and you see the two gigs of RAM that I mentioned and how they're storing everything on a microSD card; but then you notice that Freescale chip on the spec sheet. And you go back to this, and that's where, like I said, they deny that certain SKUs even exist. Because you're fighting with this and you're going: okay, no, you keep pushing out new software updates. I see that there's a new version of the OS for the T3048-LY8; I would like that. They're like, "oh, well, we never manufactured an Intel version of that system." And you're like: trust me, you did. You really did. I can send you photos. I can send you the serial number. They're like, "hmm. Huh. Well, okay, yes, we did manufacture it. But we're still not releasing a software update for that, ever again."

So their version of SSH uses diffie-hellman-group1-sha1, which leaves it vulnerable to Logjam. Sorry, you can't patch it. All of these bits on the software side? Proprietary; you don't get that. Sorry, you can't patch it. All of the actual GPIOs that you need to make the fans work, the LEDs work, everything else? No, we won't give you the kernel config. You can't have it. Oh, you want any of the bits? It's just a repeated "you can't have it." And about the source code, keep in mind that Quanta refuses to release updates, and because of those proprietary kernel modules, you're going to need the Broadcom SDK, even if you had any of the sources, to make it useful. And guess what: you don't get that Broadcom SDK without signing a non-disclosure agreement. So what are your options here? You can run a proprietary OS that will never receive updates, you can violate the NDA, or you can hope that you have it as good as you did in 2002. By that I mean the current state of the art there is that, with SAI and OpenNSL and the third one from Broadcom that I can't remember, these are all brilliant mechanisms where you can purchase merchant silicon and then just accept that there's a binary blob that you interface with through a shim driver. And that's what you get, only you can't build the binary blob without that SDK. So it just becomes this whole snake eating its tail, and an inability to actually patch these old systems. And I say old because, again, this thing was built in 2014, and it's already reached the end of its useful life from a security perspective. But it can do 40-gig networking.

This leaves me with a bit of a what-the-fuck moment, because now I'm in a boat where I have systems that I can't patch, can't update, and can't use; and I want to, and it runs Linux, and I'm this far away from it. And it's like this with a lot of network software. Anybody who administers an actual network will know that when it comes to authenticating a user into a switch, LDAP is generally not a thing. Fortunately, Cisco has started to catch up on this, and Juniper has started to catch up on this. But the thing is TACACS+, and the TACACS+ server that everybody uses, because there are three different proprietary ones and one open source one, is tac_plus. And tac_plus hasn't gotten updates since 2011, and the majority of the code that most people run in it hasn't been touched since 2008. Similarly, RANCID: calling it configuration management is a bit of a misnomer, because it's just a thing that uses something kind of like Paramiko, only way older. It uses expect scripts to log into your actual devices, copy out the config, and check it into CVS. Yes, CVS. What was that? I believe it's Perl under the hood. Now, that being said, there are better visions of the world. Think about this: if you have your switch running Linux, that means you have things like Paramiko available to you. But the developers of those systems are not coming from the same world that you do. They're coming from the world of: I'm handed an SDK, I do what the SDK tells me to, the SDK outputs a thing that I customize a little, and that's my proprietary network operating system.
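As an aside, you don't have to take a vendor's word about that group1-sha1 key exchange. Here's a minimal sketch that hand-parses the RFC 4253 KEXINIT packet from an SSH server to see what it offers; the host and port are placeholders, and it leans on the fact that KEXINIT is sent in the clear, before any keys are negotiated:

```python
#!/usr/bin/env python3
"""Probe an SSH server's offered key-exchange algorithms to check for
diffie-hellman-group1-sha1. Minimal sketch; host/port are placeholders."""
import socket
import struct

HOST, PORT = "192.0.2.1", 22  # hypothetical switch management address

sock = socket.create_connection((HOST, PORT), timeout=5)

# Read the server's identification string, e.g. "SSH-2.0-OpenSSH_7.5".
banner = b""
while not banner.endswith(b"\n"):
    banner += sock.recv(1)
print("server banner:", banner.decode(errors="replace").strip())

# We must identify ourselves before key exchange proceeds.
sock.sendall(b"SSH-2.0-kexprobe\r\n")

def recv_exact(n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-packet")
        buf += chunk
    return buf

# Binary packet: uint32 packet_length, byte padding_length, payload,
# padding. No MAC yet, since nothing has been negotiated.
(pkt_len,) = struct.unpack(">I", recv_exact(4))
packet = recv_exact(pkt_len)
assert packet[1] == 20, "expected SSH_MSG_KEXINIT"

# KEXINIT payload: msg type, 16-byte cookie, then name-lists; the first
# name-list is kex_algorithms (uint32 length + comma-separated names).
(list_len,) = struct.unpack(">I", packet[18:22])
kex_algos = packet[22:22 + list_len].decode().split(",")
print("offered kex algorithms:", kex_algos)
print("group1-sha1 offered:", "diffie-hellman-group1-sha1" in kex_algos)
sock.close()
```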
Or, at the other extreme, you've finally had it and you can't take it anymore, and you work at Spotify, and you work on NAPALM. NAPALM is the Python library for not having to deal with the majority of this; it renders static configs using things like Ansible and then stands on the shoulders of Ansible, SSHing into all of your devices and managing them that way. These are areas where, when it comes to the management of systems at scale, we've figured this out at least slightly better on the server side.

And there's a growing number of, well, there's one truly free network operating system, and that is OpenSwitch. Then there's a number of other mostly-free network operating systems, like Open Network Linux, which is mainly sponsored by Big Switch. And in the realm of the proprietary ones, there's Cumulus and things like that; I mean, it's Linux, but you can't get any of the useful sources. It's all kind of isolated out. And this is kind of the Linux network shit show as it is. Like I said, you've got this Winmodem-type paradigm that... hello. Okay, I have ten minutes left. So: the ubiquity of merchant silicon is causing the performance of commodity network hardware to go up overall, which is fantastic. As of right now, you can buy a five-port, one-gigabit Debian router, truly doing routing, for under 50 US dollars. But in the end, it's still built using merchant silicon and all of these proprietary, NDA-encumbered chips. So, if you're lucky, you build MIPS packages and produce your own Debian repository, and then you can apt install updates to your routers that way. I mean, that's what we do at CoreOS. But it would be nice to not be trapped in the realm of having to fight proprietary software to use these systems that are otherwise open. And like I said, best case, it's OpenNSL and OF-DPA.
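To make the NAPALM point concrete, here's roughly what that workflow looks like. A minimal sketch: the driver name, address, and credentials are placeholders, and the import path assumes the current napalm package layout (older releases shipped it as napalm_base):

```python
#!/usr/bin/env python3
"""Push a rendered config to a device with NAPALM instead of expect
scripts. Minimal sketch; hostname, credentials, and the "eos" driver
are placeholders -- swap in whatever your platform actually is."""
from napalm import get_network_driver

driver = get_network_driver("eos")  # hypothetical: an Arista running EOS
device = driver(hostname="192.0.2.1", username="admin", password="hunter2")

device.open()
# Stage a candidate config (rendered elsewhere, e.g. by an Ansible template).
device.load_merge_candidate(config="ntp server 192.0.2.123\n")
print(device.compare_config())   # diff against the running config
device.commit_config()           # or device.discard_config() to bail out
device.close()
```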
So let's get on to the hardest problem, which is me. These are kind of vegetarian systems, where we want to remove any dependence on the meat involved. In this sense, I am both a prototype and a problem. And our material example here will be that gentleman sitting in the center of the room, Alex Crawford. And by the way, long live Monday monkey and the internet's John Bull. Crawford and I fundamentally disagree on a few small details. One of the things that we don't disagree on is the fact that I love Container Linux: having a distribution where I just have a fresh kernel, an up-to-date systemd, and a whole bunch of different flavors of containerization is awesome. But it sucks to deploy. It sucks to deploy Container Linux if you are doing just a small number of systems, and that is because, and this is a thing Alex and I talk about a lot, of the configuration language and the configuration syntax. It is extremely powerful, but it is not user-friendly at all. And part of the argument is: get smarter, duh. And I try to say: man, I live in California, and when I take my medicine, there's only so smart that I can get. So we're kind of stuck here. But this argument that Crawford and I continue to have, it's a good thing. He's definitely kind of done hearing the rambling from me about it, but it's a good thing because it's an opportunity to rethink how we deploy things at scale. That is to say, we don't meander enough. We don't spend enough time thinking about how other people use these systems and how we onboard users onto them. At the heart of this, I'm really lazy, and I get the feeling that y'all are really lazy too, in the sense that there are a lot of things all of us want to do and want to deploy and want to try. And if you make it hard to onboard new users into doing these things, you are automatically excluding a potential set of really valuable users down the line. Now, it's funny that I say "at the heart of this, I'm lazy, and I'm feeling y'all are too," because this next slide happened a little while ago as I was going through and revamping a bunch of my slides: everything crashed and said that it had successfully recovered, and this is what I had. So it's a good thing that I keep notes here.

And I guess that I'm running out of time. But we need to stop trying to fix truly bad software, like OpenStack. I don't think anybody here is necessarily arguing for running things on top of OpenStack. But the only way to write good software is through a combination of making mistakes and running software. And fortunately, as many of us age, and as the age of computing overall continues on, we have nothing but more wisdom about this. And it gives us a chance to find things like IPMI and the proprietary firmware on BMCs, which needs to go away; fortunately, there's an answer coming through things like OpenBMC. And logging architectures: it's awesome that folks built RELP on top of rsyslog, but that's really not the best answer. If you're already using the systemd journal, then guess what: everything's already broken out into great fields for you. But the state of the art in logging infrastructure is ELK, where once you have lots of log messages, you then need to figure out how to become an Elasticsearch administrator so that you can reindex things. I am excited to see how Facebook makes our lives better when they actually release LogDevice here coming up.
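On that journal point: the structured fields really are already there. A minimal sketch using the python-systemd bindings (the `systemd` package); the unit name is just an example to filter on:

```python
#!/usr/bin/env python3
"""Read structured fields straight out of the systemd journal -- no regex
carving of flat syslog lines. Minimal sketch with python-systemd."""
from systemd import journal

j = journal.Reader()
j.this_boot()                               # only entries from this boot
j.add_match(_SYSTEMD_UNIT="sshd.service")   # example unit to filter on

for entry in j:
    # Every entry is already a dict of fields: no parsing required.
    print(entry.get("_PID"), entry.get("PRIORITY"), entry["MESSAGE"])
```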
And in terms of deploying the fundamental systems underneath the hood: you shouldn't be in a state where step one is "take the system that you have purchased and now overwrite the ROM on the network card with one that supports iPXE," just so that you can get these machines deployed without having to rely on TFTP. Things like EFI are improving that, but I'm not necessarily stoked on standing up here and shouting that EFI is making the deployment of our systems better. Overall, automation is going to be a core requirement for the future. Having OpenBMC, having better centralized logging, and removing TFTP is going to make that faster, more stable, and easier to manage. But there are a lot of other pieces that we still need to build and improve upon, like inventory systems. And the last piece that I want to finish with here is the idea of 1%.

I, like a lot of the folks I'm surrounded by, love to shit on the idea of code schools: spend 12 weeks, be a hacker, learn JavaScript, and go out and make 120 grand a year. That's not really showing a dedication to any art. I have many friends who go through those things because they are the dirtbag punks I was showing you earlier, and they're just trying to make a cash grab. But there's something to be learned here from looking at the idea of 1% and then looking at hardware manufacturing in Shenzhen. Because if you have 100 million factory workers, and 1% of them learn to become technicians who actually have a deeper understanding of the things they're manufacturing, that's a million technicians. And if 1% of those technicians go on to take it a step further and learn to become hardware designers, that's 10,000 hardware designers. And if 1% of those 10,000 hardware designers go on to actually build successful hardware companies out of that, that means we have 100 new successful hardware companies in the world. Having 100 successful large-scale hardware companies is going to be a revolution in terms of driving the overall industry forward. But we need the same thing to happen on the side of individuals understanding systems programming and low-level systems, and one way to do that is by onboarding and capturing the fascination of folks who are going through these code school programs.

And so now, with that, I have one minute, so I think that there's enough time to take one question, and then I have to drop the mic. So, who wants to heckle me here? It feels good to be this good; you just stand up here and baffle people with bullshit for 45 minutes. Okay, well, with that, thank you very much.