Welcome. Today we're going to be talking about training machines to be open source contributors. In fact, we're going to be talking about cyborg teams. I'm not a cyborg, but I'm part of a cyborg team. This is not hyperbole or sci-fi or any such thing. It is 2017. Some of this stuff should be possible, and it is. But let's take a big step back and look at the why. Why would we want a cyborg team? What does it give us?

I don't know if you've read the economist Tyler Cowen. He has a good short little book called The Great Stagnation. In there, he talks about how a lot of the progress we've had in society over the last 150 years was due to low-hanging fruit: educating everybody, not having people die at 45 years old, being able to expand into a new continent like the Americas, or, for better or worse, things like women entering the workforce. When you bring that many more people in, you get lots more productivity. You find the geniuses. You find good ideas. You find people who can connect with each other and spend their time not just on staying alive but on doing cool stuff. So all of this happened. And the theory, at least, is that for the last while we've been coasting on the benefits of this low-hanging fruit, and the low-hanging fruit is running out. Now we see something of a stagnation in society. I tell my kid that in the days of his grandfather, people walked on the moon. And my kid, Kai, is very depressed that the Concorde doesn't fly anymore. He's like, what the hell? You tell me about all this cool stuff that used to happen, but it doesn't happen anymore. Now, in some areas things are still advancing. But we see this trend, and it's pretty worrisome. We have one more low-hanging fruit, though, that we haven't taken advantage of, and that's what I'm here to talk about today. It has to do with machines.
Machines that we can use in the way that we work, and especially in our industry, in open source, in order to push productivity, in order to make massive strides and advances forward. But you might say, machines aren't really doing that. There's another economist, Robert Solow. He was born, I think, in the 20s, but he's still around, and he's very renowned, and he noticed that you can see the computer age everywhere except in the productivity statistics, except in its effect on the economy. He wondered why that was, and there are many answers: that the economy needs to change, that the digital economy is fundamentally different from the way we modeled the previous economy. But I think there's another reason, more fundamental than that, and that is that we're using machines wrong.

You'll say, we use machines all over the place: I have a laptop on my lap and I run commands and it does things for me. I use them every single day. Well, back a long, long time ago, in the Industrial Revolution, factories were set up, there were great advances, and a lot of those advances came out of the steam engine providing power, so people didn't have to do things by hand; machines could do things, we could actually mass-produce things, and society was pushed forward. One thing you notice is that later in the Industrial Revolution, the electric motor came along. It was invented, we saw electric power start in New York, start to connect the city, and the first time it was brought into a factory, they just took the steam engine, which was one big steam engine in one central place, and replaced it with an electric motor, and thought, hey, this is gonna be much better. But it turns out that the whole factory was set up around the idea of a steam engine.
It was set up in such a way that all of the tools people were using, the assembly line, the workers, were arranged around the source of power: if they needed more power, they were closer to the steam engine; if they needed less, they were further away. There were these massive line shafts with belts that would run along the top of the building, and people would get caught in them, but the machinery would keep going, because it couldn't be stopped, and so on and so forth. And when you just replace the steam engine with an electric motor, you get marginal benefits, fewer people dying with coal in their lungs and some such things, but you don't get a massive change, massive benefits. It was only when people figured out, no, you're supposed to use small electric motors and put them all over the place in the factory, run cables, change people's schedules so they all have to be there at the same time, and organize the factory around the product being built rather than around the source of power, that there were massive benefits and massive change. And that took about 20, 30 years in some places, 50 years in others, depending on how quickly people got on it.

And we're seeing the same thing with machines and computers. We know how to use them, we're very fluent in them, we work with them, but we haven't really made that mindset shift in how we're supposed to use them and how to take advantage of them. So this is what I'm here to talk to you about today: a cyborg team, a team that is part human and part machine. And I'm not here to just theorize and talk about the Industrial Revolution, I wanna show you proof that this is working. I wanna show you a preview of the future, a scoped-down preview, very specific to a certain project, of what is possible.
And I don't imagine that this preview shows all of the things that are possible and all of the effects, but it shows you the direction that we can go, need to go, and the amazing stuff that's down that road. So I talked about Cockpit yesterday: about how it's a Linux session in a web browser, how it interacts with the system directly and all the cool stuff, how you can change it really quickly and easily. It's basically as easy as writing shell scripts in the browser, and your browser is part of a Linux login session. But what I didn't talk about is how that was done. I'm gonna skip through some of the slides about Cockpit because I assume you've already seen them since you were there yesterday. But one of the interesting slides showed that from the web browser, you can, in JavaScript, interact with stuff on the system directly. Without knowing much about the API of the system component you're interacting with, you can basically just interact with the system from the web browser. In this case, change the hostname. If you follow this train of thinking a little further, you'll realize, whoa, you're interacting with so many parts of a Linux system from a web browser, way too many pieces to be viable. There are so many moving pieces, and you might think, wow, those are some really amazing people who have done this work. And it's true, because here they are. These are the folks who have worked on Cockpit. Not all at the same time, I think; some of them have moved on to other things, but these are the core contributors who've worked on Cockpit, and they're the human part of the cyborg team we're going to talk about. Some amazing and very handsome people, two of whom are here. But let's take a look at what they pulled off and see if that is actually possible. So here's the list of APIs on a Linux system that Cockpit interacts with. I don't think this is complete, I put it together just the other day, but it's pretty crazy.
That's over 90 different APIs. Some of them are D-Bus APIs that are pretty broad and inclusive, like systemd or Docker or NetworkManager or Kubernetes or UDisks, udev, and so on and so forth. Cockpit then turns around and actually delivers into 15 different Linuxes and products: obviously in Fedora, in RHEL, in Ubuntu, in Debian, various versions of all of the above, in CentOS too, in OpenShift, in RHV, and so on. There are different branches that are maintained, and this list is growing, where different patches come in; some are bug-fix only or security fixes. Tests run against different browsers to make sure everything's working, though currently only three. And all of that is done weekly: weekly releases, getting it out, and quite a few of those Linux distros pick up the weekly releases. And you can imagine, when you do the math there, that this is pretty crazy; that number adds up to over a million combinations. This is combinatorial explosion in the extreme. Every single one of those APIs revs at a different time in a different distro. The branches have to account for all of that: the way it works with all of the browsers, the way that what's actually delivered to the user is sometimes different from what is in the latest version of a distro. Sometimes they cherry-pick certain patches and so on. And so this is pretty much impossible for that very handsome team from earlier to deliver. The effort of a solely human team does not scale past a certain complexity point.

And we see this a lot in open source. In open source it's really easy to start a project when you have a very focused set of code, and you're sitting in a dark room late at night hacking away, and you're extremely productive. Once the team gets bigger and bigger and you have to interact with more people and more pieces, and there are so many moving parts in order to complete a full story, you run into this wall.
Some people run into this wall later. The Googles and the Facebooks of our industry can throw enough people at this to move that wall a little bit further out. But especially in open source, we run into this wall really early. So what we need is cyborg teams, machines as team members, to get past that wall. We're gonna look at one of these cyborg teams in action, like we said: the Cockpit team. And as you look, I invite you to take a look at how things work, how things act. When you look too closely, let's say at a fish, if you cut it open and look at the innards, it's goop. There are little bones and stuff; it's not very impressive. When you see it swimming, it's amazing. It's a life form all of a sudden, not just goop and bones. The same is true of the automation in a cyborg team. If you look really closely, it's a shell script. It's a REST call. It's a little bit of glue here and there, and it's not very pretty. But when you take a step back, it is swimming, and it's acting as a team member.

Here's a list of things we're gonna look at for how we know there's a cyborg team in action in the Cockpit project today: bots own mundane work, we have pair programming with bots going on, humans training the bots, bots doing machine learning by watching the humans, bots shipping Cockpit, bots as committers, and the team stops working without bots. Let's look at the first one: bots owning mundane work. I'm gonna walk you through a very simple, mundane workflow: updating translations in a project. It's a very trivial, menial job that you'd probably give to a newcomer, the intern or some volunteer who joins your project. In this case, a bot works on this. The bot opens up an issue on GitHub saying, hey, the translations haven't been updated in a while; I think right now the threshold is 10 days or something like that.
And it'll say, please, someone needs to update the translations. Well, it just so happens that another bot knows how to update the translations, and it actually starts working on this. You can see it puts "work in progress" at the top, and the bot name actually starts working on this GitHub issue. Now, the actual code is something you've seen in a lot of projects, where there might be a make command to do a few tasks: download the PO files from either Transifex or Zanata or so on, upload the translations or the strings that have changed, add it all to a checkout. There are more steps here; we need to check, for example, which languages to now show in the UI, based on how much of the translation is complete, and so on. But all of these things are pretty basic, things that you and I could do without a problem. So the bot goes and does this stuff, and you can see it posts its output, its work in progress, just like a human would, to GitHub, and it turns this into a pull request. And then a whole horde of little bots comes and tests this across everything from RHEL to Ubuntu to Debian, with various browsers, and so on. In fact, on this particular pull request, if you go and look at this number, 7906, these guys actually found a problem in the translations and said, yeah, that's bogus: the translations were wrong, and there were broken strings that wouldn't have compiled. They found that problem. And in the best case, by the time a human looks at this, and a human did, right, you looked at it, Martin, because this was just the other day, it's as if a junior team member did all the work and handed it off for final approval: hey, check it out, can I merge this? Just as a junior team member would. The net result is that a human does not have to do all that bogus work; he's basically managing and being the lead for a machine.
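That translation workflow can be sketched in a few lines. This is a rough illustration, not the Cockpit bots' actual code: the ten-day threshold matches what I said above, but the make targets and the git layout are assumptions I'm making up for the example.

```python
import subprocess
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=10)  # roughly the threshold mentioned above


def translations_stale(last_update: datetime, now: datetime) -> bool:
    """Should the first bot file a 'translations need updating' issue?"""
    return now - last_update > STALE_AFTER


def refresh_translations():
    """Roughly what the second bot runs once it claims the issue.
    The make targets here are hypothetical stand-ins for a project's real ones."""
    subprocess.check_call(["make", "download-po"])  # pull PO files from Transifex/Zanata
    subprocess.check_call(["make", "upload-pot"])   # push changed source strings
    subprocess.check_call(["git", "add", "po/"])    # stage the refreshed catalogs
    # ...then open a pull request marked "WIP" and let the test bots loose on it
```

The point is not the code; it's that each step is exactly what you'd tell an intern to do, written down once.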
And more on those tests. I have a whole other discussion about this at ContainerCon, but the tests are really aggressive. You saw the crazy number of tests that went into testing just a translation change; that was insane. In fact, about 1,000 VMs are spun up in order to do this. This is mundane work that testers, or QE as it might be known, would do on a project, or that you might farm out to volunteers, but the bots do this kind of work at a scale that's insane: in a given month, it's about half a million VMs on a project the size of Cockpit. And this is so repetitive that it starts to have a massive effect. Let's look at one of those effects, and that's pair programming with bots.

You've all had the experience of trying to freeze or stabilize a project before release. Usually you start to say, hey, no more feature changes; hey, please stop making string changes; please stop, I don't know, screwing around with shit. And then, okay, now we're ready to go, it's stable, let's do it. We saw something very different in the Cockpit team when we were getting ready to go into RHEL. RHEL has a long-term stable life cycle of 10 years. That's an insane amount of time to consider: oh my God, do I have to live with this shit, right? If you look back a year at your own code, it looks like ass, but imagine having to look back after 10 years; it's pretty depressing, right? So it really puts this weight on your head of, oh, this better be good. But it turns out that people were landing massive changes, thousands of patches, days before the dev freeze deadline. I'm like, how is this possible? Why are you guys doing this? And the answer I got was: we're pair programming with the bots. And I'm not making this up at some abstract level; this is literally what the team members are saying. Marius was saying this.
Basically, as they're writing whole bunches of code, they're handing it off for aggressive testing in all the various weird ways, actually adding things to the code to make it break in weird ways, and adding tests in such a way that thousands of virtual machines, thousands of boots, were being used while the code was being written, in order to find the problems and find out: does my idea here work? Does this change work? Does it break every thousand runs? Does it break every 10,000 runs? And they just re-trigger those bots to do the tests over and over again. This is really what happens when someone's looking over your shoulder, looking at your syntax. Yesterday we coded on screen, and everyone had good advice on what mistakes I was making as I was coding; that's the effect of pair programming, and you see it with the bots. Oh, I didn't even get to show my cool slide. Anyway, this is pair programming.

So, humans training the bots. This is another thing that we see a lot of. Humans are very good, especially us programmers, at telling other people what to do. And we're really good at telling machines what to do. In fact, this is why this is a low-hanging fruit: everyone on the team speaks the language of the machines. You can't say that in any other industry or any other walk of life. All of us have this ability to talk to the machines and tell them what to do. You can look at who has contributed to the code that runs as bots in the project, and you can see it's a significant number of commits and changes across the team. It looks very, very similar to the other contributions to the project. People are actively involved in training the machines, and it approaches the level of, let's say, a volunteer or an intern joining you. Let's use an intern as an example, right?
You're a company, you have an intern, and he goes and gets the coffee, and you tell him, hey, wait, the coffee on this floor is crap; go get it from a different floor. And you expect that to happen, right? Then and there. People train the machines in the same way, expressing: I want you to do this from now on, constantly communicating as you would with another team member, and then the bots go and do it. And we make this very fluid; I'll talk more about this later.

Bots learning from humans is another way we know this is happening, that this is a cyborg team. This is pretty new, actually, and I'll talk more about it at the end of the talk, but we have bots watching what the humans do, learning from their activity, and actually starting to factor that into their decisions, using neural networks to do this. But more on that later.

Bots ship Cockpit. That weekly release just could not possibly happen the way it does otherwise. The amount of work in the weekly release, even if everything goes perfectly and the person knows every aspect of the release, would take about two and a half to three days. Before we started releasing into all of those targets, it would take about a day just to release into a few Linuxes manually, and that was only when things went perfectly. Now we've greatly expanded. What happens in the workflow is: a human signs a tag in Git, and the bots go and make tarballs, and patches based on where the tag is and the commits after the tag. They'll update RPM spec files and Debian control files with that information. They'll release preview builds, commit stuff into the right repositories, push Fedora packages, upload packages into Ubuntu and Debian, upload tarballs for the source code download, do documentation updates, do container rebuilds, and send all this stuff out.
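As a sketch, that fan-out looks something like the following: one human-signed tag expands into a task list that the bots divide up. The task names here are illustrative stand-ins, not Cockpit's actual release scripts.

```python
# Hypothetical names for the release steps just listed.
RELEASE_TASKS = [
    "make-tarball", "make-patches",
    "update-rpm-spec", "update-debian-control",
    "build-preview", "push-fedora", "upload-ubuntu", "upload-debian",
    "upload-source-tarball", "update-docs", "rebuild-containers",
]


def plan_release(tag: str, signed: bool) -> list:
    """A human signs the tag; only then do the bots fan out over the rest."""
    if not signed:
        raise ValueError("refusing to release an unsigned tag: " + tag)
    return [task + " " + tag for task in RELEASE_TASKS]
```

The key property is the single human act of authority, the signature, at the top; everything below it is delegated.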
And this doesn't even cover what happens for RHEL, for OpenShift, for RHV, and so on; there's a whole other list of things that happen there. And you can see that it's basically an act of instructing other team members: I sign this tag, I'm the authority, I'm the human, and here's exactly the message, the tagline, the change log and so on to use for this stuff; then the bots go and scurry about and figure out what to do. And often this goes wrong. One of these things will fail, and a human will come back and change the way the bots work, because someone updated some silly API somewhere and didn't keep it compatible in one of these Linuxes. But that's exactly how a team would work. If a junior team member comes back to you and says, well, I tried this, it didn't work, you would tell them: oh, let's fix that, let's fix the tool, or maybe you should do it differently. It's essentially the same way a team works in general.

Bots are committers in Cockpit. This is pretty weird, but the fourth-highest committer to Cockpit's git master is the bots. You can see the Cockpituous name there; that's the bots, they all have the same name. And yeah, if you take a look at these commits, you'll see they're all pretty bullshit commits; they're just the mundane work. But mundane work takes up your time and life, and you're so used to it that you don't even notice it. All of the little packaging tasks, all of the little testing tasks, pulling updates from people, reacting to bugs: all of those things turn into commits, and a lot of the commits we have in our projects are just tedium. That tedium needs to be handed off to machines, and you can see the effect of that right there.

And lastly, the team stops without the bots. When you have a team and you lose half of it, your team's pretty screwed. You have to sort of reorganize and figure it out, and this is what happens.
This is why we have a very distributed and very fail-safe kind of architecture for these bots: because they're super critical to the team. The team just doesn't really work as a team without them. So that's how we know that we see this in action in a project. Like I said, this is a preview of the future. And because of the number of things that Cockpit touches, those 90-odd APIs and so on, it is, I would say, a valid preview, one that's translatable to other projects. If it was just a simple little web project based on PHP, you might say, well, that's very different from a lot of projects; this doesn't really translate. But this is such a varied project, touching so many things, that you can actually take this and apply it elsewhere.

And here are the rules that we've come up with over time. They're the kind of rules that when we forget them, we get screwed, and we have to remind ourselves of them: the laws of cyborg teams. There are three. The first is that teaching a machine must be as easy as teaching a human. That example earlier of getting the coffee from a different floor: it was pretty easy to tell the guy. It needs to be that easy to teach the machines. That is, every part of your orchestration, job running, automation, testing and so on must be trivial to contribute to. As easy as contributing to the software project itself, whether that's making a pull request or posting a patch on a mailing list. The low-hanging fruit is that we all speak the language of the machines; apply that to teaching the machines. And this is where open source really plays to its strengths. We have this embedded in our culture: we know how to communicate with each other and work together asynchronously. Apply that to changing the infrastructure that your bots run on. The first place to start is by making your tests as easy to contribute to as your software.
Even if you have to put them in the same repo, and actually that has a lot of benefits, make them just part of your software.

The second rule is that the machines must produce feedback into the team's workflow. We see this a lot: people write all sorts of fancy automation and orchestration tooling, and then they just post the results in a cool dry place, and there's zero effect. This is what you might know as Schrödinger's test: a test has neither passed nor failed until someone actually looks at the results and reacts to them. You have to put the results of the machines back into the team's workflow; make them part of the team. You saw that example earlier: a bot filed an issue, another followed up on the issue and made a pull request, and it was reviewed in place, all as a human's work would be. In fact, the bots often hand off their work, especially when something goes wrong, to a human, and the human does one step as part of the same workflow: the same way of looking at results, the same way of tracking tasks and sharing progress. You even saw that "work in progress" thing at the top. That's how our project tracks the fact that, hey, I own this: you put a little "work in progress" string at the top of the title to say that no one else should touch it, and the bots do the same thing. And you can train the machines trivially to do things the way your team does.

And lastly, a human should be able to impersonate a machine, and a machine should be able to impersonate a human. Don't give your bots special access rights that only they have. Given the credentials that they have, you should be able to do the same task using the same tooling. Why? Because you will need to: you will need to go and fix that bot and train it to do something differently in the face of a changing workflow, or you'll want to write a new one. This goes back to the four freedoms of free software.
You should be able to do all the things that that other software does, so you can change it or make a new one. And vice versa: you shouldn't have special rights that the machines don't have, given the same credentials. So if I need to write a bot to do something, and I can give it appropriate credentials to do that, it should be able to go and do that task. The policy shouldn't prevent that, and the technology shouldn't prevent that.

And here's the foundation of all this. This is where you start, if you're wondering where to start: tests are the soul of a robot. Teach machines right and wrong, good and evil. You may have seen examples of machine learning systems learning how to play Pac-Man: they give the machine a little control and a video camera pointed at the screen, and a test that says when you hear this sound, it's good, and when you hear that sound, it's bad. And that's all. The machine then learns how to play Pac-Man, how to make the good sound come and not the bad sound. And that happens pretty quickly, but it's all predicated on the test. Tests tell the machines, the bots, what your team thinks is good and what your team thinks is wrong. And given that, you can build massive amounts of capability and hand off all sorts of tasks to them. This is the foundation of it all, and this is the place to start.

So here are some techniques that we also use. These are not the hard-bound rules that you get screwed on when you get them wrong; they're more like things that 60% of the time work every time, things to think about as you do this: organic and distributed bots, self-validating bots, self-aware bots, containerized bots, don't assume bots can't, and don't rework process yet. Okay, organic and distributed bots. Before, I said that if the bots stop, the team stops.
And that's true of any team that loses half of its members. We were burned by this pretty hard: the project stopped when our infrastructure went down. So we decided that was never gonna happen again, and we made the bots distributed. They're not orchestrated. If you look here, you'll see these bot containers going on, and they're all independent. They carry their own brain around with them, so to speak: they are containers, and they have everything they need in them. In addition, someone on a laptop, any of you, can run a bot container themselves, or run any of the tasks inside of it yourself. In fact, sometimes when some of the infrastructure goes down, people will just spin up bots on their laptops and start contributing to the whole thing.

It works like this. Each of the containers goes and looks for tasks. We use GitHub; that's what the team's workflow basically revolves around. So they look for tasks on GitHub. They all know how to look and see what the open pull requests are, which ones need testing, which issues have outstanding tasks they know about, and so on. And they all apply an algorithm; there's no orchestrator to do this. They all go and check and look for tasks. They'll prioritize those tasks: what do I think is most important, and what do I think is less important? Typically they'll all have the same idea of what's important and what's not. And then they'll take the top 10, randomize them, and pick one. And they'll go: I'm gonna go and do this task. Then they start on the task and start posting their output publicly. Make sure, always, that your bots post their output publicly. Just like all of you: if you do cool open source work and then don't post it, it's as if you didn't do it. It's the same with the bots; don't forget that. So they post their output publicly, into well-known places.
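The selection algorithm just described fits in a few lines. This is a sketch of the idea rather than the Cockpit bots' real code; the priority function and the shape of the claim check are my own assumptions for illustration.

```python
import random


def choose_task(tasks, priority, top_n=10, rng=random):
    """Rank the open tasks, take the top ten, pick one at random.
    The randomization is cheap collision avoidance: independent bots
    will usually scatter across different tasks with no orchestrator."""
    ranked = sorted(tasks, key=priority, reverse=True)
    return rng.choice(ranked[:top_n])


def still_mine(task, claims, me):
    """A bit after starting, check the public record (GitHub, say):
    if another bot's claim on this task landed first, back off."""
    return claims.get(task, me) == me
```

A bot that loses its claim simply calls choose_task again; that's how the swarm converges without any central coordination.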
Some of these bots are running behind firewalls, on infrastructure that you can't access from anywhere. That doesn't matter, as long as the output, the results of the actions, is public. We actually have a setup with what we call a sink in different places, and the bots can basically just post their output to the sink; the sink will then go and update GitHub for the bot. But you can of course set this up differently. What's interesting is that the bots will check to see if anyone else is doing the same task after 60 seconds or so. We have collision avoidance at the beginning with the randomization of the tasks, but then the bots will go and check: hey, did someone else start on this? And whoever won, whoever got a little further, whoever actually has the update on GitHub, keeps it, and the other guys will go, eh, I'll pick something else. So they're organic; they kind of converge to the correct state, to a state where all the tasks are done.

And they also share their state with each other. At ContainerCon I'm gonna talk about how to do this with Kubernetes, but if the bots have machine learning data, or maybe an image that's been created for a certain operating system, they will ask each other: hey, do you have this? They'll poll all of the different bots around and figure out, has anyone else prepared this thing, this thing that would take an hour to prepare, so I can go and do my task? And if not, they'll go and do it. They share their state with each other and sort of organically work that way.

So here's another technique: self-validating bots. I don't have any fancy pictures of this. The basic idea is that, especially once you have organic and distributed bots like this, if someone opens a pull request to change how a bot works, the bot should take on that new behavior.
Let's imagine it's the prioritization of which tasks to do first, or anything really: the bot should act in that new way. So one of the bots is acting differently. There's nothing that says they can't all act differently, but you typically want them to act the same or it'd be very confusing. One of them will act differently for a certain pull request, and another will act differently for another pull request: pull requests that change how the bot works. And then, of course, when that's merged into master, all the bots start acting that way. This allows you to iterate really fast, and also revert really fast, on how your bots behave.

Self-aware bots: again, without an orchestrator, the bots need to be self-aware. They need to figure out: where am I in the world? In this case, checking whether they're inside the Red Hat VPN in order to get an early preview release of RHEL. But bots can run all over the place, for example in a container. Sometimes we have KVM; sometimes we don't. This is a whole cool topic that I could go on about, but the bot will check: can I do these sorts of tasks or not? And it'll figure out, oh, well, those are out of the question; I can just work on these. In some cases in our infrastructure, we'll actually boot Windows thousands of times a day in Kubernetes. But for regenerating that Windows image, there are a few manual tasks that a human has to do. In theory, the bots think they can do it, but typically there'll be a flag that says: oh, you can't do any Windows stuff, or you can only go this far. Or for Windows stuff, you need access to a certain license file. There's a lot of stuff like that. And all of this can be coded into the bot in different places, just as a team member would be aware of the current state of how the team functions.

This is something that's actually interesting; I'm gonna go down a bit of a rat hole here. The tooling that the bots use is very consistent.
It can be shared, it can be an open source project, it can be given to other people to reuse. But the bot itself, at the top layer, is a member of the team. It embodies all the weird little corner cases and strange things you do in the team. This is an example of that. That part is what makes it a team member, and that's the part that can't be shared. So obviously, containerize your bots. Containers are nothing if not a way to have a developer place a big pile of crap on their desk, so you can then put a shovel under it and get it to everyone else, wherever you need it to go. And this is exactly how the bots need to work. They have all sorts of weird dependencies and strange things, and over time they grow: oh, I need this library or that library, or I need this weird tool. For example, with machine learning, or if you want translated names for the languages, you need to pull in a bunch of GTK stuff, because they do it really well. So you bring that in. Just weird crap in the bot. And containerizing them really makes all of that work cleanly. You can have rolling updates and so on. At ContainerCon I'm giving a whole talk on this in itself, so I won't go into it too much. But we end up in a place where we have containers being tested inside of virtual machines, inside container bots that stage how to launch those virtual operating systems, running on provisioned virtual machines, and so on. And this is not a problem. It works great. It feels weird and wrong, but life looks pretty weird, pretty strange. When you containerize things, you can do strange things very reliably and repeatably. All right: don't assume bots can't.
There's an interesting paradox, Moravec's paradox. It doesn't have a catchy little phrase, but basically it's about the fact that the things you think are hard are often really easy for bots and machines. We think it's hard to play chess, and it is for us, but for machines it's easy. Your phone can play chess better than the grandmasters of the world. It's easy for machines to do this, especially now. We think it's easy to walk down the stairs, but it is really, really, really hard for a bot to walk down the stairs. So when you're automating things and looking at what's possible and what's not, keep in mind that you should try it out first before trusting your intuition, because your intuition is often wrong in this regard. And lastly: don't rework process yet. This is really important, and I see it so often when people talk to me on this topic. They're like: oh, I want to automate this, but first I have to solve world hunger. First I have to fix all these problems here, because the way we're doing it totally sucks, it's totally wrong. Well, it turns out machines are really good at doing bullshit over and over and over and over again. Just let them do it. And by having the bots do those repetitive, cruft tasks, it frees you up to then make things better. Whereas if you try to do it the other way around, you're digging yourself into a hole that just gets deeper and deeper, you block, and you're basically forcing the rest of the team to spend their time on mindless tasks that these bots could do, instead of letting them go and spend their time on fixing the real problems underneath. Oh, this is cool, but I don't have much time to talk about it.
What we're actually doing here, in the Cockpit project, is right on the edge of where we are as far as machines and bots and machine learning. In any CI system, you have flakes: your tests fail in random, disgusting ways that are super annoying and have no relation to the change that you just made. And we have a lot of these. People react to these flakes. Every single time, they'll look through the results and say: that has nothing to do with my pull request, and retrigger the tests. That's just life; this is what happens. And if it goes green, they're like: yep, totally didn't have to do with my pull request. And usually they're right, though not always. Or maybe they'll fix something and get that change of code in. We can actually use that as a source of machine learning data: the humans reacted in this way to this kind of failure, therefore this kind of failure must be unrelated to the specific changes, must be an underlying thing. But in addition, there's a whole other angle here. Do I have any cool slides about this? Oh, yes. And that is that these flakes are actually mutations. A lot of people spend a lot of time fuzzing their software. And one of the things that you do in machine learning is mutate something and see if it got better or worse. Fuzzing is a form of that, where you change the input in different random, broken ways and see what happens. It turns out that test flakes are really a lot like mutations: mutations in timing, races, mutations in the load of the machines, and so on, that then expose strange bugs in your software. And most of the time, and this is the theory that I want to prove, those are real bugs. Bugs that would show up at a customer or user site every 100,000 runs, and that you will never be able to reproduce.
But your test flakes are those bugs happening right before your eyes, and we're just annoyed with them. We should use them. They're food, not poison. So what we're starting to do, and I think this was just merged the other day (I'm not sure it works perfectly yet; I think it gets about 80% right), is trying to cluster these with machine learning algorithms. We're currently using a neural network, but I feel like we should use density-based clustering, in order to say that these failures are very similar: these are one problem, those ones over there are another problem, and these flakes are one problem. We can tell that they're flakes because someone merged in spite of the failure. Even if they ran it again and it went green, we can look back in the logs, see that this happened, and that it was then merged. I've only got one minute, but the theory is that the big clusters are actual bugs, actually repeating bugs that happen across the tests in different ways, and the randomness in the background is infrastructure failures of your network, your compute, and all of that, which show up in different places and so on. And we can, A, file issues for those big clusters, especially as they get very big, and B, just retry the tests for other pull requests, retry that single test like three or four times, because we know it's almost certainly a flake. So the bots can learn from the behavior of the humans there. And I feel like we're going to see this more and more, and this is the first time we're seeing it, where machine learning actually plays into an open source software workflow. And that's it. Oh, here's an example: down here you start to see that this is a flake, flake probability 92%. That's where we're at: in the tests we're starting to highlight these. We're not taking action on them yet, but that's the cutting edge of where we are.
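To show the shape of that clustering idea, here is a deliberately simplified, stdlib-only sketch. The real work described above uses a neural network (and possibly density-based clustering); this stand-in just greedily groups failure logs by token overlap, so every name and threshold here is hypothetical.

```python
def jaccard(a, b):
    """Token-set similarity between two failure logs."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def cluster_flakes(logs, threshold=0.5):
    """Greedy, density-style grouping: each log joins the first
    cluster whose representative log it resembles closely enough;
    otherwise it starts a new cluster. Big clusters suggest a real,
    repeating bug; singletons look like background infrastructure noise."""
    clusters = []  # list of (representative_log, members)
    for log in logs:
        for rep, members in clusters:
            if jaccard(rep, log) >= threshold:
                members.append(log)
                break
        else:
            clusters.append((log, [log]))
    return [members for _, members in clusters]
```

With labels derived from human behavior (a failure that was retriggered or merged over anyway counts as a flake), the big clusters become issue candidates, and new failures that land in a known flake cluster can be automatically retried.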
All right, so: cyborg teams, happy humans, tired machines. This is really what we have to do, and more than anywhere else in open source, because we are very limited in the number of humans involved, and we make the humans do a lot of bullshit work, work that should be handed off to the machines. So, any questions? Mike's coming. Which repos do the definitions for your bots live in? Yeah, it's a good question. One of the key things is that, currently, the actual tasks of the bots live in the repo of the project itself. They're part of the team; everyone should have access to them. Now, some people are contesting that and saying: let's move them into their own repo. Fine, as long as we don't lose that basic, fundamental property: it should be as easy to contribute to the bots as to the software. If we lose that, we've lost the magic. So currently they're in the same repo. The container that invokes them, which does that randomization and a bunch of that stuff, is in a repo called Cockpituous; this is all on GitHub under the project. And some of the tools that they use, the consistent and reusable tools for updating distros or deploying stuff, those are in there too. But the team members themselves, the bots, are really deeply embedded in the project itself, and that's the magic that we have to be careful not to lose. Another question? Oh, you got it. Sorry, sort of obvious question perhaps: who's going to pay my bills if the machines do my work? Very good. This is a really interesting topic, one we should talk about for hours, over beer. A lot of the automation and machines taking over human tasks has gone on over the last 200 years. It's neither accelerating nor decelerating. My grandmother actually told me that for her mom, when the washing machine came out, it completely changed her life.
Two or three days of her week were now gone. The purpose of her existence for those two or three days used to be the job of washing the laundry. Two days a week, can you imagine? And now it was gone. What was she going to do with that? It was a massive, life-changing thing, where someone lost a little bit of their purpose, quote unquote, but it was not a purpose that benefited society so much, and a machine taking it over actually benefited society. We've seen this a lot. For example, when the industrial revolution happened and we no longer had to use humans, slaves, to do menial work, to basically build things, the change in the economy lagged, and of course for a while there was a disadvantage to this: a mismatch, so to speak. We're seeing that now too, where automation is freeing people up and increasing the quality of people's lives, or of our software teams, and allowing us to do amazing new things. But the economy needs to catch up as well. I'm pretty sure that we should not stop progress in this regard, progress that has been going on for a long time, just because the economy is lagging slightly. But let's talk about this more. There's so much in this. It's a fascinating topic and I'd love to hear people's opinions on it. So. How do you manage the bots' credentials? Like, this Cockpituous bot that has access to GitHub: if the infrastructure went down and I wanted to run it myself, how do I get the credentials? Do I have to be inside the team? Do we have to share the credentials? What do you do? So, typically the bots have less access than the human team members, and the humans can do the same things that the bots can do. The actual credentials for the bots that are in Kubernetes, and not all of them are, but for the ones that are, are shared using Kubernetes secrets.
And that's an amazing way to get secrets out to your various containers without baking them into the containers or inventing your own mechanism. It's really on target and works very well. If you need to use exactly the same credentials as the bot, then you have to ask one of the trusted admins for access to those credentials. But typically, as a human on the team, you will have more access than the bots and can therefore perform the same tasks. It's only in the corner cases of debugging an issue or figuring something out that you don't. And that goes back to that third law: make sure your team members all have at least the same access rights as the bots, so they can contribute and figure things out. It's a level playing field of team members, not overlords or underlings, but team members you hand your shit work off to. All right, I think that's it. See you guys. Thanks.