So, a round of applause for Veronica. Hi. Okay, thank you for being here at my talk, those of you who could make it. And those who were told they weren't allowed to stand on the sides sat down, so they're very clever. Thank you.

Okay, so my talk is very self-explanatory and very simple. Everybody says this, but it's just 30 minutes, so we can't go through every single thing that we do. I'm going to talk to you about how we do it at CoreOS, from my perspective. It will be more like a chat than "this is the best practice you should follow and it's wrong if you don't do it this way." It's just sharing our perspectives.

So, who am I? I'm a senior software engineer at CoreOS. Now, well, Red Hat. Right now I work on automating Red Hat Enterprise Linux support for Tectonic. In general I work with distributed systems, but before this I worked in scientific computing, and then had a period in mobile. Through all of this time, though, I have always been using Linux, so that's a very important aspect for me. And Kubernetes is Linux for the cloud, so I am really, really happy to be working with this.

This is how the talk will look today. First, how we do it at CoreOS, which has two parts: the strategies we have followed, those that have worked and those that haven't, and then what I've learned, and you'll see what I mean by that change of perspective. Then, why Go is great for tooling, because in the end this is not the CoreOS room and this is not the testing room, so I have to explain why Go is great for this. And finally, some closing thoughts.

By the way, I know a lot of people don't enjoy lots of text on slides, but since this is live-streamed and these slides will be consumed by people not watching the talk, a lot of things will be redundant. Feel free to just listen to me, or just read, or if you're super concurrent, do both.

Okay. So at CoreOS, even though we do have a fairly new team that takes care of testing and automation, it's not the old-school flow where people write their code and some testers do the QA. It is not like that: every team takes care of their own code and their own tests. That's pretty cool for accountability reasons, but also because our products are very different. We also have very strict release automation guidelines, and the main goal is that if some tests don't pass, you shouldn't merge. That's not always 100% true, but we aim for it.

As I said, our products are very different. We have Tectonic, which is more enterprise, we have etcd that everybody loves, we have Container Linux, and so on. For example, even though I don't work directly with the etcd team, they're the ones with the toughest testing strategy, for many reasons: etcd is one of the most popular pieces of software right now, but also, given the backgrounds of their engineers, they follow a very, very strict workflow. I don't know if you have seen this, but you can look it up on the internet: out of more than 100K lines of code, 60K are just for testing. And that covers everything from unit tests to integration, migration, and end-to-end tests.
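To make that bottom layer of the pyramid concrete, here is a minimal sketch of the kind of table-driven unit test that Go makes cheap to write. The `Store` type is a hypothetical stand-in for a component a team might own, not actual etcd code:

```go
package kv

import "testing"

// Store is a toy in-memory key-value store, standing in for the kind of
// component a team owns and tests itself.
type Store struct{ data map[string]string }

func NewStore() *Store { return &Store{data: map[string]string{}} }

func (s *Store) Put(k, v string) { s.data[k] = v }

func (s *Store) Get(k string) (string, bool) {
	v, ok := s.data[k]
	return v, ok
}

// TestPutGet is the unit-test layer: table-driven, fast, and run on every merge.
func TestPutGet(t *testing.T) {
	cases := []struct {
		name, key, val string
	}{
		{"simple", "foo", "bar"},
		{"empty value", "foo", ""},
		{"overwrite", "foo", "baz"},
	}
	s := NewStore()
	for _, c := range cases {
		t.Run(c.name, func(t *testing.T) {
			s.Put(c.key, c.val)
			if got, ok := s.Get(c.key); !ok || got != c.val {
				t.Fatalf("Get(%q) = %q, %v; want %q, true", c.key, got, ok, c.val)
			}
		})
	}
}
```

Tests like this are necessary, but as the rest of the talk argues, for distributed systems they are only the first layer.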
Then we also use Bazel. Bazel is a pain. It's the worst thing that could ever happen to me. No, okay, to be fair, Bazel is a great tool when you get to benefit from it. But when you have to benefit others through it, it just doesn't work; at least for us, it has been really, really painful. There are a couple of in-house Bazel experts, but that's not the norm at the company, so it has been rough.

Some perspective on this: it's not on the slides, but you're here to listen to me, so I will tell you why I hate it. Our workflow is something like 90 or 95% based on Go, and Bazel is a great tool written for Java. Java, C, all of those are great environments for it. Then you start using Kubernetes, and you realize they use Bazel for very specific reasons, which I'm not going to discuss. But Kubernetes is one of the most popular Go projects, and here you have this tool that works better with Java, C, or Python projects. A lot of times I have talked to people, I'm not going to disclose names, but: "Hey Dalton, how do I use this Go rule for Bazel?" And Dalton would tell me, "Oh no, that Bazel rule doesn't work in Go. So you know what I did? I used the Bazel rule for Python, and I created a wrapper, and then I did this, and five more steps, and then we have it. It's super easy." It is never like that.

Okay, so back to the experience and how we do it at CoreOS. The testing and automation team works on building targeted tools, not QA, as I said. Even though we're one team, we don't work together on the same projects; every one of us works toward different efforts. Some work with the Prometheus team, some with the Tectonic team, or me, working on Red Hat Enterprise Linux support, which is off in the ether somewhere. I don't know if you're here because you're trapped and can't get out, or because you really wanted to hear my talk, but if you read the description of my talk, it said something about telling you the story of building a testing framework inside CoreOS. That is no longer true: we did start building one for testing and automation, but as I said, the team is new, so we're still learning a lot and evolving, and learning means screwing up a lot. But that is always cool. We're trying to adapt this workflow, replicating as much as we can, like many other infrastructure companies, the Google infrastructure workflow. The real name of the testing and automation team is Engineering Services.

On the other hand, there's my experience and what I have learned. I had never worked on a team that was 100% oriented toward testing and automation. When I was hired, they were looking more for a distributed systems person, because they valued that more than a testing person. Of course, I have struggled to succeed at many things, but my perspective is also a fresh look, because I didn't carry all the best practices, or bad practices, that testing people bring from other kinds of workflows, which is also the reason for this talk. At the beginning, we tried to focus a lot on test coverage, being super strict with our repos, and as I said, all of our code had to be at 100% and passing and so on.
From my perspective, that was a bit too much work for the benefit we could get, especially because the coverage number was based only on unit testing. As you will see, or as you might know already, that is not enough for containers, Kubernetes, and distributed systems anymore.

It also leads to incomplete end-to-end scenarios. The very definition of end-to-end testing is that it should be as comprehensive as possible, but it's not end-to-end if you only test one end and the other end; you also have to test everything in between. And when developers like me, or like you, don't have the necessary empathy with the system, when they are not that familiar with it, they cannot create effective end-to-end test scenarios.

I always mention my experience in Latin America, because I'm Mexican and I used to work in that part of the world. In those environments you work with very, very limited resources. I'm not saying that's better or worse, but you have to deal with non-technical bosses who won't give you money for a new server or a new cluster or whatever, so you have to be super, super efficient at optimization. On the other hand, privileged, bleeding-edge cultures, like the one I work in now, can afford over-engineering and rewrites. That is not bad at all. What I mean is that, for example, instead of following a more formal testing workflow, what we have often done is write a component or an operator, and if it doesn't meet the standards we were looking for, either in quality or in test results, we throw it away and write another one. We can afford to do that because we have very good engineers who write code really fast, but not all teams can afford it. What I recommend is, of course, having very skilled people on your team, but also not getting rid of code just because it doesn't work right away; rather, iterate on what really works.

Testing distributed systems is hard, and new considerations are necessary. I don't know how many of you are familiar with the formal verification of distributed systems, but it is really, really hard. By formal verification I mean the academic sense of it. Very, very few distributed systems in the world are formally verified; that's not good or bad. In an ideal world, every single distributed system would be verified, but it is very expensive, very hard, and it takes a lot of time. Instead, we have things like monitoring and testing, and the many tools that we use today and many of you probably use too. Containers, service meshes, and Kubernetes solve many problems, and they are great. Take fault tolerance, which is not built into the Go programming language: now that we have Kubernetes, we don't have to solve the fault-tolerance problems directly anymore. But not having to solve a problem doesn't mean we don't have to be familiar with how it works. What I mean is that we need different levels of specialized skills. At CoreOS, we build the systems that end users have to rely on, so we have to be very rigorous in our approaches, but it doesn't matter where you sit: even if you're the end user running these tools in production, you still have to be familiar with how they work.
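As a hypothetical illustration of "everything in between": here are two components that would each pass their own unit tests, plus the kind of interaction test that unit coverage alone never exercises. All the names here (`encode`, `send`, `maxPayload`) are made up for the sketch:

```go
package pipeline

import (
	"fmt"
	"strings"
	"testing"
)

// maxPayload is what the receiving side accepts, in bytes.
const maxPayload = 1024

var errTooLarge = fmt.Errorf("payload exceeds %d bytes", maxPayload)

// encode frames a message; its own unit tests pass.
func encode(msg string) []byte { return append([]byte("v1:"), msg...) }

// send rejects oversized payloads; its own unit tests pass too.
func send(p []byte) error {
	if len(p) > maxPayload {
		return errTooLarge
	}
	return nil
}

// TestEncodeThenSend exercises the interaction: a message at the documented
// limit should still be sendable after framing. This test fails, because the
// 3-byte "v1:" frame pushes the payload over the limit, a bug that neither
// component's unit tests could see.
func TestEncodeThenSend(t *testing.T) {
	msg := strings.Repeat("x", maxPayload)
	if err := send(encode(msg)); err != nil {
		t.Fatalf("send after encode failed: %v", err)
	}
}
```

Each component is "100% covered" in isolation, and the composition is still broken; that is the gap a coverage-only strategy leaves open.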
Okay, so there are two perspectives on testing and containers: using containers for testing, and testing containers.

For using containers for testing, I'll go through this super quickly. You package your test suite as an image and make your system run against it. The benefits are that it is practical, neat, fast, portable, scalable, et cetera: everything you get from a Docker image, or any container image, you get here. It also sets a standard for distributed components. At CoreOS, we're pushing toward that strategy right now, because we work in very different teams toward one goal, say Tectonic, and Tectonic includes the Prometheus support, the etcd support, all of that. So the goal of this testing strategy is to package all the tests in a container and just run our systems against it. That is super effective and super easy. Of course, easier said than done, so we're still working on that.

The second perspective is testing containers. Testing distributed systems is hard; here it is once again. If you go to the Kubernetes documentation, specifically around the end-to-end test suite, you can find this: "It is not uncommon that a minor change may pass all unit and integration tests, but cause unforeseen changes at the system level." As we said, end-to-end should, by definition, be as comprehensive as possible. What I'm trying to say is that people who work with distributed systems usually have plenty of experience and therefore different backgrounds, and a lot of the best practices for testing that we used to rely on may not be enough anymore. Actually, it's not that they may not be enough; they're not enough. Just as that quote says, many, many interactions can happen at the system level. The components of a distributed system might work on their own, but not when they interact. Or they might work when they interact, but they can interact in many different ways: they can work for, say, seven different interactions, and at the eighth it breaks. And when it's broken, it doesn't matter that all the other scenarios work.

The fact that we don't have to worry about fixing these things anymore doesn't mean we don't have to know how they work, and especially how they fail. Because if we know how they fail, we know how to implement solutions from the very beginning. For example, I was talking about fault tolerance in the Go programming language versus Elixir, where the latter has fault tolerance included. So I asked, why doesn't Go have fault tolerance? That is not really my business right now, but my friend Yana told me: in Go, instead of crying, or whatever your favorite way of dealing with fault tolerance is, what you do is design with failure in mind. Not just catching exceptions like you do in Java or any other programming language, but designing with failure in mind from the start. That sounds horrible, but that is how you have to do it. And the only way to effectively design with failure in mind is to be aware of, and very familiar with, your system and your tools, at whatever depth you need. Of course, you don't need to be an expert at every single level if you don't have to. But if you work with Kubernetes at the production level, know how it works.
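A minimal sketch of what "design with failure in mind" can look like in Go, assuming a hypothetical flaky dependency: the failure is a normal, expected outcome handled with bounded retries and backoff, not an exception to be caught and forgotten.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errUnavailable stands in for a transient failure from a dependency.
var errUnavailable = errors.New("peer unavailable")

// flakyCall simulates a remote call that fails most of the time.
func flakyCall() error {
	if rand.Intn(3) != 0 {
		return errUnavailable
	}
	return nil
}

// withRetry treats failure as a normal input: bounded retries with
// exponential backoff instead of assuming the call succeeds.
func withRetry(attempts int, fn func() error) error {
	backoff := 50 * time.Millisecond
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		time.Sleep(backoff)
		backoff *= 2
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	if err := withRetry(5, flakyCall); err != nil {
		// Even exhausted retries are a designed-for outcome, not a crash.
		fmt.Println("degraded mode:", err)
		return
	}
	fmt.Println("ok")
}
```

The point is not the retry loop itself; it's that the failure path is written first-class, which is exactly what you can only do if you know how your system fails.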
If you build the operating system behind it, well, you have to know how that works.

Okay, so unit testing is always important, but with distributed systems it has two possible outcomes: either it is incomplete, or it's too complicated, because you have to cover every single input that has to happen for the test to be complete, to pass or not pass. And massive integration tests are an anti-pattern. When you use only unit testing to assess the health of your code, what I have seen is that people rely a lot on mocks. But the larger your distributed system is and the more interactions it has, the more massive the mock gets, until it's not sustainable anymore; it's just another system on its own. There's also the notion, and it's true, that only three nodes are necessary to test a whole distributed system, no matter how large or small it gets after that. But what matters is not how many nodes; it's the number of inputs. And then the mock gets massive.

Monitoring and support teams must not act as your system's nanny. What a lot of us do when we build a distributed system, don't test it correctly, and just ship it to production that way, is rely super heavily on either the monitoring strategy or the support team. That is not their job.

Writing this talk, I found scary similarities with many people who work on similar problems. For example, Cindy Sridharan, who you can find on Twitter as copyconstruct, likes to write about this. Well, I don't like to write, but she writes very cool articles and Medium posts about it. Her angle is to approach testing from the monitoring perspective. She has a very deep understanding of the difference between observability and monitoring; for me they're the same thing, and I'm not saying that's good or bad, I just don't have all that specialized knowledge. On the other hand, since I have an academic background, I try to tackle every problem from where my experience is strongest. So I was reading this paper, "Simple Testing Can Prevent Most Critical Failures", which is really, really good; you can find it on the internet. They found that 92% of catastrophic system failures are the result of incorrect handling of non-fatal errors. This is nothing other than error handling: the try-and-catch, or whatever it looks like in your favorite programming language.

And this brings me to the Go tooling perspective, to finish. If you haven't already, go read about the origins of go cover, the coverage tool for Go; you can see that Go was created with tooling in mind. Not that you need to read that article to know it, but if you're still exploring Go for your first projects, it's a very good introduction to Go tooling. Also, the error-handling model and the famous "Errors are values" article, written by Rob Pike, I think, are a very, very good introduction to why Go is really good for tooling.

Okay, to sum up everything I have said already: one, distributed systems need much more than unit testing these days. Two, we have to be familiar with what we're building, even if we don't have to fight against it anymore.
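That 92% finding is mostly about error handlers that exist but are wrong: the error gets logged and execution limps on. As a hypothetical Go sketch of the errors-as-values alternative, where the caller decides explicitly what "non-fatal" means (`loadConfig` and `app.conf` are made up for the example):

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// loadConfig returns the error as a value; the caller must decide what
// is recoverable, instead of logging the error and limping on.
func loadConfig(path string) ([]byte, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		// Wrap with %w so callers can still inspect the underlying cause.
		return nil, fmt.Errorf("read config %q: %w", path, err)
	}
	return b, nil
}

func main() {
	cfg, err := loadConfig("app.conf")
	switch {
	case errors.Is(err, os.ErrNotExist):
		// An explicit decision: a missing file is recoverable, use defaults.
		cfg = []byte("defaults")
	case err != nil:
		// Anything else is not: fail loudly here, not three layers later.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("loaded %d bytes of config\n", len(cfg))
}
```

The anti-pattern the paper describes would be logging the read error and continuing with a nil config; the real failure then detonates somewhere else, at system level, where no unit test is looking.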
If we don't have to fight fault tolerance because Kubernetes now takes care of it, that doesn't mean we don't have to know how it works, because when it doesn't work as the vendors promised, we have to know how to fix it, at that layer or the next one. And three, Go is a very nice language for tooling, for creating your own tools. As a testing and automation engineer at CoreOS, working on the bleeding edge of containers and that sort of technology with Go, I have found that tools you create with only the standard library are a very good option, for all the reasons you can hear at any Go talk: the simplicity, the syntax, et cetera. I also recommend Kelsey Hightower's Kubernetes the Hard Way. Not as an introduction or anything like that, but as a means to really understand what you're doing; even if you never get to build your own Kubernetes distribution, it helps you know how it works and how to fix things.

Okay, I tried to be really, really explicit, but time's up. So, thank you. And we're hiring. The text is small because, if you heard the news, we are now part of Red Hat. Hi, Derek. I don't know how hiring will work right now, but we're always told to put the "we're hiring" slide in our talks, so there it is. And one more thing everybody at CoreOS wants you to know: Container Linux will still be alive. Thank you.