Hello everyone. I'm going to be talking today about delivery and debugging of MariaDB Docker images. I hope you've come to the right place; if not, you can run away, that's okay. What I'm going to be talking about today is the kind of container bugs that we didn't expect as we do container maintenance, a bit about the container runtime restrictions that apply, and a few opportunities that can be taken advantage of when you're actually using containers. And as we look at containers, the bugs we receive and the features we get, we see there are opportunities to improve the MariaDB code base. Throughout all of this there's an element of continuous integration and continuous delivery that helps the process along. And the general theme of this talk is, I guess, the ecosystem cooperation that can result, and is kind of needed, as things evolve so quickly. Generally the ecosystems out there are ready to help you, if you only ask.

If we look at what we have in a classical VM — is classical the right term? — a VM or bare metal environment, we've got an application that you develop, a set of libraries, a kernel and storage, all in one bundle that comes out together. It's all tested together, and tested by other people, so there are a lot of inbuilt assumptions about the integration between all the components. When we think about what a container is, we can say: well, the kernel is given to us, the storage is given to us, so we only need to worry about the libraries and our application, right? That's a nice theory, anyway. What ends up being the case is that our users, or the environments we're given, tend to throw a rather arbitrary and mixed selection of kernels and storage at us, and as a MariaDB database we're just expected to magically work with all of them. Some of that comes through luck, some of it comes through a bit of engineering, but I guess what I'm going to go through is where it's gone wrong.

One instance here is output from a bug report we get occasionally on Kubernetes, with NFS storage underneath. If we look at the logs, we see about 30 seconds elapse here: it's taking 30 seconds to write a 12 megabyte file, and that's all that's happening. That doesn't bode well for a database server; you'd expect a little higher throughput. I honestly still don't know the story behind this, because when we get these kinds of bugs it's like: how do I replicate the entire user's environment to work out what actually went on? I welcome any suggestions, now or later, as to how to resolve these kinds of bugs, or even understand them. There is a workaround — you can put this temporary file elsewhere — but it's a little extra configuration to do so.

Another file system thing that got thrown at MariaDB was CIFS. Apologies for the system calls: we do an fcntl on a file descriptor to enable O_DIRECT, and it succeeds — yes, I can do that. Then you write to it, and the write fails with EINVAL. These are the kinds of things that pop up, because in Kubernetes-style environments it's easy to throw any kind of storage at you. And while databases may have historically run on this wonderful, ancient technology called local storage, that's becoming less common now. As a consequence, we need to handle these kinds of things.
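To make that concrete, here's a minimal C sketch of the pattern involved — hedged, since it's reconstructed from the description above rather than taken from MariaDB's code, and the path and sizes are placeholders. The fcntl succeeds; the write is where the EINVAL reportedly shows up on CIFS:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv) {
    /* point this at a file on the CIFS mount to reproduce */
    const char *path = argc > 1 ? argv[1] : "testfile";
    int fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0) { perror("open"); return 1; }

    /* Step 1: ask for O_DIRECT -- on the reported CIFS setup this succeeds */
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_DIRECT) != 0) {
        perror("fcntl(F_SETFL, O_DIRECT)");
        return 1;
    }

    /* Step 2: a properly aligned direct write -- this is the call that
       was reported to fail with EINVAL on CIFS */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0) return 1;
    memset(buf, 0, 4096);
    if (write(fd, buf, 4096) < 0)
        fprintf(stderr, "write: %s\n", strerror(errno));
    else
        puts("direct write ok");
    close(fd);
    return 0;
}
```

The aligned buffer matters: an unaligned O_DIRECT write fails with EINVAL on most filesystems anyway, so the sketch keeps the alignment correct to isolate the CIFS-specific behavior.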
There's a workaround on CIFS: if you mount it with the cache=none mount option, it'll allow O_DIRECT through. But it takes a bit of investigation to get to the bottom of these kinds of issues.

There's this other thing in the ecosystem called Windows Subsystem for Linux, which I think means Windows looking like, but not really being, Linux. It's a Microsoft product that translates Linux system calls into whatever the native calls are underneath, and let's say it's not always a one-to-one mapping. This particular WSL bug came across a while ago: if you open a file descriptor, rename the file, and then do an fstat on that file descriptor, it reports ENOENT. How can the file not exist anymore? I've still got the file descriptor open. Now this may look like a particularly odd case; however, it's exactly what the database engine executes whenever you do an ALTER TABLE and it needs to rewrite the table structure, so it's not an abnormal thing to do. It is, however, one that Microsoft hasn't solved. I provided a bit more information at the beginning of this week, so I'm hopeful they will one day fix it. This is, I guess, one of the examples where cooperation with the community — whoever it is — is needed to resolve these issues. Either that, or you've got to get creative and rework parts of your code base.

This is another error that I got. This was when I changed the base image of MariaDB to an Ubuntu Jammy image. I tested it locally; it worked fine. Then, immediately after a release into a stable version, users go: what is this error? I don't know what it is. That involved a bit of discussion, because from my side I couldn't reproduce it and had no idea what they were talking about. Eventually there was some discussion about when it does and doesn't occur — OS X is fine, this other setup is fine — and someone had worked it out before; the fact that it shows up in a frequently-asked-questions list means there's an answer there.

What the problem ended up being is seccomp filters. Seccomp filters are a restriction on the system calls a container can actually make against the kernel. There's this clone3 system call — one of those things you don't see when you're doing application programming — and because the Docker seccomp profile is an allow list, it was initially blocked: Docker versions before 20.10.10 didn't recognize the call and so didn't allow it. This affected Ubuntu Focal and Jammy at the time. By the time Jammy was released there was an update; however, some people, as you know, don't always update to the latest container runtime versions. It's amazing that they'll jump ahead on a MariaDB version but not on a Docker runtime version. As a consequence, these are the kinds of errors you get. As I looked back through this particular issue in preparation for this talk, I found some projects are still running seccomp unconfined on MariaDB, or pinning MariaDB to a version from before I bumped the base image — and I just want to say, please don't do that. If you've disabled a security feature, it might not hurt to look again every once in a while to see whether the underlying problem is actually fixed and still relevant, especially for such a transitory issue. Also, MariaDB 10.7 was a short-term release, so it's not even supported anymore, and probably isn't relevant to your use, or to any CI environment, these days.
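As a sketch of the defensive pattern this suggests — not MariaDB's actual code — you can probe clone3 at runtime and fall back when it's denied. Note that seccomp profiles before the fix returned EPERM for unknown syscalls rather than ENOSYS, which is precisely what broke glibc's own fallback logic; the probe below accepts either. It assumes kernel headers and glibc new enough to define SYS_clone3 and struct clone_args:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/sched.h>   /* struct clone_args (kernel >= 5.3 headers) */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    struct clone_args args;
    memset(&args, 0, sizeof(args));
    args.exit_signal = SIGCHLD;   /* all-zero flags: behaves like fork() */

    long pid = syscall(SYS_clone3, &args, sizeof(args));
    if (pid == 0)
        _exit(0);                 /* child created via clone3 */
    if (pid > 0) {
        waitpid((pid_t)pid, NULL, 0);
        puts("clone3 available");
        return 0;
    }
    /* EPERM: blocked by an old seccomp profile; ENOSYS: old kernel, or a
       newer profile that correctly reports the syscall as unimplemented */
    fprintf(stderr, "clone3: %s, falling back to fork()\n", strerror(errno));
    pid_t p = fork();
    if (p == 0)
        _exit(0);
    if (p > 0)
        waitpid(p, NULL, 0);
    return 0;
}
```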
So I'll interlude with some bad news: failure is pretty much certain. It's an evolving landscape, and things will go wrong. But it's also very important to realize you can trust your community — they're not out to get you or lynch you over every mistake. If you're responsive and communicative, they'll help you. CI tests also help, because ultimately it's the user's responsibility to see whether a container works in their environment; there's only so much that you, as a distributor or producer of an image, can be responsible for. So if you're a user: run CI tests, and if something goes wrong, it may well be us — let us know.

The other thing I want to cover as a brief interlude is the kind of bugs you get. You might assume that as a MariaDB distributor I get bugs about MariaDB, but once you throw it into a container image, all of a sudden you get blamed for everything else in the container as well. Maybe that's fair, maybe it's not. The first instance up there is libksba — or something like that, I don't even remember — which sat underneath GPG as a dependency. It wasn't used by MariaDB directly; it was used when the package was built, to verify its signature. But that doesn't stop users from saying "there's a vulnerability in your image, hurry up and fix it". The same happened with OpenSSL. What's good about being part of the Docker Official Images program is that Ubuntu in the Docker Official Images gets a regular update about every month — sometimes they push one a little early — and once they rebuild their Ubuntu image, all the dependent images get rebuilt automatically. So when these came out, I was pretty confident the fix was already in the pipeline anyway, despite them being low-severity vulnerabilities.

Then there are the gosu "vulnerabilities", and we've come across this a couple of times: people say it's got a vulnerability. I'll go back a step. gosu is basically a cut-down version of sudo: it changes user and executes something. It's one of the artifacts of running a container as root and needing to drop privileges. What happens with vulnerability scanners is they go: oh, this is a Go application, that must mean it's vulnerable to every CVE and security vulnerability that could possibly exist in Go. So we've got a status page here from a few days ago that lists two critical, 28 high and 12 medium findings, and as you can see the top contributor up there is "stdlib" — the Go standard library. If that wasn't there, it would actually look quite good. And of all the vulnerabilities listed, pretty much every one is a false alarm. So we've actually got to work with Docker Scout, which produced this report, and ask: can we get to a stage where we're reporting real vulnerabilities only, and not the others? It is possible. Go — for any of you who program in it — has a checker called govulncheck that tests whether the vulnerable parts of the standard library are actually within the scope of what the application uses, and usually they're not. Once that's out of the way, there are only three lows and a couple of mediums, and incidentally one of those mediums is Ubuntu having one of their packages marked as vulnerable when, per the release notes, it's actually not.
So there's a bit of communication that needs to happen in the security landscape to ensure the right information gets to users. You get users who say: our company policy is that if a vulnerability check flags our image, we automatically eject that image from the lifecycle. Okay, that may happen, but that's kind of your problem; what we can do is try to get decent information out there. And I'm not the only one who's come across this — SonarQube, I believe, also hit this kind of issue: vulnerability assessments of things picked up in scanning aren't always true. At the time, various people in the security and container-distribution landscape were pushing out information and trying to work out how to resolve this in a meaningful way. Yes, there are things in packages that are vulnerable under a particular set of circumstances, but if they're not exploitable, well, they're not exploitable — and we need a way to communicate that without responding in every issue, or, as I think Kate was saying in a previous talk, with blog posts and that kind of thing. It isn't that scalable.

Another thing: seccomp applies some restrictions, and currently io_uring is blocked again. It wasn't blocked at one stage, and it very much depends on the runtime. I looked at the back history as to why: Google has paid out something like a million dollars in security bounties for io_uring exploits, which equates to about 15 vulnerabilities this year, I think, so I can see why they've restricted it. It may get unrestricted again. But from the point of view of developing software, you've got to realize that just because your kernel supports something doesn't mean it's going to be available.

Other restrictions come into play too. /sys/dev/block — I always get the order mixed up — provides a bunch of information about the host environment. Now, quite rightly, you could assume that maybe a container shouldn't know much about the host environment. However, if you're actually providing storage to it, it needs to know a little, because to write to an O_DIRECT file you need to know the physical block size; otherwise you get errors. So in discussion with Podman about improving this, the idea was: maybe there's a way to provide some of that information, particularly for storage that's exposed into the container.

So, I need to finish this off — oops, too much time at conferences. Who's heard of Apptainer? No one? Okay — Singularity, as a runtime? It's an HPC project that renamed to Apptainer under the Linux Foundation. It also has the ability to run OCI containers, which MariaDB obviously is. However, it applies a much stricter approach: the container's insides aren't actually writable. What that means is that some things that may have worked in a Docker or Podman environment don't immediately work in Apptainer. For MariaDB this included the Unix socket we provide for local connections — is that still relevant inside a container? It's kind of used during startup. There's also a PID file. When was the last time we needed that? I can't remember; it was a while ago. Maybe we should remove it from our code base — at least that's one thing that doesn't need to be there anymore.
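Going back to the block-size point, here's a minimal sketch, assuming Linux sysfs, of how a process might discover the physical block size behind a data file. It's simplified — for a partition, the queue attributes actually live under the parent disk — but it shows why a container that can't see /sys/dev/block is stuck:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

int main(int argc, char **argv) {
    struct stat st;
    if (argc < 2 || stat(argv[1], &st) != 0) {
        fprintf(stderr, "usage: %s <datafile>\n", argv[0]);
        return 1;
    }
    char path[128];
    /* The backing device's queue parameters live under /sys/dev/block;
       inside a container this path is often simply absent, which is
       the problem discussed above. (Simplified: partitions resolve
       their queue settings via the parent disk.) */
    snprintf(path, sizeof(path),
             "/sys/dev/block/%u:%u/queue/physical_block_size",
             major(st.st_dev), minor(st.st_dev));
    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }
    unsigned pbs;
    if (fscanf(f, "%u", &pbs) == 1)
        printf("%s -> physical block size %u\n", argv[1], pbs);
    fclose(f);
    return 0;
}
```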
Another thing containers and their runtimes assume is a start time and a stop time. For Docker this is 10 seconds, because why would a container take longer than 10 seconds to start or stop? When this limit is exceeded, the container runtime just kills off the process. And then we get bug reports saying: why does MariaDB do crash recovery every time I start it? Well, this is why. MariaDB is an ACID database, so sure, it can tolerate being killed at any particular time, but that doesn't mean it's the best approach all the time. By increasing the timeout to, say, one and a half minutes, as one user did — sure, their shutdown took up to one and a half minutes, but it saved them about two minutes of crash recovery. So you can tell where our optimized paths are.

What Red Hat has been doing is a bunch of integration between container runtimes and systemd, providing ways to run containers under systemd. What's interesting about this is the notify setting in a systemd service: it means the container can communicate the "ready" kind of signals that are relevant in a systemd context, mapping to a "healthy" state in a container context. Both of these runtime environments have a lot of similarities, so it only makes sense that functionality that's been in systemd since it became mainstream — around 2015; someone correct me if I'm wrong about that — could apply to containers too and simplify our processes that way.

On the MariaDB side, we've come across behaviors that have been there for ages, for example ignoring world-writable configuration files. It made sense at one stage: if you had access to a system with a MariaDB config that was world-writable, you could just write user=root and some variant of SELECT INTO OUTFILE /etc/passwd and get a privilege escalation. That hasn't really been possible for a while, since MariaDB has run under a systemd service, but the code was still there. It's a problem for containers, because when you have a Windows file system mounted into a container, the Windows Subsystem for Linux handles most of it — but there's one aspect it can't map: Windows file system permissions don't translate, so everything just looks world-writable. And that's a problem if you start a container with a specific config and it gets silently ignored because it looks world-writable. You get some funny inquiries about that.

What is possible with volumes is for the volume to actually be marked read-only. And once that happens, all we needed to do was change the server code to say: well, if the file system is read-only, I guess it's not exploitable. That's been in the last release. It doesn't mean that's the end of it; there are possibly still avenues to go a bit further and say: okay, if the file is world-writable, maybe we enable something like an unsafe mode that restricts the setting of options like user. Just to make it a little more user-friendly, so that if a user forgets the read-only flag on the config volume, they're not left for ages trying to work out why their configuration isn't being applied.
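A hedged sketch of that read-only check — not MariaDB's actual implementation — treating a world-writable config file as acceptable only when the filesystem it lives on is mounted read-only:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/statvfs.h>

/* Sketch of the logic described above: a world-writable config file
   can't be tampered with if its filesystem is mounted read-only. */
static int config_is_trustworthy(const char *path) {
    struct stat st;
    struct statvfs vfs;
    if (stat(path, &st) != 0)
        return 0;
    if (!(st.st_mode & S_IWOTH))
        return 1;                          /* not world-writable: fine */
    if (statvfs(path, &vfs) != 0)
        return 0;
    return (vfs.f_flag & ST_RDONLY) != 0;  /* writable bit is moot on a ro fs */
}

int main(void) {
    /* example path only; the real server walks its usual config locations */
    const char *cnf = "/etc/mysql/my.cnf";
    printf("%s trusted: %d\n", cnf, config_is_trustworthy(cnf));
    return 0;
}
```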
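And on the notify integration mentioned above: here's a minimal sketch of what a service can say to systemd via libsystemd's sd_notify, including the stop-timeout extension that comes up again near the end of the talk (EXTEND_TIMEOUT_USEC arrived in systemd 236, I believe). The timings and status strings are illustrative; build with -lsystemd and run under a Type=notify unit:

```c
#include <systemd/sd-daemon.h>
#include <unistd.h>

int main(void) {
    /* startup finished: in a container context this maps to "healthy" */
    sd_notify(0, "READY=1");

    /* ... later, during shutdown, while still flushing buffers: */
    for (int i = 0; i < 3; i++) {
        /* ask the service manager for 30 more seconds each pass,
           instead of being killed at the default stop timeout */
        sd_notify(0, "EXTEND_TIMEOUT_USEC=30000000\nSTATUS=still flushing");
        sleep(5);  /* stand-in for real shutdown work */
    }
    sd_notify(0, "STOPPING=1");
    return 0;
}
```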
This was a few years ago, but the bug report was that time zone initialization was slow. To initialize all its time zone data, MariaDB reads /usr/share/zoneinfo, generates a bunch of SQL INSERT statements, and loads them into tables. Originally, with MariaDB 10.3 — which is incidentally now end of life — that was reasonably fast under MyISAM. When Monty reworked it as Aria and added a bunch of crash safety, that slowed down the insert operations in order to make them crash safe. It was fine on Monty's machine, and it was fine in the CI environments. But users out there sometimes use somewhat slower storage than developers and CI machines do, so this disparity between the user's storage and what gets tested fell out into the open, and users said: it just doesn't start, or it takes minutes or tens of minutes to start. It turned out to be reasonably easy to fix: you wrap it in a LOCK TABLES, and once that was applied to the code base, it was all fixed.

In the last few examples, the things you come across in the container world end up being applied back to the MariaDB code base as improvements. There's another feature currently in the works that improves database migration. If you've got a database, you want to migrate the data to the new structure and only then make it available to the application. So you've got this seemingly contradictory requirement: I want the container not to be accessible to the application, but I do want it accessible to the migration scripts that run all the ALTER TABLEs and that kind of thing. Those requirements make us look back, reflect, and rework the code and add new features. The way we're thinking of doing it is to start the server on a different port, provide the migration scripts in a container volume, and when those scripts finish, go back to listening on the public port where the application connects. It's that kind of requirement that makes MariaDB more container-friendly for its user base. But there are also flow-on effects: the same functionality can be used elsewhere, say in the Debian packaging, to auto-run its upgrade scripts while the user isn't trying to run an application against it.

Okay, that's enough bugs; I want to talk about some opportunities. MariaDB has a Buildbot as our CI infrastructure, which is public, and the container image of the MariaDB Docker Official Images is public. So, since we're doing most of our work in public anyway, what we decided to do — a year, a year and a half ago, I've lost track — was push container images up from CI that are consumable by anyone. This is us running CI on the container: every commit to the server produces an image, and that image is tested against the test cases of the container image as part of our CI process. And out of that pops a URL — this one is from quay.io — and you get images. I've tagged them by branch name, so 10.4, 10.5, 10.6 and the like — the branches where developers won't trip over things, not the feature branches. That's what happens when I walk around... Now, this page may take a while to come up, so I'll just wing it and you'll be able to see it anyway.
So, before that blanked out, what you would have seen is that 10.4 is the earliest long-term support release, so we tag it with those particular tags, and 11.3 is the latest version, which has its own tags. We've done it that way so it's easy for people running CI systems against MariaDB to say: set it up once, and test from our earliest version to our latest version, and maybe a long-term support version in between — obviously being more careful there. And those are aliases to the same thing, as we see here: the earliest long-term support is this 10.4 release, and the very latest is 11.3, which corresponds to a release candidate — it's not the alpha code. I think I'll stop moving my feet now; it seems dangerous.

So if you were to consume MariaDB — or MySQL — images in your CI, what you could do is take the container names, and you've got quay.io/mariadb-foundation/mariadb-devel with its :earliest and :latest tags, and now you're actually testing against what is effectively going to be in the next release of MariaDB. This will save projects that might hit a regression when MariaDB does a release: we tweak something we shouldn't have, and all of a sudden project X has to deal with it. If projects consume these images ahead of time, problems start popping up before our release, and that saves other projects a release of their own.

The other way to use containers: when MariaDB produces bugs and users notice them, a developer sometimes goes through a process of "have I really fixed it for the user?". Because we've got a CI process that produces packages and containers, a developer can push to a branch whose name triggers the package and container builds, and automatically, at the end of that, the user has a container they can test with to see if it actually fixes their problem. That's particularly relevant for some of the harder-to-reproduce or harder-to-communicate problems. So CI is a magical thing that enables people to test fixes and produces something consumable by the user.

In conclusion, if you're doing your own development, the things to take note of when doing container work: rely on runtime checks for particular storage functionality, and if it's not available, fall back to simpler mechanisms. So if io_uring isn't available, fall back to a synchronous file I/O mechanism — that's what we were doing in MariaDB anyway, but it's good development practice to have redundant code paths depending on the functionality available. Take opportunities as you get bug reports, and use the feedback to re-examine things in your own code base, because there are better ways of doing some things; things that may have been relevant 10, 20, 30 years ago maybe aren't relevant now, and there are possibly better ways of doing them while still staying compatible. By engaging your user base, you can test concept changes, and not only to the containers — I didn't quite mention that the quay.io images include the latest changes to the container as well as to the server, so we're providing those so users can test them before they become an official image, and we welcome bug reports ahead of time. And, as I was saying, take part in the communities around you.
The container runtimes actually care about what you have as a requirement, so communicate it. If you need storage information to be accessible, tell them why, and in many cases they'll bend over backwards to help you out, or at least give you the opportunity to write your own pull request and change their code, with review. There's an overlap between systemd and containers, as both are service managers, in the features they offer. Take, for instance, stop time: in systemd, since about version 234 or something, there's an extend-timeout mechanism, so a systemd service can tell its service manager: yes, I'm still working, I'm getting closer to done. Once that sort of integration becomes possible with containers, a container can start to say: I'm still working, I'm still trying to shut down cleanly, give me a little more time. And that means this sort of thing can be done cooperatively, without introducing extra operators or watchdog functionality, because ultimately it's functionality within MariaDB itself.

The last bit on communities is taking part in discussions around security vulnerability assessment — how they do things, the processes you want. Yes, we want to be able to say: this may be a vulnerable package in our image, but it can be mitigated in this way; or, this thing isn't vulnerable at all — those kinds of statements. I'm probably dreaming a little for that to happen, but those are the kinds of discussions that need to take place for users to get the right experience, and an accurate picture of what's really happening.

Utilize your CI systems to help your users validate your bug fixes, and provide pre-release testing so they can test in their own environments — because, after all, the only workload that really matters to a user is theirs, so they've got the best ability to test it. And if you can alleviate the burden of compile, build, repackage, and just streamline that delivery to them, they can give you good feedback ahead of your release.

On the good kind of bugs: I occasionally get one like this — "the performance is too fast in Docker". I'm really sorry. That's the note I want to end on: some bug reports just bring a joyful smile to your face every once in a while. So that's all I have. I'm Daniel at MariaDB, and that's the end of my talk, if there are any questions.