So, hello everybody, thanks for joining so early to listen to me. First of all, who am I? My name is Luca Pizzamiglia. I'm an Italian guy working at Trivago. I'm a FreeBSD enthusiast: I've been using FreeBSD for at least 10 years now, and I received my ports commit bit in August 2017. What I usually do is look for new use cases for FreeBSD. I really believe FreeBSD is a very solid platform and can run basically anything. At Trivago specifically, I build packages, customize stuff and so on. It's early on a Sunday, so I was able to bring gadgets: socks. Every time you ask a question, you get a sock. So don't be afraid, please speak up and make this presentation more interactive. It helps me too.

So, this is the plan for today. We'll talk a bit about jails and pot, what pot is (something I presented here two years ago), what a service mesh is (or at least what I believe a service mesh is), what a pot image is, future work, and then your questions.

The pot framework is something I started more than two years ago, actually. The ambitious goal was, well: there is no Docker on FreeBSD, as we know, at least not natively, but containers are actually a really nice pattern, or at least something that is quite easy to use. And FreeBSD had no container model in general. So why not try to do something similar using all the technologies that are already available, like ZFS, jails, pf, VNET, rctl, cpuset and so on? So I came up with this framework, basically a tool that lets you manage jails and datasets, pretty easily now; at that time it was quite rough. That was the logo at the time. The name comes from a cooking pot, not the other meaning of pot; that one someone else suggested. And that was basically the state at the time, and how it fits in here. Who here knows what a service mesh is, or uses one? Wow.
So I have to explain, to the best of my capabilities, what a service mesh is. A service mesh in general is an abstraction where a developer can submit a job (we'll see in a moment what a job is) to this very big, cloudy black box, and then users can just use the service exposed by that job. I'll go into a bit more detail. This is similar to Kubernetes, but it's not Kubernetes. Basically, the developer writes a job description that specifies which container to use, which ports should be exposed, what type of service it is, and so on. We have a central point here, the orchestrator. It receives this job description and says: okay, you want these containers up and running. It looks at the worker cluster and basically delegates, allocates, orchestrates the execution of these containers to the workers. The workers download the container images and spawn those new containers; the servers come up and running. That's one side of the story.

Then, for network services, you have a service discovery server; we use Consul here. What the orchestrator does is register your service there: okay, I decided that your server is running on this machine, exposing this port; it's running on port 12345, whatever. This information is registered in Consul, so the service discovery knows: there is a service with this name, running over there. It also continuously checks whether the service is up and running, the health status of the service. This is the so-called control plane. So now, in theory, if you want to reach the service (the orchestrator decides where the service runs, so you don't know exactly where it is), you have to interrogate the service discovery.
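Interrogating the service discovery is, in Consul's case, just an HTTP call against its catalog and health APIs. A small sketch; the hostname and the service name `web-example` are made up for illustration:

```shell
# Ask Consul which nodes run "web-example", and on which ports
curl -s http://consul.example:8500/v1/catalog/service/web-example

# Only the instances whose health checks currently pass
curl -s "http://consul.example:8500/v1/health/service/web-example?passing"
```

The second form is what a proxy typically polls, so traffic only ever goes to healthy instances.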
And this is called, I guess, the control path: it lets you find out where your services are running. Then you have another element, a proxy, a layer-7 load balancer. You configure this proxy to say: when a user wants something, look up where the service is running. The list of backends is continuously updated, so the proxy always knows where all the services are. It's like: oh, you want this service, I'll route the traffic to the proper nodes; and it gets that information from the service discovery. And every time a service is replaced (the old container gets destroyed and a new container starts running, for instance here), the information is updated in the service discovery, so the proxy gets the updated information and routes the traffic in a different direction. I hope I explained myself; I'm not really a super master of all these elements. Any questions? Cool.

[Question] This doesn't seem very FreeBSD to me: images, Nomad, these are product names. So, to repeat the question: this doesn't seem like a FreeBSD thing. Yeah, of course, that was exactly the problem. Those are programs that run everywhere; I guess all of them are Go programs. And that's actually the point: how can you have something like this on FreeBSD? That is the solution I'm presenting now. As you see, the real issue is the container images. There was simply no FreeBSD solution covering that part. The features that were really missing were the ability to deal with images: you need to create and export an image, you need an image registry (for Docker containers you have Docker Hub), and then the ability to download and import an image. That was the set of features we worked on.
But more importantly, what is really different is the paradigm, how you work with this kind of thing; the workflow is really different. We are used to creating jails directly on the machine, configuring them, and letting them run. The developer just gives you the PHP code or the Java code, whatever runs there, but you are directly managing jails on your final machine. Here the paradigm is different: someone is working on the image creation, and someone else is deploying it. So the paradigm is very, very different, and the set of tools you need is different too. It's an almost unexplored area, I'd say. I'll discuss first the deployment of the jail onto the available nodes, to make that visible, and then how to create an image and all those things. There is already a lot of complexity all around: a lot of new names, new components, things going here and there.

The first implementation of pot built a jail out of multiple datasets: there was the base, there were datasets mounted read-only and shared with other jails, and so on. For this kind of scenario, with all those moving parts, it was too complicated. So I decided: just go with one single big ZFS dataset where I put everything. It's easier then to create images. How do I export a pot? Basically, you create a pot, which is a jail; you customize it; you put in the software you want; you take a snapshot; and then with zfs send and a compression tool, boom, you have an image: basically whatever you want inside your jail, in a file. A file is easy to move. To import a pot, it's the other way around.

[Question] Is only ZFS supported? Yes, ZFS is mandatory; I don't support anything else. If you want to use something different, please submit a patch. I'm happy to extend support.
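The export flow just described can be sketched like this. The explicit `zfs` pipeline is an illustration of what happens underneath; the exact `pot` flags and dataset paths are assumptions, so check `pot help` before relying on them:

```shell
# Create and customize a pot (a jail backed by one ZFS dataset)
pot create -p myweb -b 12.1 -t single
pot start myweb
# ... install and configure your software inside the jail ...
pot stop myweb

# Conceptually, exporting is: snapshot, zfs send, compress
zfs snapshot zroot/pot/jails/myweb@v1.0
zfs send zroot/pot/jails/myweb@v1.0 | xz > myweb_1.0.xz

# pot wraps this in one command
pot export -p myweb -t 1.0 -D /tmp/images
```

The result is a single compressed file plus its checksum, which is exactly what a registry has to serve.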
The whole pot framework is heavily based on ZFS because I get all the nice features: snapshots, rollback, everything is there. To import a pot, you download the file, uncompress it, zfs receive, boom: you have the same content somewhere else. Then you clone the snapshot. I use clone here because if you have one image, the image can be reused. Imagine an nginx server: you want the same nginx with different configurations. It's easier to clone it and just change the configuration file than to untar and zfs receive again. So that's a small optimization.

Questions so far? Another question? The question was whether I tried to use bhyve. No, because that misses the point completely. Jails are very, very light; bhyve is not that light. Bhyve makes sense when you want to run a different operating system on a FreeBSD host. If it's the same operating system, use a jail. I want to focus on native support; it has to be native, it's really faster. With bhyve you don't get the same performance, because that is full virtualization, while with jails you have light virtualization: it's just the operating system's side. I'm waiting for your patch on that topic.

So why did we choose Nomad to try to implement this service mesh on FreeBSD? Nomad and Consul are, let's say, FreeBSD-friendly, in the sense that they are already in the package system, so you can just install Nomad and Consul and have them up and running. Consul is the service discovery and Nomad is the orchestrator, basically. Moreover, Nomad has an internal structure that allows you to add additional drivers, and a driver is what supports a particular type of container. There is a driver for Docker containers; there is a driver for what they call the Java thing, meaning you have a jar, a tarball, and it executes it in the JVM right away. So, different types of containers.
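The import-and-clone path, sketched with plain ZFS commands. Again this is an illustration of the mechanism; `pot import` hides these details, and its exact flags here are my assumption:

```shell
# Import: uncompress and zfs receive the image
xz -d -c myweb_1.0.xz | zfs receive zroot/pot/jails/myweb-image

# Clone: each running instance is a cheap clone of the imported snapshot.
# Ten nginx jails share the same image blocks and differ only in their
# configuration files, which is the optimization mentioned above.
zfs clone zroot/pot/jails/myweb-image@v1.0 zroot/pot/jails/myweb-1

# The pot equivalent fetches from a registry URL and imports in one step
pot import -p myweb -t 1.0 -U http://my-registry.example/images
```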
And what happened here is: why can't we write an additional driver to let Nomad interact with pot? So Steven, a colleague of mine, developed the driver, this bridge between Nomad and the pot framework to interact with jails. The driver is already available; you can install it on your own. It's not a mature product, but it's stable enough.

And this is the job description. I'll go through it a little. It's a very simple example, even if it's quite long. At the top you have the name of the job; the job, as you see, is a service. Then you have the concept of a group: a service can be composed of different containers that you want to join together to provide the service. In this case the group is called "example group", very imaginative, and it's composed of only one task: just an nginx pot. Here you specify the driver. Since the driver is pot, Nomad will look for a worker that supports pot and will be able to execute a jail there. I'll just jump ahead: the part that is strictly the driver's is the config section, where you say where to download your image, the name of the image, the version. The command is optional; it's the entry point of your jail, the first command executed when the jail is spawned. Then we have the service part. So the config is all the information you give Nomad to spawn the jail, and the service part is what gets registered in Consul. The name of the service will be web-example, for instance. And then you see this "port http" repeated here: it means that port 80 inside the container will be mapped to a different port outside. An automatic pf redirection rule is injected on the node, and the new port is what gets registered with the service. So in Consul we will see a different port.
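A minimal job description along the lines of the one on the slide could look like this. The key names follow my reading of the nomad-pot-driver examples and the registry URL is a placeholder, so treat this as a sketch rather than the exact file shown:

```hcl
job "example" {
  datacenters = ["dc1"]
  type        = "service"

  group "example-group" {
    task "nginx-pot" {
      driver = "pot"               # only workers with the pot driver qualify

      config {
        image        = "http://my-registry.example/images"  # where to fetch the image
        pot          = "nginx-pot"
        tag          = "1.0"
        command      = "nginx"     # jail entry point, must stay in the foreground
        args         = ["-g", "'daemon off;'"]
        port_map     = { http = "80" }   # port 80 inside the jail
        network_mode = "public-bridge"
      }

      service {
        name = "web-example"       # registered in Consul with the mapped host port
        port = "http"
      }

      resources {
        cpu    = 200               # "bogus" MHz, used for scheduling
        memory = 64                # MB, enforced via rctl
        network {
          port "http" {}           # dynamic host port, redirected to 80 via pf
        }
      }
    }
  }
}
```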
Locally in the jail we have an nginx running on port 80, but then there is a pf rule that does the redirection, and in Consul you discover where this thing is actually reachable. Questions so far? Yeah, the pot registry; there is a slide on that later. The question was about what this pot registry is: it's a web server with files. There is one slide exactly on that topic.

It's complicated; basically the whole second part of the talk is about that. It seems easy, but it's not. If you wonder why I'm not using iocage or the other jail frameworks out there: first of all, because I can do my own. It's open source, so you can do whatever you want. But also, a container is something different from a normal jail. For instance, this nginx command does not fork into the background. What does that mean? If you use that type of command with the jail(8) command line, jail(8) expects to return: the start command is executed by the jail command, and here it just keeps your shell, it doesn't return. jail(8) has the built-in assumption that the start command you give it will at some point fork and return. In this case, that's not the case: containers usually work with blocking commands. Nginx stays in the foreground and serves the traffic directly, period; it doesn't fork. And that creates a series of issues. In this case, for instance, the post-start hooks are never executed, because what the jail command does is: create the container, spawn the command, wait until it's over, and then execute everything that comes after, in the post-start. The post-start is never reached, because the command keeps running; it doesn't return.
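The injected redirection is conceptually a pf rdr rule like the one below. This is a hand-written illustration with made-up addresses and ports; pot actually manages these rules dynamically inside its own pf anchors:

```
# pf.conf fragment (illustrative)
# Jails on the public bridge live on an internal network, e.g. 10.192.0.0/16
ext_if = "em0"

# Traffic arriving on the dynamic host port 25000 is redirected
# to port 80 of the jail's bridge IP; 25000 is what Consul sees
rdr pass on $ext_if proto tcp from any to ($ext_if) port 25000 -> 10.192.0.5 port 80

# Jails reach the outside world through NAT on the host
nat on $ext_if from 10.192.0.0/16 to any -> ($ext_if)
```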
And another thing, which took me some time to discover, is that these containers are usually ephemeral, meaning that if the process inside the container dies, the container disappears with it. Jails are persistent by default: if there are no processes left, the jail stays. And iocage, for instance, only supports persistent jails. The nopersist behavior is normally applied by jail(8) after start, effectively as a post-start step, and as we saw, the command doesn't fork. I had to find some creative way to get this nopersist flag applied even when the command doesn't return.

Last but not least, jails don't clean up after themselves. In other words, the post-stop and pre-stop hooks: if you have a non-persistent jail, you mount your devices, you create your pf rules, and then the jail disappears. Come on, you have to clean up! But you don't get any notification that the jail disappeared, and there is no way to register a callback. Normally, when you call jail -r, the jail stop, it is that command that executes the hooks. It says: I have to stop this jail, so I run the pre-stop hooks, I stop the jail, and then I run the post-stop hooks. But if the jail is non-persistent, it just vanishes. There is no centralized daemon controlling what happens with jails; a jail is just an internal operating-system structure. It's like when a process goes away: you don't get any notification per se. We solved this inside Nomad: Nomad has to check that the containers are up and running anyway, and if one is not running, it just calls the post-stop and cleans up almost everything that way. But we don't have a generic way to clean up non-persistent jails.

Questions so far? Is it clear, or am I going too fast? Can you repeat the question? The question was: if you run something that daemonizes, do you have to change your command line to stop it from daemonizing?
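To make the hook problem concrete, here is roughly what a classic jail.conf setup assumes, and why a blocking command breaks it. This is a plain jail(8) illustration with hypothetical paths and anchor names, not pot's actual configuration:

```
# jail.conf fragment (illustrative)
myjail {
    path = "/opt/pot/jails/myjail/m";
    exec.start     = "/usr/local/sbin/nginx";    # blocking: never returns...
    exec.poststart = "pfctl -a myjail-rdr -f /tmp/rdr.conf";  # ...so this never runs
    exec.poststop  = "pfctl -a myjail-rdr -F all";  # and nothing runs this if the
                                                    # jail simply vanishes
    persist;  # without this, the jail is torn down when its last process exits
}
```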
Oh no, so yes, you can. The best practice in this world is one process per container. For instance, we have a Redis instance, and the exporter, which is just another program, runs in a different container. Because if one container dies, you just replace that one, not the whole service. That's a common best practice here. You can put more in; it's just not recommended. You can create your own script and spawn different things; the point is that the last one shouldn't return.

Network. Currently we support two network configurations. One is called host, which in jail jargon means inherit: you're using the machine's network stack. The other one is the public bridge, meaning there is an internal bridge, every jail has a VNET, its own stack; it receives one IP and is exposed outside. You have NAT in front of everything, and you have a redirection rule for every service you want to expose. Those two are supported both in pot and in Nomad.

In pot we also have two other network configurations. One is the private bridge. It works exactly like the public bridge: you have one bridge and you attach your jails to it. But while the public bridge is a single bridge for all jails, with a private bridge you can have a bridge for a subset of jails. Say I have three jails and I want them isolated: you can have a dedicated bridge just for that workload. We want to add this to Nomad as well, because it makes sense: if a group has three different jails, you don't want them on the same shared bridge, for many reasons. Currently we have a small issue because the driver works at the task level, but the bridge works at the group level, one abstraction level above. Apparently Nomad 0.10 adds the ability to hook in at that level, so we have to figure it out.
It already works with the public bridge, but if you put 200 jails on it, the bridge will be overloaded and performance will probably degrade. So this is an area we want to improve.

The last network type is alias. This is the traditional jail way of doing things: you assign a static IP to your jail, and it becomes an alias on your network card. In this dynamic cloud environment it doesn't really fit well. You can, in theory, have the jail move from one node to another and the IP will follow, but it means you can only have one instance of that container; you cannot have multiple. This cloud thing is designed to give you horizontal scalability: when you need more horsepower, instead of three web servers you run five, or ten. With alias networking we are limited to one, because there is only one IP you can assign. Also, when you want to redeploy, say there is a new version of the web server, you have to remove the old job and deploy the new one, so you have a small downtime in between. I don't know if it's really something we want to pursue; I'm still on the fence.

Can you use it? Can you play with it? Can you try it? Yes. There are a lot of different tools involved that probably nobody knows, and the configuration can be a little tricky. So... Michael suggested: why don't you do a minipot, basically a minikube? An instance that runs on one node, so you can play with it. And there it is. Minipot is basically the service mesh I described before: Consul, Nomad, Traefik, all the components, running on a single node. You can install minipot as a package. There is one script that changes some configuration files on your machine, and then you can start it and have the same environment locally.
So you can... It's not for production. Not at all, because you don't have high availability or fault tolerance, of course; it's running only on your machine. But anyone can try it. So if you want to give pot a try, this is the way. There is a README page with the instructions to install it. It's not for production; if that's not clear: it's not for production. Questions so far? Don't be shy, I still have five socks.

Want to see a demo? Are you sure? We'll see. So... this is how minipot looks. I have minipot running on my laptop. You have Consul running on port 8500, you have Nomad running here, and you have Traefik; this is the front end. I don't know how to show it, but... it's here. So, how it works. So this is a job description file. For minipot you specify a datacenter, just to say where it can be spawned. That's the very hello-world example: it just defines a service called hello-fosdem. The configuration is the standard nginx, so we will see the standard nginx page: where to download the image, which one, which version, and port 80 will be mapped on the outside. That's it; that is the minimum we need. To run it, we just say: nomad run hello-fosdem.job. The command-line tool is now talking to the server: hey, this is the job description.

Oh, the resources. The question was about resource limitation. Yeah, I missed that point. Basically I'm using cpuset, and it's complicated. Theoretically, the resources you specify are used by the orchestrator to understand whether your cluster has the capacity to run this kind of job. The CPU value is a kind of bogus megahertz.
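On a FreeBSD machine, the demo setup looks roughly like this. The package and script names are as I recall them from the minipot README and the job file name matches the demo above, so treat the details as assumptions:

```shell
# Install minipot and its dependencies (pot, nomad, consul, traefik)
pkg install minipot

# One-time setup: rewrites some configuration files on the machine
minipot-init

# Start the single-node "service mesh"
minipot-start

# Consul UI on :8500, Nomad on its default port; submit the hello-world job
nomad run hello-fosdem.job
```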
So if your machine runs at two gigahertz and you say your job needs 100 megahertz, then okay, your job is using, say, five percent of the machine, something like that. It's bogus somehow, and it's used mainly for scheduling and orchestrating. For memory, I'm using rctl to enforce the limit, but it doesn't work like cgroups: rctl tries to limit the amount of memory used by your jail, but if your jail needs more memory, it won't kill your processes. And that is not ideal in general, because people can exploit that fact and use more memory, invading the others: the so-called noisy-neighbor effect. One thing we still have to work on is an OOM killer: basically monitor the memory consumption and, if a jail really cannot stick to its value, kill it. But those are the resources that are enforced, with rctl and cpuset.

Now we have submitted the job; it's triggered, created. Let's look at the UI. We have our hello-fosdem job up and running, and you see there is only one instance. It's running here; that is where it's reachable. And there is the "Welcome to nginx" page. We have the same information in Consul: the service is registered. We also have some tags, because you can subscribe to the specific services that carry a given tag, and so on. And Consul is now keeping this information up to date, so it knows where the service is running. You see it's not port 80 but this weird number. And it performs continuous health checks: if the service disappears, it gets marked as red. You can ask Consul: give me all the services, or give me only the services that are healthy and running. It tries to keep this information always up to date. And then you have Traefik, this proxy.
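The enforcement side can be pictured with plain rctl and cpuset commands. These are illustrative values typed by hand; pot applies the equivalent settings through its own subcommands:

```shell
# Pin the jail's processes to CPUs 0 and 1
cpuset -l 0-1 -j "$(jls -j myweb jid)"

# Cap resident memory at 64 MB. Note the semantics: rctl denies further
# allocations, it does not kill processes the way the Linux OOM killer does.
rctl -a jail:myweb:memoryuse:deny=64M

# Inspect the rules currently applied to the jail
rctl -h jail:myweb
```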
It receives the information from Consul, and it has added a new front end called hello-fosdem, because that was the name of the service, and if you curl it with this Host header, you are already routed to the hello-fosdem back end here. And this is fully dynamic: it continuously asks Consul; when there is a new service, it adds a new front end and a new back end to serve it. I see that I'm running out of time; I have another demo, but no time to show it.

It seems nice, it seems to work, but what is the problem? Creating the image is actually an issue. I mean: what is the image? How can I create it? How can I provision my jails? We need to provide automation and reproducibility, which is a heavy topic for me; it has to be fast; the images should not be gigabytes but reasonably small; and the solution should be portable and usable. Portability especially, because the current solution is basically flavours. When you run a pot create, you can specify flavours, which are basically provisioning scripts. You can say: okay, you want nginx, boom, I have a script that just installs nginx, to take nginx as an example. And you can stack multiple flavours that each do a small set of operations. Two flavours come out of the box: fbsd-update and slim. fbsd-update runs freebsd-update before anything else, because when you say "this is base 12.1", what gets loaded is base.txz, which doesn't have any patches, so you always have to run freebsd-update on it. Slim is a work in progress: it tries to reduce the size of the image by deleting a lot of files that you don't need if you're running a web server. You don't need gdb, you don't need a lot of things, so I just remove hundreds of megabytes of stuff that in that scenario you don't really need.

Two problems. First of all, it's not user friendly. It's really not. And it works, obviously, only on FreeBSD.
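A flavour is essentially a provisioning shell script run at pot creation time. A hypothetical nginx flavour might look like the sketch below; the directory layout and the stacking of flavours on the `pot create` line reflect my understanding of pot's convention, so verify against the pot documentation:

```shell
#!/bin/sh
# /usr/local/etc/pot/flavours/nginx.sh (illustrative)
# Runs inside the pot during creation: install and enable nginx.
set -e
env ASSUME_ALWAYS_YES=yes pkg install nginx
sysrc nginx_enable="YES"
```

Usage would then be something like: `pot create -p myweb -b 12.1 -t single -f fbsd-update -f nginx`, stacking the patch-level update flavour and the nginx flavour.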
What uses a shell script and jails? Only FreeBSD. And what do developers use? Not FreeBSD, period. I have a FreeBSD laptop; I'm the only one at Trivago. We are a thousand people, and I'm the only one with a FreeBSD laptop. A few people use FreeBSD on a laptop, but it's an issue. The other thing that is not really nice: let's say there is a new version of nginx. You have to recreate everything from scratch. You keep your flavours, those building blocks, but then you have to build everything from the beginning. The first problem is still an issue; the second one is not that bad, but it can be better.

Pot-machine. Why not imitate Docker, with Docker Machine? If you want to run a Docker container on a Mac: inside Docker there are Linux binaries, but they run on a Mac. How does that work? You install Docker Machine, which installs a Linux kernel in VirtualBox, in a VM essentially, and executes the Linux code there. So why can't we do the same? Pot-machine is that project: it works on macOS and on Linux, and it uses Vagrant to spawn a FreeBSD VM where you can execute and run your jails. It extends all the commands available in pot: pot-machine adds two or three commands of its own, and every command that is not one of those is just forwarded to the FreeBSD VM, so you can have your jails running on Linux, more or less. It's an important thing because it breaks the barrier: everyone can now, in theory, play with jails. You don't need a FreeBSD installation to play with jails.

The second work in progress is the Potfile. We are seeing how it goes: we are trying to imitate the Dockerfile. Instead of that complicated script, you can just have the typical RUN, FROM, whatever. It's experimental; it's not really adding or changing anything.
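In this experimental form, a Potfile would read much like a Dockerfile. The directives below are my guess at the obvious mapping (FROM picks the base release, RUN executes inside the pot), so take this as a sketch of the idea rather than the implemented syntax:

```
# Potfile (illustrative)
FROM 12.1                       # base FreeBSD release for the pot
RUN pkg install -y nginx
COPY nginx.conf /usr/local/etc/nginx/nginx.conf
CMD nginx -g 'daemon off;'      # blocking entry point, as containers expect
```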
It just translates the Potfile into a flavour; the flavour is still what actually runs, it's the same thing. The shell script is not user friendly, so this is basically a wrapper in a format people are more used to for creating pot images. I would encourage you to try it; if you hit issues, he will fix them. But let's say it's a work in progress; we are still evaluating. We don't want to create a completely new format, because I don't think that makes a lot of sense. But then, the Dockerfile is not the best format ever either, so you're kind of puzzled about what you can do and what you shouldn't do.

Registry. It's simply a web server with the files you put there. When you do a pot export, you create your image and you get those files; copy them over, period. You have to maintain your own registry, basically. I don't want to maintain any registry at all. I don't like the idea of people downloading binaries from me; I don't want the burden, in terms of security, of having binaries floating around. But I would be more than happy to have a flavours catalogue. You need nginx? This is the way to build nginx; the scripts are already prepared. That's very similar to the approach Bastille takes: they have all these templates, where you can just say, okay, I want a jail with memcached, boom, done. You don't really need to understand; it's already there. They have a really great approach there, and maybe I can reuse those templates. I should talk with the BastilleBSD developers.

The registry I showed in the examples is a virtual machine running somewhere. It's a web server with those two files, period. It's good for the examples, so you don't have to create your own image and so on, but it's not for production use. Not at all. It's just something so people can try. It's not Docker Hub; you cannot upload things. It's a web server. It's not for production. Wow, two minutes? Three minutes?
We want to do a lot of things. Many were already mentioned, but I'm running out of time. We want to focus now on image creation, because that is where people can actually use it, or at least try it: if you have your service mesh but nothing to run on it, it doesn't make any sense. So there are many ideas, but no time to do them. Dual-stack support will come; Olivier has been talking with me about it, and it's a good idea to have IPv6 support. Currently everything is on IPv4, but we will add IPv6 support to provide dual stack, so your containers will speak IPv6 too.

Really, thank you for being here, thank you for listening. Thanks also to my contributors; I put nicknames because I'm not sure they want their names written out, but really, thanks. Questions? Wow, many. And another one. I'm almost over, but I guess I have time for a few.

[Question] Do you have any ideas about persistent storage? It's relatively easy; it's just not managed here. Basically, on the machine you can mount datasets or directories inside the container. It's up to you what to use, but if you have a dataset on a specific node, you can pin a container to that node, keep your persistent storage there, and mount it inside the jail. It's a feature I didn't show, but it's there. Yeah, you specify a mount-in: you can mount in, as I said, a folder or a ZFS dataset, read-only if it has to be read-only, and then you can pin the job to a specific machine.

[Question about coexistence with iocage] I don't see why not. I use pf; I don't know how iocage uses pf. The only thing pot does is use two anchors that I add, and that's it, so I don't really pollute too much. And I have my own specific dataset; you specify where to put your root, where to create all the datasets, and if that's different, it should be compatible. I never tried it, but keep me posted.
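The mount-in feature mentioned in that answer looks roughly like this. The flag names follow my reading of pot's mount-in subcommand, so verify them with `pot help mount-in` before use:

```shell
# Mount a host directory inside the pot
pot mount-in -p mydb -d /var/db/appdata -m /data

# Or mount a ZFS dataset instead of a plain directory
pot mount-in -p mydb -z zroot/appdata -m /data
```

Combined with a Nomad constraint pinning the job to the node that holds the dataset, this gives the container stable storage across restarts.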