All right everyone, we're starting the last talk of the day, so if you could settle down, that would be great. This is the last talk of the day; thanks for sticking around, for those who have stuck around. This talk is from Anand and Gilbert. Both are engineers at Mesosphere and Apache Mesos committers, and they're going to talk about some of the new things that happened in Mesos in the recent releases.

Hey, everyone. This is Anand, the maintainer of the HTTP API, the default executor, and many other things in Mesos. I'm Gilbert, and I currently maintain containerization in Mesos. Today we're going to talk about Mesos containerization and the default executor.

The first question I'd like to ask you is: what is a container? Anyone want to answer that? Okay. Many people have asked me the same question. Previously I would just answer that a container is easy: basically it is namespaces plus cgroups. When people heard that answer, they'd say okay, but they still looked confused. So now I reframe the question. When people ask what a container is, they really want to understand why we need containers. We use containers for operations: a developer creates a container image, and an operator consumes that image to create an isolated execution environment. That's the reason people want to use containers.

Containerization in Mesos has been around for years, since 2011, so I'll briefly walk through its history. In 2011 we didn't have a containerizer yet. At that time each container was just a process on Mesos, and inside that process we had the executor, which launched the task inside the container; we didn't have any resource isolation yet. In 2012 we evolved containerization in Mesos. We still didn't have a containerizer, but we introduced CPU and memory isolation via cgroups, so a process could be limited to a certain amount of resources.

In 2014 we introduced the architecture of the Mesos containerizer. We have the launcher, which is the component responsible for launching the container process, monitoring its PID, and killing those processes. And we have the isolator: an isolator is basically a lifecycle hook that can prepare the environment, everything the container needs, before the container is launched. We have different isolators to achieve different kinds of isolation; at that time we had CPU and memory isolators.

In the summer of 2014 we introduced the containerizer abstraction. On top of the basic container support we have the Mesos containerizer, and on top of that we introduced another containerizer, the Docker containerizer, which basically uses the Docker command line, relying on the Docker daemon, to launch Docker containers. People might ask why we didn't use runC at that time: there was no runC yet, so we just used the Docker command line.

In 2016 there was a big change: the unified containerizer, which we now call the Universal Container Runtime (UCR) in DC/OS. A container is really just Linux kernel features, cgroups and namespaces, so why should we still rely on the Docker daemon?
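To make "just Linux kernel features" concrete, here is a minimal sketch of namespaces plus cgroups done by hand, assuming a Linux host (run as root) with cgroups v2 mounted at /sys/fs/cgroup; the cgroup name and the 50M limit are made-up values for illustration, not anything Mesos does verbatim:

```python
import os
import subprocess

CGROUP = "/sys/fs/cgroup/demo"  # hypothetical cgroup for our "container"

# Resource isolation: create a cgroup and cap its memory.
os.makedirs(CGROUP, exist_ok=True)
with open(os.path.join(CGROUP, "memory.max"), "w") as f:
    f.write("50M")

# Namespace isolation: start a process in fresh PID and mount namespaces
# using the unshare(1) utility.
proc = subprocess.Popen(
    ["unshare", "--pid", "--mount", "--fork", "sleep", "1000"]
)

# Attach the process to the cgroup so the memory limit applies to it.
with open(os.path.join(CGROUP, "cgroup.procs"), "w") as f:
    f.write(str(proc.pid))
```

That, plus an image to provide the filesystem, is essentially all a container runtime needs from the kernel.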
At that time, people realized that in production there were some stability issues with the Docker daemon, so we decided: why don't we do the container work on our own, in the Mesos containerizer? We decided to support Docker images with our own isolation, so we introduced the Linux filesystem isolator, the Docker runtime isolator, and the provisioner to support Docker images. Basically, it's super easy: just give an image name to Mesos, Mesos will download the Docker image from any registry you specify, and then Mesos launches that Docker image as a Mesos container (a sketch of what this looks like in a TaskInfo follows at the end of this section). This is what we call the unified containerizer, or UCR.

Right now we support Docker images, but it's not only Docker images: we also support other image formats, like the appc image format proposed by CoreOS, and the OCI image spec, which is going to land in Mesos soon.

We started to think about this question because if everything relies on Docker, we cannot guarantee the semantics will never change, and when the semantics do change, we can hit backward compatibility issues. So as an industry we definitely want to embrace standards. For the container runtime we need a container image standard, and for networking and storage we need corresponding standards for the whole industry. That's the reason we decided to support the different specs in the container runtime.

As you can see, for container images people previously had to use the Docker daemon, but now many container orchestrators, including Mesos and some others, have their own container runtimes, which just leverage Linux kernel features. For networking, people previously relied on libnetwork to support Docker container networking with the Container Network Model, and for storage people had to use DVDI, the Docker volume API, to support external storage. That's not ideal for us, because we might want to do more. We want the interfaces to be stable; we want networking, storage, and the container runtime to all be backward compatible; and we want pluggable interfaces for storage and networking, so that different vendors can develop their own plugins to support their own infrastructure with the container orchestrator. These are all reasons we need standards.

For images, people would say: I might not want to rely on the Docker image registry, because I might have thousands of machines, and I don't want every machine downloading the image from one registry at the same time, where the network becomes the bottleneck. I might want some other approach to image pulling, and I want to define my own image runtime. Those are all reasons people need an image spec, and similarly people expect industry-wide networking and storage specs, which I'll introduce in the next couple of slides.

For the image spec, people want to understand how to package application bits into an image and how to package application config into an image. They want to know the correct way to define an image, and whether, when the semantics change in the future, it will still be backward compatible.
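Here is the sketch promised above: a hedged fragment of the TaskInfo a framework might send to launch a Docker image with the unified containerizer. The field names follow the Mesos v1 protobufs in their JSON form, but the image name, IDs, and resource values are placeholders:

```python
# Hedged sketch: the interesting fields of a TaskInfo that launches a
# Docker image with the unified containerizer (UCR).
task_info = {
    "name": "web",
    "task_id": {"value": "web-001"},
    "agent_id": {"value": "<agent-id-from-offer>"},
    "resources": [
        {"name": "cpus", "type": "SCALAR", "scalar": {"value": 0.5}},
        {"name": "mem", "type": "SCALAR", "scalar": {"value": 128.0}},
    ],
    "command": {"shell": True, "value": "nginx -g 'daemon off;'"},
    "container": {
        "type": "MESOS",              # the Mesos containerizer, not Docker
        "mesos": {
            "image": {
                "type": "DOCKER",     # image is in Docker's format...
                "docker": {"name": "library/nginx:latest"},
                # ...but no Docker daemon is involved at runtime.
            }
        },
    },
}
```

The key point is the combination of container type MESOS with a Docker-type image: the image format is Docker's, but the runtime is the Mesos containerizer.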
Those image-spec questions are all real user concerns, and big companies, including Google, CoreOS, Red Hat, and I believe Microsoft, had the same concern: will Docker change the image spec in the future and break my environment? So these big companies got together and figured out that maybe they should define a new image spec for the industry, and that is the OCI, and why we have the OCI today. I can see that possibly, one day, the OCI could become super popular, because all the major container orchestrators and the big companies are focused on the OCI; they might even move away from the Docker image spec. Because of this concern, we decided we should support OCI in Mesos and make it compatible with Docker images, and that's going to happen by the end of this year.

For networking we have a similar scenario to the image spec. In Docker the network interface is kind of complicated, because it was built together with the container runtime. But when we define a container runtime, we want the network specification to be stand-alone. Users expect to define a network for one container or a group of containers without integrating it with any runtime: I define the container network, and the container has the option to join that network or not. This is the reason Mesos, Kubernetes, and others made the decision to use CNI. CNI is a very clean networking interface: basically it has only two operations for a container network, ADD and DEL, that is, attach the network to the container and detach it after use. For different networking solutions, the networking vendors develop their own plugins to provide different capabilities for the container network, which makes defining a container network super easy and super clear. If people have a special architecture or special networking hardware, they can develop their own network plugin against the CNI API to integrate with the container. The CNI isolator in Mesos does just one thing, which is super simple: it creates the network namespace for the container, and that's it. Whatever remains is done by the CNI plugin. It's a clean architecture, so users can define any plugin by themselves, and a plugin is super easy to write, in maybe two hours. (A sample CNI network config appears right after the storage note below.)

Storage is similar. In the talk this morning at 11:50, Jie introduced CSI, the latest storage specification, so I won't cover the CSI details here. Basically it's similar to networking: create a volume, delete a volume, attach and detach, mount and unmount. You can take a look at that talk from this morning; I believe we have the recording, if you want more information about CSI.
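Here is the sample CNI config mentioned above, as a hedged sketch. The two agent flags shown do exist in Mesos, but the network name, bridge, and subnet are arbitrary, and writing the config from Python is only for illustration:

```python
# The agent is pointed at a directory of CNI network configs and a
# directory of plugin binaries, roughly:
#
#   mesos-agent --network_cni_config_dir=/etc/mesos/cni \
#               --network_cni_plugins_dir=/opt/cni/bin ...
import json

cni_config = {
    "name": "demo-bridge",       # hypothetical network name
    "type": "bridge",            # handled by the standard CNI bridge plugin
    "bridge": "cni0",
    "ipam": {
        "type": "host-local",
        "subnet": "192.168.100.0/24",
    },
}
with open("/etc/mesos/cni/demo-bridge.conf", "w") as f:
    json.dump(cni_config, f)

# A task then joins the network simply by naming it in its NetworkInfo:
network_info = {"name": "demo-bridge"}
```

The Mesos CNI isolator only creates the network namespace; everything the config above describes is done by the bridge and host-local plugin binaries.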
Next I'm going to briefly introduce two of the latest features in Mesos containerization. The first is nested containers. I was super happy to talk to a couple of folks from Uber and from Verizon who are already using nested containers in production.

We decided to support nested containers in September last year. There are many motivations, but basically we want a group of containers that can be managed under the same lifecycle, to support certain application patterns. Say I have my main application running in my main container, but I also have some sidecar containers doing backup, logging, and other functionality. This group of containers should share the same lifecycle: if some of them die, the whole group should be cleaned up. Based on this motivation we investigated and realized we could introduce a hierarchy for containers: a container can be nested inside another container, and we can even have several levels of nesting. In Mesos we support up to 32 levels of nested containers. The limit comes from the namespaces; for example, it is limited by the PID namespace, because at the kernel level PID namespaces are implemented hierarchically and capped at 32 levels. We already have containers running at the third level of nesting in our environment.

Nested containers support many features. For example, we support volume sharing: a user can define a volume and it can be shared by several containers, whether the other container is a sibling or a child, so they can all share the same volume. We also support sharing some other resources; for example, we recently added PID namespace sharing. Depending on the configuration the framework defines in its protobufs, a container can share a PID namespace with another, and each container has the option to share or not.

Put simply, this is the path by which we launch a nested container: we rely on the executor, the default executor that Anand will introduce next. The executor talks to the agent to launch a nested container inside the executor's own container. We have a clear protobuf API to let users do all these nested container things, so users don't need to do much beyond defining the protobufs in the framework.

On top of nested containers, we recently developed improved debug containers. As you can see, this is the top-level container, with the executor running inside it, and we launch an nginx nested container inside that. Now suppose I want to enter the nginx container's namespaces and debug nginx, in case the user specified some wrong configuration. I want to sh or bash into the nginx container, so as an operator I can just create a debug container: "could you launch a debug container for me?", and then I can shell in and do whatever I want.
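As a hedged sketch, that debug request against the agent's v1 API can look like the following. The call type LAUNCH_NESTED_CONTAINER_SESSION is real, but the agent address, container IDs, and the use of the requests library are illustrative, and the real protocol streams the session I/O in RecordIO framing:

```python
import uuid
import requests

AGENT = "http://agent.example.com:5051/api/v1"  # hypothetical agent

call = {
    "type": "LAUNCH_NESTED_CONTAINER_SESSION",
    "launch_nested_container_session": {
        "container_id": {
            # Parent is the running container we want to debug; the debug
            # container enters its namespaces (like the nginx one above).
            "parent": {"value": "<nginx-container-id>"},
            "value": str(uuid.uuid4()),
        },
        "command": {"shell": True, "value": "bash"},
    },
}

# The response streams the session's output until the shell exits.
resp = requests.post(AGENT, json=call, stream=True)
```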
This is similar to docker exec and docker attach: you can debug any container you want, from inside that container's namespaces. So yeah, that's the brief introduction to Mesos containerization, and now I'll hand over to Anand to introduce the default executor.

Hello everyone. I know this is the last talk, so just bear with me for about 21 minutes. In this section of the talk we'll focus mostly on executors, the executor API, and more specifically the default executor, a new executor we introduced in Mesos 1.1 that lets you launch tasks in nested containers. By introducing the default executor, the line between implementing your own custom executor and just using the default executor has been blurred for most use cases. So one motivation for this talk is that, once you've gone over all the features of the default executor, it should be possible for you to deprecate any custom executors that have overlapping functionality with the default executor.

Let's start with the most obvious question: what exactly is an executor? An executor is a process that is launched by the Mesos agent to execute your tasks. An executor can handle one or more tasks, meaning there is a one-to-N mapping between an executor and tasks.

There are a couple of ways to go about implementing your own executor. You can either use the old API, which is non-standard and based on libprocess message passing, or you can use the new V1 API, which is based on HTTP and uses JSON or protobuf. The nice advantage of the new API is that there is no native dependency, so you can implement your executor in any language of your choice, provided the language has HTTP abstractions, meaning a simple HTTP client. Our recommendation for now is to exclusively use the V1 API, because we will be deprecating the old API soon.

So what types of executors do we support? Currently there are four. The first, and I'd call it the oldest, is the command executor. Every time you launch a simple command-based task, say a sleep task with Marathon, under the hood the agent actually launches something called a command executor, which executes your task. The command executor is based on the old API and only supports launching one task, so every time you launch a new task, a new command executor is spawned by the Mesos agent. Recently we also introduced an agent flag, --http_command_executor, that lets you use the command executor with the new API. The advantage of using this flag is that in the old API,
communication was bidirectional: the agent established a connection with the executor, and the executor established a connection with the agent. With the new API it's unidirectional: the executor is the one that establishes a connection with the agent, while the agent doesn't establish a connection back to the executor. I've found this useful for some use cases where people don't want to open any ports for agent-to-executor communication.

The other executor we support is the Docker executor, which is again based on the old API and lets you launch one-off Docker containers. The third is the custom executor, which is not a built-in executor: anyone can implement an executor based on their business needs, against either the old API or the new API, and it supports both tasks and task groups; I'll explain what a task group means in a moment. And the executor we'll focus on most in this talk is the default executor, which is based on the new V1 API and supports launching multiple task groups.

So the first immediate question is: what exactly is a task group, and what are the advantages of a task group over just using multiple tasks? A task group, as the name suggests, is just a collection of tasks, with the nice invariant that all the tasks in the task group are delivered atomically to the executor; it has all-or-nothing semantics. I'll come back to what that actually means.

So first: why should you use a task group instead of just relying on tasks? When we run most workloads in production, what usually happens is that we have one main application, the one that has all our business logic, but we also want to run other sidecar or adapter containers. We want logging for the main application without putting that logic in the main application itself; similarly, we don't want the metrics collection logic in the main application, so we'd like some kind of adapter container. Another use case for a task group is something like a file-sharing application, where this whole group of containers wants to share volumes with the main application. The third, and most exciting, use case that most people use task groups (or "pods") for is that the lifecycles of all tasks in a task group are aligned: if one task in the group fails, the entire task group is killed.

So I guess this explains why you would prefer a task group over plain tasks. But why couldn't you just use tasks with the old API itself?
Why did we have to build a new abstraction called a task group in Mesos? That had mostly to do with a limitation in the scheduler and executor APIs that didn't allow launching a group of tasks atomically. The way the launch operation currently works is that a scheduler signals an intent to the master that it wants to launch multiple tasks through a launch operation; the master then asks the agent to launch these tasks one by one, through separate RunTask messages. It's possible that the agent becomes partitioned away from the master and some of those RunTask messages are dropped. Or there's another scenario: say a scheduler received an offer and launched the main application on the agent, but by the time it could launch the other sidecar containers on the same agent, another scheduler had taken all the resources available on that agent. Now the first scheduler can't launch any more sidecar containers without explicitly reserving resources. So the fix was to introduce this new task group abstraction in the new API, which provides all-or-nothing semantics and allows the tasks to be delivered atomically to the executor.

That brings me to the default executor and the features it has. We introduced the default executor in Mesos 1.1. The default executor launches every task in a task group as a nested container. For example, if your task group has three tasks, the default executor will launch three nested containers, one nested container per task. And as I said earlier, all the tasks in a task group share the same namespaces, meaning they share the same network namespace and volumes.

The other thing to realize is that we haven't yet built resource isolation between nested containers for the MVP. Take an example: if your executor has three CPUs, and you have three tasks in your task group with one CPU each, then since there is no isolation between tasks in a task group, any task can use up to three CPUs, which might mean the other tasks starve. That is something we eventually want to fix, but it would require building support for hierarchical cgroups in Mesos. And as I alluded to earlier, the default executor launches one nested container for every task in the task group.

So what are the features of the default executor that might overlap with your existing custom executor? The first is health checks, plus what I'd call probes, which are non-interpreted health checks. The second feature is authentication, and the third is custom kill policies.

Let's now go over the workflow of how the default executor communicates with the agent. It's an internal implementation detail of the default executor, but it's good to know what is happening under the hood. What happens is that the Mesos agent launches a new executor and then waits on it with the waitpid system call.
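As context for this workflow, here is a hedged sketch of the scheduler-side LAUNCH_GROUP offer operation that kicks it all off, in v1 JSON form. The operation and the DEFAULT executor type are real; the helper, names, IDs, and resource values are all placeholders:

```python
def task(name, cmd):
    # Hypothetical helper building a minimal TaskInfo in v1 JSON form.
    return {
        "name": name,
        "task_id": {"value": name + "-001"},
        "agent_id": {"value": "<agent-id-from-offer>"},
        "resources": [
            {"name": "cpus", "type": "SCALAR", "scalar": {"value": 0.5}},
            {"name": "mem", "type": "SCALAR", "scalar": {"value": 64.0}},
        ],
        "command": {"shell": True, "value": cmd},
    }

launch_group = {
    "type": "LAUNCH_GROUP",
    "launch_group": {
        # The default executor that will own the whole group.
        "executor": {
            "type": "DEFAULT",
            "executor_id": {"value": "default-executor-001"},
            "framework_id": {"value": "<framework-id>"},
            "resources": [
                {"name": "cpus", "type": "SCALAR", "scalar": {"value": 0.1}},
                {"name": "mem", "type": "SCALAR", "scalar": {"value": 32.0}},
            ],
        },
        # Delivered atomically: either every task reaches the executor,
        # or none of them do.
        "task_group": {"tasks": [task("app", "./run-app"),
                                 task("logger", "./run-logger")]},
    },
}
```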
So now, upon launch, the default executor sends a SUBSCRIBE call to the Mesos agent. The request is a POST to the agent's /api/v1/executor endpoint, and the type of the call is SUBSCRIBE, carrying the executor ID and framework ID associated with the executor. Upon receiving the SUBSCRIBE call, the agent responds with a SUBSCRIBED event and opens a new persistent connection, meaning all future events from the agent to the executor are streamed on this persistent connection; as you can see, the response is HTTP 200 OK. All of these events are wrapped in the RecordIO format, which is just a simple framing: the event length followed by the event itself. In the event here, the type is SUBSCRIBED.

After the agent sends the SUBSCRIBED event to the executor, it sends the LAUNCH_GROUP event. The launch group event is the actual launch operation that the scheduler initially sent to the master, signaling its intent to launch these task groups. As you can see, the launch group has something called a task group inside it, which is just a collection of tasks.

Now, what the default executor does is, for every task in the task group, send a LAUNCH_NESTED_CONTAINER call to the agent. This call is made to the /api/v1 endpoint, which is the operator API endpoint, with type LAUNCH_NESTED_CONTAINER; here it just has a simple sleep command. After ensuring that the container is launched, the second step the default executor takes is to invoke the WAIT_NESTED_CONTAINER call on the agent. The way I interpret WAIT_NESTED_CONTAINER, it's similar to the wait or waitpid system call in Linux, in that it's a blocking call: the call only receives a response after the nested container has terminated, either successfully or with a failure. The response also contains the container status, that is, why the container terminated, and the default executor usually sends that back to the scheduler via task status updates, so that the scheduler also knows what happened with the task. So in this example, say the default executor had a task group with task one and task two: it sends a WAIT_NESTED_CONTAINER call for every task in the task group, so there would be two WAIT_NESTED_CONTAINER calls.

Okay, so now let's move on to the task group lifecycle with respect to the default executor, meaning: what is the default termination policy used by the default executor when a task in a task group terminates? In this example, say you have two task groups, task group one and task group two, and they both have two tasks. Now say task two in task group one exited with status code one. What happens? The invariant used by the default executor in the default termination policy is that if any task in a task group fails, it kills the entire task group. So what happens is that it goes ahead and also kills task one.
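Before the next invariant, here is a hedged sketch pulling together the executor-agent calls just described. The call types and endpoints are the ones above, but the payloads are trimmed, the IDs are placeholders, and a real executor speaks RecordIO-framed events on the subscribe connection rather than plain JSON responses:

```python
import requests

AGENT = "http://localhost:5051"  # hypothetical agent address

# 1. SUBSCRIBE: opens the persistent connection on which the agent
#    streams events (SUBSCRIBED, LAUNCH_GROUP, ...) to the executor.
subscribe = {
    "type": "SUBSCRIBE",
    "framework_id": {"value": "<framework-id>"},
    "executor_id": {"value": "default-executor-001"},
    "subscribe": {"unacknowledged_tasks": [], "unacknowledged_updates": []},
}
events = requests.post(f"{AGENT}/api/v1/executor",
                       json=subscribe, stream=True)

# 2. For each task in the received LAUNCH_GROUP event, launch one nested
#    container under the executor's own container...
launch = {
    "type": "LAUNCH_NESTED_CONTAINER",
    "launch_nested_container": {
        "container_id": {"parent": {"value": "<executor-container-id>"},
                         "value": "<new-uuid>"},
        "command": {"shell": True, "value": "sleep 1000"},
    },
}
requests.post(f"{AGENT}/api/v1", json=launch)

# 3. ...and block on WAIT_NESTED_CONTAINER, which only returns once the
#    nested container terminates, much like waitpid.
wait = {
    "type": "WAIT_NESTED_CONTAINER",
    "wait_nested_container": {
        "container_id": {"parent": {"value": "<executor-container-id>"},
                         "value": "<new-uuid>"},
    },
}
status = requests.post(f"{AGENT}/api/v1", json=wait)  # blocks until exit
```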
Okay, so in a similar vein, another invariant used in the default termination policy is that if any task in a task group exits successfully, the rest of the tasks in the task group are not impacted. In this example, if task two exits with a zero status code, the task group is still alive; nothing happens to it. Now say task one also dies, with exit code zero; it could equally well have been exit code one, meaning it failed. What happens now is that the default executor commits suicide. So the invariant is: the default executor commits suicide once it has no more active task groups.

So what does this default termination policy mean for us? If you have a requirement to run sidecar containers, the recommendation is to put them in a separate task group, because you don't want your entire main application to die just because your logging sidecar container failed. In some use cases you might, but in most you don't. So the recommendation is to put them in a separate task group, and as I said earlier, as long as you specify the same executor ID when you do the launch operation, the agent ensures it is delivered to the correct default executor instance.

All right, so now let's go over the features we introduced in the default executor, one by one. The first feature is health checks. Currently we let you include the health check protobuf inside the task info, and we support three types of health checks: HTTP, TCP, and command checks. Again, the default behavior is that if the health checks fail, meaning the number of failures exceeds the maximum failures allowed, the default executor kills that nested container, and then the default termination policy I showed in the last couple of slides kicks in.

Going ahead: all built-in executors rely on the checker native library, and all custom executors are encouraged to use it as well. The interesting bit here is that command health checks are actually implemented using debug nested containers: every time the executor runs a command health check, it spawns a new debug container, nested inside the main container, because the command health check needs to execute from the same mount namespace as the parent container. We don't have these idiosyncrasies for the HTTP and TCP health checks, because the nested containers are all in the same network namespace; it's just that command health checks are a bit harder to implement.

Probes, or non-interpreted health checks, are pretty similar. The only difference between a normal health check and a probe is that probes are not interpreted by the executor: even if they fail, they are just forwarded to the scheduler.
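As a hedged sketch, the health-check portion of a TaskInfo might look like the following. The field names follow the Mesos HealthCheck protobuf in JSON form, while the port, path, and thresholds are arbitrary examples:

```python
# Hedged sketch of a HealthCheck attached to a TaskInfo, in v1 JSON form.
health_check = {
    "type": "HTTP",                    # or "TCP" or "COMMAND"
    "http": {"port": 8080, "path": "/healthz"},
    "delay_seconds": 5.0,              # wait before the first check
    "interval_seconds": 10.0,          # time between checks
    "timeout_seconds": 2.0,            # per-check timeout
    "consecutive_failures": 3,         # the "max failures" discussed above
}

task_info = {
    # ... name, task_id, resources, command as usual ...
    "health_check": health_check,
    # A non-interpreted probe would instead use the sibling "check" field,
    # which the executor only forwards to the scheduler:
    # "check": {"type": "HTTP", "http": {"port": 8080, "path": "/stats"}},
}
```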
Okay. The second feature we support is executor authentication, which we introduced in Mesos 1.3. It's mostly a security feature: we don't allow a malicious executor process to mimic a real executor and then do something bad on your cluster; it simply prevents executor impersonation. As I said earlier, in this particular example you have a malicious process that might want to subscribe with the Mesos agent using the same framework ID and executor ID, and executor authentication prevents that.

Coming to the last segment, we do have some things on the immediate roadmap that we would like to fix. The first is support for custom termination policies. As I alluded to earlier, the default executor has this default termination policy, and some users or use cases would want to get around it: instead of killing the entire task group, you might want to restart a task or apply some other business logic. Introducing custom termination policies would allow you to do that. The other item we are interested in is support for resource isolation for nested containers. Also, on executor authentication, we want to build support for custom secret generators; currently we only support JWT tokens, and we may want to generalize that implementation. And as I said earlier, all contributions are welcome: if you see an item that is not on the immediate roadmap that you want included, just chat with me or Vinod.

So, in summary: Mesos containerization has been stable and in production for years. It makes you immune to the bugs in the Docker daemon, it's pluggable and extensible, and we always try to embrace the new container standards. And for the second part: try to use the default executor as often as you can, because it is maintained and owned by the community, and where there are overlapping use cases with your custom executor, you might want to deprecate your custom executor. That's about it; thanks a lot for attending the talk. Happy to take any questions.

A point of clarification: what is responsible for isolation, the agent or the executor?

Do you want me to answer, or do you want to? So, right now it depends on the isolator interface. An isolator can prepare whatever the container needs before the container launches, and act after the container is launched. It can also update resources while the container is running, which is useful for things like persistent volumes, and for resources like memory and CPU, where changes all rely on different isolators. And it can apply particular isolation during a container update, after the container's resources are updated, when some actual logic is needed to isolate the environment for the container. Basically, most features, whether networking, storage, filesystem isolation, or disk quota, can all be achieved with an isolator, and an isolator module is one of the most common ways people customize Mesos. (A conceptual sketch of this interface appears after the Q&A.) And to answer in one line: it's the isolator modules, loaded by the agent, that actually do the isolation, not the executor.

Okay, any other questions? Cool. All right, thanks everyone.
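For reference, here is the conceptual sketch of the isolator lifecycle promised in the answer above. The real interface is C++ inside the Mesos agent (and includes more hooks, such as recover and watch); this Python rendering only mirrors the hooks discussed, and all names here are illustrative:

```python
from abc import ABC, abstractmethod

class Isolator(ABC):
    """Conceptual mirror of the isolator lifecycle described in the Q&A."""

    @abstractmethod
    def prepare(self, container_id, container_config):
        """Before launch: set up mounts, networks, anything the container needs."""

    @abstractmethod
    def isolate(self, container_id, pid):
        """After launch: apply isolation to the container's process."""

    @abstractmethod
    def update(self, container_id, resources):
        """While running: adjust limits such as CPU and memory."""

    @abstractmethod
    def cleanup(self, container_id):
        """Tear down whatever prepare/isolate created."""
```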