All right, let's get started with the next talk. This talk is about secrets management in Mesos, which is also a relatively new feature, so let's talk about what it means and what you can do with it. For those who weren't at the previous talk, a quick introduction: I'm an Apache Mesos committer and PMC member, currently at Mesosphere managing the Mesos team. Before that I was doing something similar at Twitter, and I also did my PhD.

So what's a secret? In our terminology, a secret is any sensitive information that you want to use in your cluster: passwords, SSH keys, certificates, API keys, what have you. You typically need some of these in a secure cluster to access authenticated services. The most important property of a secret is that it should only be visible to authorized users, which typically means the owner of the secret. The one who created the secret should be the only one able to view it; anyone else in the system should not be able to access it unless they have special elevated permissions.

So how should we handle secrets in a huge distributed system like a Mesos cluster, with thousands of machines, frameworks, schedulers, agents and tasks? The first thing to keep in mind is that a secret's time in transit should be minimal. For example, if a framework sends a secret all the way to a task, it goes through the master, the master passes it to the agent, and the agent gives it to the task. That's a lot of network hops, and every hop is an opportunity for the secret to be compromised. So you need to minimize how long a secret is visible before it gets used. The next thing is that you should avoid persisting the secret to disk if possible.
Even after the app terminates, if the secret is still on disk, anyone who gets access to the machine and can look at the file system can see it. And as with any secret, you should limit the opportunities for interception: it should be encrypted on its way to where it's used, and it should pass through very few hops where someone in the middle could intercept it.

So what are the use cases for secrets in a Mesos cluster? One of the biggest is image pull secrets. A lot of people run Docker containers from Docker images in their Mesos clusters. The problem arises when you want to download these images from a private registry. When you download an Ubuntu or Alpine image from the public Docker registry, no authentication is needed; a docker pull just works. But most organizations that care about security and about the availability of their registry run their own registries in-house, and those are usually configured with authentication and user credentials. Once that authentication is in front of your registry, how do you pass the credentials needed to pull your images?

The current solution in Mesos is a bit hacky. If you are using Docker registry 1.0 or 2.0, we take a special interest in some of the URIs you send us and interpret them differently. For example, if a URI ends with .dockercfg, the Docker containerizer knows that this URI corresponds to Docker credentials. It downloads the file into the Mesos sandbox and sets the HOME variable to the sandbox, so that when Docker does a pull, it picks up the credentials from there.
Similarly, for registry 2.0, if your URI ends with docker.tar.gz, Mesos knows that this is a Docker credential for 2.0. It fetches the archive and extracts it so the credentials end up under .docker/config.json in the sandbox, and when you run the docker command it knows to look in that location for credentials. All of this is a workaround for the fact that we don't have a good first-class way of passing Docker credentials.

Other than the code smell, what are the limitations? First, the URIs are accessible to all tasks and users. The URIs themselves are not protected by any credentials (we don't support authenticated URIs yet), so even if you put your credentials behind a URI somewhere, someone can just grab that URI. It's not very secure. Second, as we showed, the credentials are downloaded to the sandbox, which means they're on the file system even after the container terminates. The sandbox's lifecycle is not tied to the container's lifecycle because we want people to be able to debug: when something goes wrong, we keep the sandbox around until it gets garbage-collected. So even after the container has terminated and is done with the secret, the credentials are still sitting in the sandbox. It's not great.

And that's with the Docker containerizer. With the Mesos containerizer, the only way to pass credentials today is to set the --docker_config flag on the agent with the Docker credentials. That's not great either, because operators, who are the ones who typically configure agents, have to know the credentials that users are going to use.
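To make the URI convention concrete, here is a minimal sketch of how a fetched URI's file name decides where Docker credentials land and what HOME is set to. The function `plan_credential_placement` is hypothetical, not Mesos code; it only encodes the two suffix conventions described above (.dockercfg and docker.tar.gz).

```python
import posixpath
from urllib.parse import urlparse


def plan_credential_placement(uri: str, sandbox: str) -> dict:
    """Hypothetical sketch of the URI-suffix convention for Docker credentials.

    Returns the HOME override and the credentials file the Docker CLI would
    read, or an empty dict when the URI is an ordinary artifact.
    """
    name = posixpath.basename(urlparse(uri).path)
    if name == ".dockercfg":
        # Older format: the file is fetched into the sandbox and HOME points
        # there, so the Docker CLI reads $HOME/.dockercfg.
        return {"HOME": sandbox,
                "credentials_file": posixpath.join(sandbox, ".dockercfg")}
    if name == "docker.tar.gz":
        # Newer format: the archive is extracted into the sandbox and is
        # expected to contain .docker/config.json.
        return {"HOME": sandbox,
                "credentials_file": posixpath.join(sandbox, ".docker", "config.json")}
    # Any other URI is just an artifact; nothing about it is treated as secret.
    return {}
```

Notice that the only thing guarding the credentials is knowledge of the URI, and the resulting file stays in the sandbox until it is garbage-collected; those are exactly the limitations discussed here.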
What ends up happening when operators configure agents this way is that they set a single registry credential on all the agents. That is no longer a per-user credential, because operators can't realistically know every user's credential if there is one per user. So these workarounds let you use private registries, but they're not flexible and not a real first-class way to do things.

The second use case is application secrets. A lot of applications in your cluster need to talk to other services, and if those services need credentials to authenticate or authorize access, the applications need access to those credentials. There has been no support at all for this so far. What people typically end up doing is passing the credentials inside the TaskInfo labels or data, which is not secure. If you haven't configured your Mesos cluster with SSL, the network traffic is open to snooping, so anyone on the network can grab your TaskInfo and read the data or labels. Moreover, we expose labels in our operator API endpoints: if you hit /state or /tasks, all the labels of a task are there. So anyone with access to the operator API endpoints can see secrets you put in labels. That's not great either.
The other solution people have used is the same URI hack we talked about before: put your secrets behind a URI, Mesos downloads them into your sandbox, and the application is configured, through some out-of-band knowledge of the location and file name, to read them from there and use them to talk to other services. It has the same limitations we discussed for pulling credentials through URIs.

Another solution, at least in DC/OS Enterprise, was to use custom hook and isolator modules to avoid exposing secrets and to limit how long they are available. DC/OS Enterprise ended up implementing hooks and modules to pull secrets, which is complicated, and it isn't reusable if you want the same solution with other frameworks or in another organization that wants to build something of its own. So that's not a viable general solution either.

The last use case I want to chat about is executor authentication. In Mesos, a task runs through an executor: when you launch a task, the Mesos agent launches an executor, and the executor runs the task. That means the executor has to connect back to the agent before it can accept the task. Until very recently there was no authentication at all between executors and agents. Any process could come up and tell the agent that it was an executor, and the agent would accept it. So there's no real protection if a malicious user in the cluster tries to spoof an executor.
If we want executor authentication, so that we don't have this loophole where anyone can claim to be an executor, we want to pass some kind of one-time credential to an executor when we launch it. The executor then registers with the agent using that credential, and the agent authenticates it. We want to support this case too, using whatever mechanism we come up with for secrets. There was no support for this in either the v0 or the v1 API that executors use to talk to the agent.

So there were lots of use cases for secrets and no good solutions. This work was about adding first-class support for secrets. Secrets are important to a lot of security-conscious customers, so how do we make them a first-class primitive in Mesos? And how do we integrate with existing third-party secret stores? We don't want to build a secret store in Mesos, because that's not the business we're in. There are people who are very good at it and have built really good, secure, well-thought-out secret stores, like Vault and lots of other open source implementations, so we just want to integrate with them. We also wanted to support both environment-based and file-based secrets. Environment-based secrets are exposed to your task as environment variables; file-based secrets are exposed as files in your sandbox. We wanted to support both of those, in addition to the image pull secrets we talked about for fetching images.

The solution has three parts. First, we introduced a new concept called a secret: a first-class protobuf, a primitive that Mesos understands and knows how to handle. Second, we introduced the concept of a secret resolver, which is responsible for resolving a secret.
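The executor-authentication goal described above, a one-time credential handed to the executor at launch and checked when it registers with the agent, can be sketched with an HMAC-signed token. This is a hypothetical illustration: the class `AgentTokenIssuer`, the token layout, and the choice of HMAC-SHA256 are assumptions made for the example, not the mechanism Mesos actually ships.

```python
import hashlib
import hmac
import secrets


class AgentTokenIssuer:
    """Hypothetical agent-side issuer and checker of per-executor credentials."""

    def __init__(self, key: bytes):
        self._key = key       # known only to the agent
        self._used = set()    # executor IDs that have already registered

    def _sign(self, executor_id: str, nonce: str) -> str:
        msg = f"{executor_id}:{nonce}".encode()
        return hmac.new(self._key, msg, hashlib.sha256).hexdigest()

    def issue(self, executor_id: str) -> str:
        # Bind the token to one executor ID so another executor cannot reuse it.
        nonce = secrets.token_hex(8)
        return f"{executor_id}:{nonce}:{self._sign(executor_id, nonce)}"

    def authenticate(self, executor_id: str, token: str) -> bool:
        try:
            claimed_id, nonce, sig = token.rsplit(":", 2)
        except ValueError:
            return False      # malformed token
        if claimed_id != executor_id or executor_id in self._used:
            return False
        if not hmac.compare_digest(self._sign(claimed_id, nonce), sig):
            return False      # forged or corrupted signature
        self._used.add(executor_id)  # one-time: a second registration fails
        return True
```

In the design described in the talk, a credential like this would reach the executor as a value-based secret, since only the agent ever handles it.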
We'll come to what it means to resolve a secret. This interface is modularized so that you can hook into many different third-party secret stores. Third, we introduced a couple of isolators, an environment isolator and a volume isolator, for environment-based and file-based secrets respectively.

Let's go into some details. What does the Secret protobuf look like? We allow two types of secrets: the first is what we call a reference-based secret, and the second is a value-based secret. A reference-based secret is, I hope, what everyone uses in an actual production cluster: you give a reference to a secret, and someone else, a resolver, resolves it. For example, you could think of the reference as a name in your Vault secret store, and a Vault-based secret resolver knows how to take that name, talk to Vault, and fetch the secret. The reference also has a key, in case your secret object is actually a hash map and you want just one entry in it rather than the whole map. That's additional flexibility we provide in the reference.

The value-based secret is mostly for testing purposes, and we also leverage it for executor authentication. In this case the value of the secret is inside the protobuf itself. Clearly that's not very secure; it's only secure if your network is completely encrypted. We make sure we don't expose it in the APIs, so that part is taken care of. But if a framework uses a value-based secret, which I strongly do not recommend, Mesos passes the value directly to the task. Where we do use it is for executor authentication, and we'll talk about why we needed it there.
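The two secret types can be modelled roughly as below. The sketch mirrors the shape described in the talk (a type tag, a reference with a name and optional key, or an inline value); the Python classes and the `validate` helper are illustrative stand-ins, not the actual Mesos protobuf or generated code. Hiding the inline bytes from `repr()` echoes the point that value-based secrets are not exposed in the APIs.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class SecretType(Enum):
    REFERENCE = "REFERENCE"
    VALUE = "VALUE"


@dataclass(frozen=True)
class Reference:
    name: str                  # e.g. a path or name in the secret store
    key: Optional[str] = None  # optional entry inside a map-shaped secret


@dataclass(frozen=True)
class Value:
    data: bytes = field(repr=False)  # inline secret bytes, hidden from repr()


@dataclass(frozen=True)
class Secret:
    type: SecretType
    reference: Optional[Reference] = None
    value: Optional[Value] = None

    def validate(self) -> None:
        # Exactly one payload must be set, and it must match the type tag.
        if self.type is SecretType.REFERENCE:
            if self.reference is None or self.value is not None:
                raise ValueError("REFERENCE secret needs a reference and no value")
        elif self.type is SecretType.VALUE:
            if self.value is None or self.reference is not None:
                raise ValueError("VALUE secret needs a value and no reference")
```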
For executor authentication, the secret never travels over the network; only the agent handles it, so we're not too worried about someone snooping it. So you can use either type, but as a framework, definitely use reference-based secrets.

The new interface we introduced is the secret resolver. It's really simple: a resolve function takes a secret and returns its value. To get that value from a secret reference, the resolver might have to talk to a separate backend. The interface is modularized, so you can write your own secret resolver module to talk to your in-house secret store.

The architecture looks something like this. The secret resolver is used by the provisioner, the component in the Mesos agent responsible for provisioning container images, and by some isolators. Both pass it the Secret proto, say a reference; the resolver talks to the secret store, gets the value, and passes it back to the caller, whether that's the provisioner or an isolator.

So let's look at how we solved the three use cases from the beginning. For image pull secrets, where you want to get an image from a private registry, we added a new field to the Docker image protobuf: an optional Secret called config. You set this config in your Docker image protobuf, and we use a secret resolver to get its value from the secret store and decode that value as a Docker config file.
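Decoding the resolved secret as a Docker config file amounts to reading the standard config.json layout, where each registry entry under "auths" carries a base64-encoded user:password pair in its "auth" field. The helper below is a hypothetical sketch of that step (`registry_credentials` is not a Mesos function, and the host normalization is a simplifying assumption); it works purely in memory, in line with not writing the credentials to disk.

```python
import base64
import json
from typing import Optional, Tuple


def _normalize(registry: str) -> str:
    # Treat "https://registry.example.com/" and "registry.example.com" alike.
    for prefix in ("https://", "http://"):
        if registry.startswith(prefix):
            registry = registry[len(prefix):]
    return registry.rstrip("/")


def registry_credentials(config_bytes: bytes,
                         registry: str) -> Optional[Tuple[str, str]]:
    """Return (username, password) for `registry` from a Docker config, or None."""
    config = json.loads(config_bytes)
    for host, entry in config.get("auths", {}).items():
        if _normalize(host) == _normalize(registry) and "auth" in entry:
            decoded = base64.b64decode(entry["auth"]).decode("utf-8")
            # Split on the first colon only; passwords may contain colons.
            username, _, password = decoded.partition(":")
            return username, password
    return None
```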
In other words, we interpret the secret as your Docker config and use it to pull the image. That's pretty much it. We don't store it on your disk at all, because it's not needed after the image is pulled. So if you want to use your own private credentials to pull a Docker image, put your credentials in a secret store, give a reference to them in your Docker image, and we'll do the rest, as long as you have a secret resolver that knows how to fetch them.

Here's how the workflow looks with the TaskInfo. When you set the Docker image, you set its config to foo. When the provisioner gets that information, it asks the secret resolver to resolve the secret foo. The resolver talks to the secret store and gets the value for foo; the provisioner interprets that as a Docker config, talks to the Docker registry using that configuration, pulls and provisions the container image, and the container is launched. Very straightforward. All frameworks have to do is put their credentials in the config field, and of course someone has to write a secret resolver once.

The next use case is passing application secrets as environment variables. For this, we extended the Environment protobuf, which Mesos already had for passing arbitrary environment variables from the framework to the task. Previously we only had what we call value-based variables, where you set a key and a value, both strings. We extended it so the value can be a secret: you can say, set the variable with this name, and its value is a secret that has to be fetched from somewhere else. So a variable's value can be either a simple string or a secret.
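The extended environment handling can be sketched as follows: each variable is either a plain string or a secret reference, and only the secret-typed ones go through the resolver on the agent before the task starts. The `Variable` class, `build_environment`, and the `resolve` callable are hypothetical stand-ins for the Environment protobuf, the environment isolator, and the secret resolver.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Variable:
    name: str
    value: Optional[str] = None       # set for plain (value-based) variables
    secret_ref: Optional[str] = None  # set for secret-typed variables


def build_environment(variables: List[Variable],
                      resolve: Callable[[str], bytes]) -> Dict[str, str]:
    """Hypothetical isolator step: produce the environment for the task."""
    env: Dict[str, str] = {}
    for var in variables:
        if var.secret_ref is not None:
            # The TaskInfo only carries the reference; the value is fetched
            # on the agent just before launch.
            env[var.name] = resolve(var.secret_ref).decode("utf-8")
        elif var.value is not None:
            env[var.name] = var.value
        else:
            raise ValueError(f"variable {var.name!r} has neither a value nor a secret")
    return env
```

With the example used in the talk, a variable named foo referencing the secret bar ends up holding whatever bar resolves to in the secret store.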
If you set a secret-valued variable in the environment, we do all the magic to make sure your environment contains the actual secret value. Here's an example: the TaskInfo has an environment variable whose name is foo and whose secret reference is bar. bar is not the actual value; it references some secret information, maybe an API key or a password. When the agent gets the TaskInfo, the environment isolator intercepts it, talks to the secret resolver, and gets the value of bar from the secret store. When the task is launched, the variable foo is set to the resolved value of bar.

For file-based secrets, we extended the Volume protobuf, which lets you set up different sorts of volumes in your container, so that the source of a volume can be a secret. Previously the source could only be something like a Docker volume or a sandbox path. Now someone can say the source of their volume is a secret. This tells Mesos that they want the secret exposed as a volume at a container path. The container path is right there in the volume, but the data for it lives in a secret store somewhere, so Mesos needs to fetch it and put it at that path inside the container so the task can access it.

The nice thing about how we did it is that, even for volume-based secrets, we didn't want anything mounted into the container sandbox to persist after the container terminates. So the secret is placed on a tmpfs volume mounted into your container. Once you set that in your TaskInfo, the secret resolver is called by the volume secret isolator that we added.
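The volume-secret flow described above can be sketched as resolving the reference and writing the bytes at the requested container path for exactly as long as the container lives. Mounting tmpfs requires root, so this hypothetical sketch substitutes a `tempfile.TemporaryDirectory` to mimic only the lifetime property; it does not give tmpfs's guarantee that the data never reaches persistent disk. `secret_volume` and its arguments are illustrative names, not Mesos APIs.

```python
import contextlib
import os
import tempfile
from typing import Callable, Iterator


@contextlib.contextmanager
def secret_volume(container_path: str, secret_ref: str,
                  resolve: Callable[[str], bytes]) -> Iterator[str]:
    """Yield a host path holding the resolved secret; remove it on exit.

    Stand-in for a tmpfs mount: only the "gone when the container ends"
    behaviour is imitated here.
    """
    with tempfile.TemporaryDirectory() as scratch:
        host_file = os.path.join(scratch, container_path.lstrip("/"))
        os.makedirs(os.path.dirname(host_file), exist_ok=True)
        # Create the file with owner-only permissions before writing the secret.
        fd = os.open(host_file, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "wb") as f:
            f.write(resolve(secret_ref))
        yield host_file
    # Leaving the with-block deleted the directory, much like unmounting tmpfs.
```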
With a file-based secret, the contents are put at the location you asked for in the volume proto, but on a tmpfs mount, which means that once the container terminates, the volume goes away as well and no one can see it. That's a nice feature to have.

I'm not going to talk about executor authentication in this talk, but there was a talk about it at this year's MesosCon Asia. If you're interested in how we leverage secrets for agent-executor authentication, I highly recommend watching it. It's a pretty neat use of one-time credentials in Mesos to authenticate executors on demand.

Support for secrets landed in Mesos 1.3, so it's been around for a couple of releases already, and at Mesosphere we have been using it in DC/OS Enterprise for a long time. It's pretty close to being called stable; we might graduate it to a stable feature in Mesos 1.5. So you should feel confident that the plumbing works. It's been used in production in quite a few places already.

We added support for pulling images with private credentials to the Mesos containerizer, but not yet to the Docker containerizer, because we haven't had enough requests for it. If there's an overwhelming request from the community, we might add it. But the future of Mesos is the Mesos containerizer, not the Docker containerizer, which depends on the Docker daemon. So we're not aiming for feature parity; we want the Mesos containerizer, which has a better modular implementation, to have the better features.

The secret resolver, as I said, is completely modularized. Out of the box, Mesos understands value-based secrets, which are pretty straightforward, but reference-based secrets are handled by modules. So you need to implement a module to hook into your secret store.
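The resolver split described here, with value-based secrets handled out of the box and reference-based ones delegated to a module, can be sketched as an abstract interface with two implementations. All names are hypothetical: in Mesos the interface is a C++ module, and `InMemoryStoreResolver` merely stands in for something like a Vault-backed module. Secrets are plain dicts here to keep the block self-contained.

```python
from abc import ABC, abstractmethod
from typing import Dict, Union

StoreEntry = Union[bytes, Dict[str, bytes]]


class SecretResolver(ABC):
    """Hypothetical interface: turn a secret description into its bytes."""

    @abstractmethod
    def resolve(self, secret: dict) -> bytes:
        ...


class DefaultResolver(SecretResolver):
    # Built-in behaviour: only VALUE secrets, whose data is already inline.
    def resolve(self, secret: dict) -> bytes:
        if secret.get("type") != "VALUE":
            raise NotImplementedError("reference secrets need a resolver module")
        return secret["value"]["data"]


class InMemoryStoreResolver(SecretResolver):
    # Hypothetical module: looks references up in a store (a dict standing in
    # for a real secret store) and defers VALUE secrets to the default.
    def __init__(self, store: Dict[str, StoreEntry]):
        self._store = store
        self._fallback = DefaultResolver()

    def resolve(self, secret: dict) -> bytes:
        if secret.get("type") != "REFERENCE":
            return self._fallback.resolve(secret)
        ref = secret["reference"]
        entry = self._store[ref["name"]]  # KeyError if no such secret
        key = ref.get("key")
        if key is None:
            if not isinstance(entry, bytes):
                raise ValueError("secret is a map; a key is required")
            return entry
        if isinstance(entry, bytes):
            raise ValueError("secret is not a map; no key expected")
        # Map-shaped secret: the optional key selects a single entry.
        return entry[key]
```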
Next up is a demo of file-based and environment-based secrets in action. I recorded a video so I don't run into any surprises; let's see if I can lip-sync. We're doing this with DC/OS because it's the one we know of that has implemented secret resolvers.

This is a typical Marathon app configuration that we use in DC/OS for an environment-based secret. As you can see, they set an environment variable called MY_SECRET, and in the container information they say that the variable's value is the secret secret0, whose source is the secret store at path /MySecret. To check that this worked, the app echoes the environment variable MY_SECRET, and we should see the secret value in its output.

Next is the file-based secret. Again, when we set up the container, we add a new volume whose source is a secret, which as we said is now supported in Mesos. The volume references the secret password by name, and the actual contents of the password live in the secret store. To test this, we do an ls in the sandbox and see that the secret has been mounted there as a file called path, and that its contents are present. Let's wait for the video to catch up.

Next, we take this configuration, launch a Marathon app with it in DC/OS, and see it work. We're going to the DC/OS UI; for those of you who are new to DC/OS, this is how it looks.
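The demo's configurations are only visible on the slides. As a rough approximation, Marathon app definitions using the DC/OS Enterprise secrets format look something like the dictionaries below. The field names (env, secrets, source, container.volumes[].secret, containerPath) are recalled from that format and should be checked against the DC/OS documentation for your version; the app IDs, commands, and cpus/mem values are made up for illustration. Note that neither definition contains a secret value, only names that point into the secret store.

```python
import json


def app_with_env_secret(app_id: str, store_path: str) -> dict:
    # The env var points at a named secret; the named secret points at the store.
    return {
        "id": app_id,
        "cmd": "echo $MY_SECRET && sleep 3600",
        "cpus": 0.1,
        "mem": 32,
        "env": {"MY_SECRET": {"secret": "secret0"}},
        "secrets": {"secret0": {"source": store_path}},
    }


def app_with_file_secret(app_id: str, store_path: str) -> dict:
    # A volume whose source is a secret, exposed as a file at containerPath.
    return {
        "id": app_id,
        "cmd": "ls -l && cat path && sleep 3600",
        "cpus": 0.1,
        "mem": 32,
        "container": {
            "type": "MESOS",
            "volumes": [{"containerPath": "path", "secret": "secretpassword"}],
        },
        "secrets": {"secretpassword": {"source": store_path}},
    }


if __name__ == "__main__":
    # Serialize to JSON, e.g. to save as a file for the DC/OS CLI.
    print(json.dumps(app_with_env_secret("/env-secret-demo", "/MySecret"), indent=2))
```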
The first thing we need to do in DC/OS is create the secret in the secret store. The DC/OS UI lets you create secrets; DC/OS uses a Vault-based secret store. So you go to the Secrets page, create a new secret, give it an ID, MySecret, and give it a value. This stores the secret securely in the Vault-based secret store, and no one gets to see it unless they are its owner.

OK, the secret is in the secret store. Now we're going to launch an app that tests the environment-based secret, using the DC/OS CLI. We're testing both the pod specification and the app specification to show that it works with both; for the environment-based secret we're using a pod. We create it with the definition we saw before, and as you can see, it's already running. If you click through and look at its sandbox logs to see what it echoed as the secret value, you can see "super secret information", which is the value of the secret. We only stored it in the secret store, but it got pulled directly into the container as an environment variable. So that shows you can pass secret information as an environment variable.

Next we do the file-based secret with an app definition instead of a pod, just for some variety. We add the application in Marathon, and as you can see it's already running. If you go to its sandbox, you can see there's a file called path with some contents. This shows that we actually created a file named path in the container and mounted it there.
An application can read the path file and see the contents. As you can see, the last command cats the path file, and the super secret information is there. It got placed at that path inside your container on a tmpfs volume. So this shows that all the wiring works, and we have a first-class implementation of it in DC/OS if you want to check it out. That's the demo.

Let's get back to future work. As I said, image pull secrets currently only work in the Mesos containerizer, and we'd like to add them to the Docker containerizer if people really want it. More importantly, we want to add this support for the AppC and OCI images that the UCR, the Mesos containerizer, is going to support going forward. As new image formats come along, we should be able to support image pull secrets for them pretty easily.

The other thing I alluded to when talking about URI fetching is that Mesos currently has no mechanism for fetching URIs that need authentication, which is a pretty big limitation. The current workaround is that people configure curl on the machines with the credentials they need to fetch URIs. That's not great; you can't use per-user credentials to fetch URIs. Now that we have secrets as a first-class primitive, we want to use them for fetching authenticated URIs as well. We plan to add that support in the future, and secrets will be the mechanism for passing credentials for fetching URIs. That's something we're looking forward to, and hopefully we'll get time to implement it at some point. This also matters for HTTPS: you may need certificates and the like to fetch HTTPS-based URIs.
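As a rough illustration of this proposed future work, a fetcher could resolve a per-URI secret and turn it into an HTTP Authorization header before downloading. Nothing like this exists in Mesos yet, per the talk; `build_fetch_request`, the per-URI secret reference, and the assumption that the secret holds "user:password" for HTTP Basic auth are all hypothetical choices for the sketch. Building the request performs no network I/O; passing it to `urllib.request.urlopen` would perform the download.

```python
import base64
import urllib.request
from typing import Callable, Optional


def build_fetch_request(uri: str, secret_ref: Optional[str],
                        resolve: Callable[[str], bytes]) -> urllib.request.Request:
    """Hypothetical: attach per-task credentials to a fetcher download."""
    request = urllib.request.Request(uri)
    if secret_ref is not None:
        # Assume the secret holds "user:password"; encode it for HTTP Basic auth.
        token = base64.b64encode(resolve(secret_ref)).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request
```

Because the credential comes from a per-task secret rather than a machine-wide curl configuration, different users could fetch from the same server with their own credentials.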
And there's currently no way to pass such certificates to the fetcher. It would be nice for people to be able to pass them in a secret, and the Mesos fetcher would use that to fetch the URI.

Some acknowledgments: this is joint work between a lot of people. Greg in particular implemented environment-based secrets and the executor authentication work. Kapil did some of the environment-based secrets and also the file-based secrets. Lots of others were involved in reviews and design discussions. It was a pretty big effort, so special thanks to all of these folks. I've linked some design docs here; there's a separate design doc for each feature, so if you're interested in reading more, click through to them from the sked.org website. Now I'll take questions.

This is for DC/OS Enterprise. The question was: for DC/OS Enterprise, have you considered any backup mechanism for secrets? When you say backup, do you mean backing up the secrets, or a fallback mechanism for getting secrets if the secret store fails? Re-initializing Vault? OK. I'm not aware of work on that particular aspect, backing up Vault or re-initializing it easily without operators having to reconfigure it. It does seem unfortunate if, when Vault crashes, operators have to go and re-initialize everything; it would be nice to have automatic backups or something like that. I'll be glad to take that feedback to the team. But I'm not intimately involved in how the DC/OS Vault integration works. Until this feature landed, DC/OS was using hooks and isolators, and we are slowly moving DC/OS to this more first-class approach. But that's not going to solve the Vault crashing problem.
That needs to be solved separately if the secret store itself is not stable in the first place. OK, so you're saying that depending on the storage backend you pick for Vault, the storage can be highly available, which makes Vault less susceptible to downtime. Maybe you should talk after the talk and share some production lessons.

Yes. The first question was which bits work with the Mesos executor and which with the Docker executor. The application secrets, environment-based and file-based, only work with the Mesos executors, by which I mean both the command executor and the default executor that runs pods. They won't work with the Docker executor, which uses the Docker containerizer.

The next question was whether the secret resolver could run on the master instead of the agents, so that agents wouldn't need their own credentials for fetching secrets. We had that discussion earlier as well. What we saw in DC/OS Enterprise is that, since the resolver is a module, it gets that information, possibly through Vault itself, which means agents either share credentials or have agent-specific credentials. Resolving at the master and then passing the value along is also interesting, if you're OK with that transit being secure within your organization. It's definitely something we haven't designed for, so it's an interesting design choice, and we could provide that flexibility. We'd need to think about how it plays with the agent-side resolver and how the agent knows that the master has already resolved the secret. But it's an interesting idea; I don't see why we couldn't do that.

The next question was: since the secret resolver has no more context than the secret it's given, how does it know which credentials to use to fetch the secret? Yes, that's a limitation of the interface.
The way we handle it in DC/OS Enterprise is with an authorization module, which authorizes lots of things in Mesos. When a framework launches a task that references a secret, the launch is authorized before the secret resolver ever gets involved. So the authorization of who gets to use which secret is not done by the secret resolver itself, but by the authorizer on the master. The master has the TaskInfo, which carries much more context about who is trying to use the secret, along with the secret's name, and the authorizer uses those two pieces of information to authorize the request. Of course, that doesn't solve the problem that when the secret is actually pulled from the secret store, it's pulled with the agent's credentials, not the user's. That's a limitation.

The next question was whether the Vault secret resolver implementation is open source. It's not, yet.

Then the question was how confident we are that people will implement secret resolver modules and publish them for others to use. I think we'll have to wait and see. I hope there's enough interest in the ecosystem for people to share their implementations. One issue with secret stores is that a lot of organizations have their own custom, snowflake secret store implementations, so it's really hard for them to share anything. But if a lot of organizations standardize on Vault or something similar, I could imagine someone, maybe Netflix or Twitter or Apple (I don't want to put them on the spot), coming along at some point and saying, we've all standardized on Vault internally, so we could probably share that module.
But until organizations actually standardize on particular secret stores, it's hard to see them coming out with something they could share. That is where the industry is moving, though, and I hope people are willing to share these modules, because pulling a secret from Vault isn't really secret sauce; it just needs to be implemented robustly.

You had a question somewhere in the back? OK, go ahead. The question was: it's easy to get secrets in Marathon, so how do we do it in DC/OS Commons? DC/OS Commons is also going to support secrets in Mesos, and I believe the latest SDK release already supports them if you're using Enterprise. If you're using open source DC/OS with the SDK, I don't know whether secrets are supported; probably not. But the Enterprise version of the SDK supports all of the secret protobufs.

The next question was: if someone gets root access to an agent node, how do we make it difficult for them to get at secrets? Do we run the secret fetching in a separate process, for example? To be honest, we haven't thought that far. If someone gets onto the box, they could just go into the container sandboxes and look, or run ps and inspect the environment. We could probably make the secret fetching itself more secure by running it in some out-of-band process that people can't inspect, maybe. But we haven't really explored what we can do to prevent access by someone who gets onto a node maliciously. If there are examples out there we could learn from, I'd be happy to look into them. Cool. Any other questions?
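Returning to the authorization answer above: the authorizer on the master decides which principal may use which secret before any agent calls a resolver. The sketch below is a hypothetical ACL check in that spirit; `LaunchRequest`, `SecretAcl`, and the trailing-slash prefix rule are assumptions for illustration, not the DC/OS authorizer or the Mesos authorizer interface. As noted in the talk, a check like this does not change the fact that the agent still fetches the secret with its own credentials.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class LaunchRequest:
    principal: str                 # framework principal launching the task
    secret_names: Tuple[str, ...]  # secret references found in the TaskInfo


def _permits(grant: str, name: str) -> bool:
    # A grant ending in "/" covers every secret under that path prefix.
    return name == grant or (grant.endswith("/") and name.startswith(grant))


class SecretAcl:
    """Hypothetical master-side check of which principal may use which secret."""

    def __init__(self, grants: Dict[str, Set[str]]):
        self._grants = grants

    def authorize(self, request: LaunchRequest) -> bool:
        allowed = self._grants.get(request.principal, set())
        # Every referenced secret must be covered by some grant; otherwise the
        # launch is rejected before any agent calls a secret resolver.
        return all(any(_permits(g, name) for g in allowed)
                   for name in request.secret_names)
```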
The question was whether the authorization module I described in the earlier answer is available in open source, and whether it's modularized so it can be used with other authorization backends. Yes, the authorizer interface has been in Mesos for four or five releases now, on both the master and the agent, and both are modularized. So you can write your own authorization module to hook into your authorization backend, whether you use LDAP or some IAM service. That's entirely possible today.

Yes? You'd like to see support in the Docker containerizer? OK, that's one vote for the Docker containerizer. We need more votes. Two votes, three, four, five votes, OK. It's a biased sample, but we'll take it. Any other questions? All right, thanks, everyone, and thanks for sticking around.