Hello. Today I'm going to talk to you about putting trust into OpenStack. I'm from Cloudscaling; I'm Eric Windisch. I am an OpenStack developer, and I've been building cloud computing things for a very long time, before we were really calling things cloud, and doing automation for longer than that. And that's my Twitter handle again.

So, attacking Grizzly should be scary. Attacking Grizzly should be a scary thing. We don't want to make it easy for you to do it. We want to do this by adding trust to the system, and to put trust in, we need to understand what trust is.

Trust is knowing who your friends are. Trust in computing is no different than trust in real life. You have to know who your friends are, and you need to know how to identify those friends. You need to be able to know when someone is trying to fool you into thinking they're someone else, someone you don't intend to trust. But trust is limited. It's not without limits: you can't trust everyone, and you can't trust them completely, so you have to know how much you can trust each of those friends. And trust is not necessarily encryption.
So we will use cryptography, but we do not necessarily need encryption to have trust. You do need, for instance, message signing. But encryption, while it can provide the trust that we're looking for, is not a requirement for it.

Right now in OpenStack the system is wide open. Anyone can send messages anywhere through the RPC messaging bus. If you are a system in the data center and you're on the network, you can send a message to any of the systems and control them. Say we have customers: I think every customer has some sort of out-of-band way to get into the system. You have operations people, and operations people need to be able to manage these systems. At some point they are going to be able to get into that network and do the things they need to do; they need that for their jobs. But what happens when their system is compromised and they're coming through the VPN? What happens when one of your other systems somehow gets compromised?
You have these backdoor ways to get in, and anyone can send messages and control OpenStack. Take compute, for instance: you can destroy someone else's VMs, or create VMs just because you want to avoid billing controls or constraints. Or control networking, which is even scarier: you can do all kinds of nasty things, sending packets where you don't want them going.

And then there are the impossible things, the things that we design so they shouldn't happen, but that do happen, like the hypervisor getting compromised. You have a security model that plans on the VMs not being able to do certain things, but sometimes VMs can do certain things. You might have a flaw in your networking model, in your virtual switching or your real switching or your access control groups. When those impossible things happen, we need to be able to protect against that eventuality.

There is an ability right now to use TLS/SSL against RabbitMQ, and it's not enough. When you use that, you have trust between each of the endpoints and the RabbitMQ server, which decrypts those messages and then re-encrypts them to the other host. It's not point-to-point. There's no identity trust from one end of the system to the other. It's only trusting that messages are to and from RabbitMQ, not from any specific system. So if we want to implement things like role-based access controls, we can't do that with just trust in Rabbit. It's also a point of entry, so somebody could actually do a man-in-the-middle attack there. RabbitMQ is written in Erlang; somebody could even modify the code without shutting down the service. They can just modify it in real time, in process. So we need to have secure messaging from point to point.

Again, anyone right now can inject a message. You have the malicious Mallet, who is going to send a message pretending he's Nova scheduler. He can do that right now: just create a message and put it out there, and Nova compute will pick it up and
it will process it, and it will do something that it shouldn't do, based on input from someone it does not trust and can't verify. Likewise, because that can happen, you can also have man-in-the-middle attacks: your switching infrastructure or your routing infrastructure could be compromised, and someone could actually remove a message and replace it with another one.

So what we want to have is trust. We put these locks and keys on Nova scheduler, and it uses them to sign the message that gets put over the transport. No matter what is in the middle, no matter what medium that message goes over, when it gets to the other end Nova compute knows it came from that scheduler and can trust that the message did in fact come unmodified from the Nova scheduler.

When we do this and the malicious Mallet tries to inject a message, he doesn't have trust. We don't know who he is, and he can't sign the message. So when the secure node on the opposite end, the receiver, the consumer, receives a message that is not signed, where it doesn't know who it's coming from, or the message is just not from a trusted system, we just reject it. It doesn't go anywhere. It's not accepted, and that prevents that attack. Likewise, it prevents the man-in-the-middle attack. Those messages cannot be modified. They're created and they're static, and if they're changed it will be detected, and we'll stop those attacks from occurring.

So we need to be able to implement this. There are many protocols, many ways of doing this, some good, some bad, and you hear a lot of suggestions on which way you should or shouldn't do it. The three main options that one would consider are SSL, HMAC, and RSA.

SSL is one you hear a lot. But it's complex. Sometimes that's a good thing. On the web, SSL between endpoints is great because you have all these different clients.
You have mobile browsers and you have legacy browsers. You have Chrome and Firefox and Internet Explorer, and they all support different protocols. They all have a different implementation. So SSL is complex in order to support this heterogeneous environment. But in OpenStack we have a homogeneous environment. All of the systems are the same. They're all running the same software. We know what they can and cannot support, so we don't need a complex solution.

SSL is also session-based. Being session-based means that, first of all, we have to maintain state, and we have to send messages back and forth to negotiate the communication before we can send the message itself. That works against the model we have in Nova, which is a simple messaging pattern: messages go from A to B. With a session we have to go A, B, A, B, A, B, and only then can we send the original message that was going from A to B, and then probably a few more messages to take it all apart. So, again, overly complex with being session-based. But SSL is great because it uses PKI, which we want because we need that for our identity. It uses encryption, which is not really a minus, but it's not a requirement for what we need. And because it uses PKI we can use the TPM with it, which is a good thing.

Then HMAC. Simple, really simple. People say, okay, we have HMAC in Python, it's part of the core library, and it works. Well, it's a shared key. With a shared key you can make sure that the message is coming from a system within OpenStack, but that is only so good. It does not assure that it comes from any specific type of machine. And we can't do key revocation with it: if a system gets compromised, we can't just remove it from the pool and stop taking its messages.
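The shared-key limitation can be seen in a few lines using Python's standard `hmac` module. This is only a minimal sketch: the key value, the message contents, and the `sign` and `verify` helper names are assumptions made up for illustration, not anything from the OpenStack code.

```python
import hashlib
import hmac

# Hypothetical shared secret that every host in the deployment holds.
SHARED_KEY = b"example-shared-secret"


def sign(key: bytes, payload: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(key: bytes, payload: bytes, tag: str) -> bool:
    """Constant-time check that the tag matches the payload."""
    return hmac.compare_digest(sign(key, payload), tag)


# A legitimate scheduler and a compromised host hold the same key...
scheduler_msg = b'{"sender": "nova-scheduler", "method": "run_instance"}'
forged_msg = b'{"sender": "nova-scheduler", "method": "terminate_instance"}'

scheduler_tag = sign(SHARED_KEY, scheduler_msg)
forged_tag = sign(SHARED_KEY, forged_msg)  # produced on the compromised host

# ...so the receiver accepts both and cannot tell which host signed.
print(verify(SHARED_KEY, scheduler_msg, scheduler_tag))
print(verify(SHARED_KEY, forged_msg, forged_tag))

# An outsider without the key is still rejected.
print(verify(b"wrong-key", forged_msg, forged_tag))
```

Both tagged messages verify, because the tag only proves possession of the shared key. There is no per-host identity to check, and no single key to revoke without re-keying every host.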
It's theoretically simple, but then you find out that to get the security you want you need to use Diffie-Hellman, which is session-based, so now we're back to what we have with SSL. And to get identity you also have to pair it with PKI. But okay, it's signing, so it's faster than encryption, which is good, and signing is all we really need.

So I have circled RSA here. It's simple. It's stateless. It's PKI. It's message signing, which is faster than encryption and, again, all we need, and it's compatible with the TPM. So we get the advantages of HMAC and SSL without the problems of either.

I keep getting asked: how are you going to manage these keys? Key management is a fairly complex thing, but I don't think it has to be as difficult as people suggest. This is actually a fairly standard RSA layout. You have a CA. You have a service key in the upper left-hand corner. You create a signing request, and you have a CA sign that to get a certificate.
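To make the stateless signing idea concrete, here is a toy sketch of sign and verify with per-sender key pairs. It is textbook RSA over small, publicly known primes with no padding, chosen only so it runs on the standard library. It is not secure and not the Grizzly implementation; a real deployment would use a vetted library such as OpenSSL. All names and message contents are assumptions for illustration.

```python
import hashlib

# Toy parameters only: the primes below are well-known Mersenne primes,
# so these keys offer no security. They exist to show the message flow.
E = 65537


def make_keypair(p: int, q: int):
    """Return ((n, e), (n, d)) for textbook RSA from primes p and q."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
    return (n, E), (n, d)


def digest(message: bytes, n: int) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(private_key, message: bytes) -> int:
    """Sign the message digest with the private exponent."""
    n, d = private_key
    return pow(digest(message, n), d, n)


def verify(public_key, message: bytes, signature: int) -> bool:
    """Check the signature using only the sender's public key."""
    n, e = public_key
    return pow(signature, e, n) == digest(message, n)


# Each sender has its own key pair; the receiver needs only public keys.
scheduler_pub, scheduler_priv = make_keypair(2**127 - 1, 2**89 - 1)
mallet_pub, mallet_priv = make_keypair(2**107 - 1, 2**61 - 1)

msg = b'{"method": "run_instance", "sender": "nova-scheduler"}'
sig = sign(scheduler_priv, msg)

print(verify(scheduler_pub, msg, sig))                     # genuine message
print(verify(scheduler_pub, msg + b" ", sig))              # modified in transit
print(verify(scheduler_pub, msg, sign(mallet_priv, msg)))  # signed by Mallet
```

One message goes from A to B with no handshake, and the receiver can tell which key signed it. That is the property that distinguishes this from both the session-based and the shared-key options.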
This is exactly what you do today with the SSL certificates that you get from Verisign. What will change is that when you are running an OpenStack cloud with message signing, you will need to have your own CA and not just rely on one from, say, Verisign or one of the other large certificate authorities. You'll have your own CA in your cloud deployment that will be the point of authority for the system. Everything else looks a lot like your SSL solution today. So how you get the keys out there, how you implement the key signing and everything, would be based on the tools that you use to manage certificates and SSL today. And if you don't have those tools, then you get those as part of your...

Yes. You could theoretically use your own, right. I'll get to that in a bit. So you'd have to have that whole process of distributing those keys and signing those keys as part of your distribution, as part of your deployment mechanism. But again, that's not really any different than SSL today, where we have the same problems.

So we are going to sign messages, and we're going to change the message format that's in OpenStack now. Right now we have version one, down here in the lower left corner, with three fields that are sent over the message bus. We want to add a version two of the messaging, which will add a time to live and a timestamp. We want these anyway; these are things we already know we want to have in the OpenStack messaging bus. So we're going to add those to the version two RPC. They are also important for signing: we need them to prevent replay attacks, so that just because somebody created a message at some point doesn't mean they can keep repeating that message. With the timestamps we can say: this message was sent too long ago. There was a max time to live on that message.
It has passed, and we're no longer going to accept it. Unfortunately, within that there is a window in which messages could be repeated, so it's not perfect. But the only ways to avoid that are the session-based things we already discussed, which would not work out so well for us. This is also pluggable, so if you wanted to have, say, a third-party system like ZooKeeper or something that made sure that messages were never repeated, you could plug that in. But by default we're going to assume that those timestamps are sufficient, and there will be a small window in which replays could happen, but not purely fraudulent messages.

Then we add the signature chain. The signature chain will be an optional parameter. The parameter will always be there, but if you're not using message signing, if you are not enforcing it, then it will be blank; it will be a null field.

It's pluggable because people are going to want to have their own CA mechanisms. I was speaking with Red Hat: Red Hat has a CA solution, and they want to use theirs. I think HP may even have something. At Cloudscaling we're talking about doing our own system as well, as part of our distribution. So what we're going to do is identify the variant of the signature. The message format is designed so that in theory you could not use RSA. You could use something else. You could figure something else out with this.
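The timestamp and time-to-live check, together with the optional pluggable replay guard, might look roughly like the sketch below. The envelope field names (`version`, `message_id`, `timestamp`, `ttl`, `signature`, `payload`) and the `ReplayCache` class are assumptions for illustration; the talk does not spell out the actual version two wire format. The in-memory cache stands in for an external store such as ZooKeeper.

```python
import json
import time
import uuid
from typing import Optional


def make_envelope(payload: dict, ttl: int = 30,
                  signature: Optional[dict] = None,
                  now: Optional[float] = None) -> str:
    """Wrap a payload in a hypothetical version-two envelope."""
    return json.dumps({
        "version": "2.0",
        "message_id": uuid.uuid4().hex,
        "timestamp": time.time() if now is None else now,
        "ttl": ttl,
        "signature": signature,  # stays null when signing is not enforced
        "payload": payload,
    })


class ReplayCache:
    """Hypothetical pluggable guard that remembers unexpired message ids."""

    def __init__(self) -> None:
        self._seen = {}

    def check_and_record(self, message_id: str, expires_at: float,
                         now: float) -> bool:
        # Drop entries whose time to live has passed, then test for a repeat.
        self._seen = {m: e for m, e in self._seen.items() if e > now}
        if message_id in self._seen:
            return False
        self._seen[message_id] = expires_at
        return True


def accept(raw: str, cache: Optional[ReplayCache] = None,
           now: Optional[float] = None) -> bool:
    """Reject expired messages and, if a cache is plugged in, replays."""
    env = json.loads(raw)
    now = time.time() if now is None else now
    expires_at = env["timestamp"] + env["ttl"]
    if now > expires_at:
        return False
    if cache is not None:
        return cache.check_and_record(env["message_id"], expires_at, now)
    return True


raw = make_envelope({"method": "ping"}, ttl=30, now=1000.0)
print(accept(raw, now=1010.0))   # inside the window
print(accept(raw, now=1031.0))   # time to live has passed
```

Without a cache, the same message is accepted again anywhere inside its window, which is the small replay window described above. In a signed deployment the timestamp and time to live would need to be covered by the signature, otherwise an attacker could simply rewrite them.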
I think RSA makes sense, and it's what we're going to implement as part of the OpenStack Grizzly patch, but you could theoretically implement something else here, and you would identify that via the variant identifier. Then you have the variant-specific fields, which are generally going to be a signature and a public key identifier. So you can have something like an ASN.1 identifier that identifies a public key that you look up in a key server, and that way you can verify the certificate of the host. Sending the certificate in every message would be really, really heavy, which is why we want to use an identifier and the external CA system. There have been some discussions that some people, at least for small deployments, for just getting it to work and making it work easily, may want an option to put the full X.509 public certificate into the message as well. That would simplify the security attack surface but would also make the messages significantly bulkier. The great thing about making this pluggable is that you have the flexibility to do it one way or another and to integrate your own. Vendors like, say, Cloudscaling or Red Hat or HP could put in their own things, make them part of the distribution, and make it easy for the users, so that they don't have to worry too much about how this all works.

It's a lot of time, so I'm actually going to skip a couple of slides; there are some extra slides. One thing we could do here too, and this is kind of theoretical, is extend this to the database. The database is also a problem, a place where we can inject things. If you compromise a system such as the API server, you can just put things into the database, and you can read things in the database. Grants are not enough. If you look on Stack Overflow or Server Fault, you'll find people asking, oh, how do I do per-column
grants in MySQL and make it secure, and you'll find that people don't have these answers. People don't know how to properly do column-based access restrictions on the database, and doing row-based restrictions is just impossible. So right now our Mallet can inject messages into SQL. He can read things out of SQL. He can do whatever he wants.

So we can do an RPC DB. It's something that we're experimenting with; we're not promising this for Grizzly. If we did this, then we extend trust to a centralized source that does the SQL, and we minimize the input and output. You can make a query based on information, and you can say: this compute node only has access to these rows of the database, and it only gets the return value of the function. And here, if you have someone faking a message, again it gets blocked, and they can't access the database at all. So not only do we prevent someone we don't want from connecting to the database, we can also limit what comes out of the database even for those we do trust. We don't have unlimited trust; we have limited trust, only through the things that we want them to have access to.

So I have plenty of time for questions. We are hiring, we're in San Francisco, and I think... Thanks.

Question on that last point: were you thinking of having the scope, that is, the scope that the trusted principal has access to, carried in the actual message exchange? In other words, you said that I can authenticate that this is the trusted entity that's trying to access the database. But then you want to limit which data in the database they have access to. How did you associate that
authorization to that scope of data with that user?

Right, so I believe the question is: if you receive a request, say you have a function that's processing in that SQLAlchemy code, how do we actually know in that code that it was this Nova compute instance? How do we get it from the trust mechanism into the SQL layer? Am I translating it right? Okay. One thing to keep in mind too is that all of these requests carry a context with them, which has a signed token that comes from Keystone. So the scope that you're referring to is already inside of the message that we're transporting, and that has a token from Keystone. That token from Keystone is trusted, so we may trust that it's coming from an authorized user at the very end. But that token could be duplicated and attached to a different message, where you could do something else. So combining this with Keystone, I think, gives you what you're requesting, because the Keystone token would then also be signed. There was a slide on that. Right here.

So what could happen is, first of all, if you're using RabbitMQ there is a messaging fabric. I'm going to repeat the question for people online: have we considered securing the messaging fabric as opposed to end to end? There are a few problems with this. One is that you have a central point that decrypts and encrypts all the messages, so there is now an unencrypted form, and theoretically somebody could get into that system, the RabbitMQ server, the messaging system, and do a man-in-the-middle attack there. Additionally, it doesn't give us role-based access controls. I didn't show a lot of role-based access control information here because we're not going to do that in Grizzly.
We're going to likely do that in H, the following release. But this lays a foundation for us to do role-based access controls.

Right, okay, so the question was whether we can use access controls inside RabbitMQ. Yes, you can do things like that, but I've had some conversations with some of the other people on the security side who are using RabbitMQ, and they've been trying to do that, and it only kind of works. It's still not as robust as this. Additionally, Cloudscaling is using ZeroMQ, and we're doing point-to-point messaging, so we don't have a single point of failure. We don't have a single point of intrusion into the system where those messages could be compromised by a man-in-the-middle attack, which by the way could still happen in that case. So you can do access controls and you can trust RabbitMQ, but you're putting a lot of trust in one basket, and that host is still a host. It can get compromised. With the model that we're proposing here, even if RabbitMQ was compromised, even with all those access controls in RabbitMQ, which by the way are good things to have (we can also do that; it won't hurt anything to have it as well), those messages still cannot be duplicated, they still cannot be replaced, and we can still have access controls based on roles from end to end.

No, no, we should not trust RabbitMQ. That is correct.

So the question is: if you compromise a scheduler, can you still gain access to the system? For a short time, yes.
So the idea is that if a scheduler is compromised and messages are getting inserted into the system, we can detect that, revoke that key, and remove it from the pool. We would also have to link this into an audit trail and then see what actually happened. Because we have identity now, we can see which messages were coming from that compromised system during the period of time that we knew it was compromised, and we can try to trace that back.

Yes. Yes. So every system will have a unique identity. Let's say you use the TPM. The TPM will generate your private key in a chip, and you only get back your certificate signing request, from which you generate your certificate. You don't ever get access to your own key. You wouldn't want to just copy a private key around between all your Nova compute machines and all your schedulers. You would have an identity per host.

Right. So again, on the picture here we have the hop-style SSL, but it's not end-to-end.

Okay, right. I mean, you can do SSL to Rabbit and do signing from endpoint to endpoint, yes, and you can also do access controls on the queues. So if you want to do that, you can do all of that, and I'm not saying that's bad. I mean, right.
So I think that those things will probably be unnecessary with this, but they're also available for you if you really want to chain a bunch of things together.

Sure, and I think this will be a vendor differentiation, where vendors will be able to provide working key management systems. I spoke with the Keystone folks, and one of the plugins that we discussed in depth would be a Keystone-based key management system. We can put keys in Keystone, and we could do that fairly easily and at low cost. By doing that we could have something that works out of the box and is actually very secure. It may not be the most scalable, depending on how you feel about scaling Keystone, but it's something we could do out of the box for a pretty decent key management solution.

Well, Keystone doesn't have to authenticate that node. You just query Keystone for a public key, and that public key is signed by a CA that you trust. So you could get that information from Keystone without even having SSL between you and Keystone, and the fact that you're getting that public key in the clear is fine because it's signed. I think I had another question.

Okay, so the keys are per host, obviously, but you can also identify them by service. You can have a different private key per service per host. If that makes sense: you wouldn't have a single private key for all of Nova compute, but if a server is running multiple services it could have multiple keys, one for each service.

No, no. The fact that we're using OpenSSL means that you could optionally put keys in the TPM. I'm not saying you have to use it. I'm saying it's an advantage that we have an architecture where it's available to us, right?
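As a rough illustration of keys that are per host and per service, with revocation removing one identity from the pool, consider the sketch below. The `KeyDirectory` class, its methods, and the key identifiers are hypothetical; they stand in for whatever key server or Keystone-backed plugin a deployment actually uses, and they do not reflect a real Keystone API.

```python
from typing import Dict, Optional, Set, Tuple


class KeyDirectory:
    """Hypothetical in-memory stand-in for a pluggable key server."""

    def __init__(self) -> None:
        # One public key identifier per (host, service) pair.
        self._keys: Dict[Tuple[str, str], str] = {}
        self._revoked: Set[str] = set()

    def register(self, host: str, service: str, key_id: str) -> None:
        """Record the key identifier issued to a service on a host."""
        self._keys[(host, service)] = key_id

    def revoke(self, key_id: str) -> None:
        """Stop trusting one key without touching any other identity."""
        self._revoked.add(key_id)

    def lookup(self, host: str, service: str) -> Optional[str]:
        """Return the key identifier, or None if unknown or revoked."""
        key_id = self._keys.get((host, service))
        if key_id is None or key_id in self._revoked:
            return None
        return key_id


directory = KeyDirectory()
directory.register("host-1", "nova-compute", "key-aaa")
directory.register("host-1", "nova-network", "key-bbb")
directory.register("host-2", "nova-scheduler", "key-ccc")

# The scheduler on host-2 is found to be compromised: revoke only its key.
directory.revoke("key-ccc")

print(directory.lookup("host-2", "nova-scheduler"))  # revoked identity
print(directory.lookup("host-1", "nova-compute"))    # unaffected identity
```

Because every host and service has its own key, revoking one entry leaves the rest of the deployment untouched, which a single shared key cannot offer.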
So right, we're going to have a certificate revocation list. The actual architecture for how it's going to work is still kind of pending. We're talking here more about the message format and so forth that is going to facilitate those things. So that's an easy extension to what we're doing.

Yes. Right, but this is no different than other PKI systems, even SSL, where you have a very similar thing. Your web browser needs to have certificate revocation support in order to understand that some key from some authority was revoked.

Yes, which is why it's pluggable, allowing vendors to provide solutions around it, right? Yeah, exactly. Some vendor can say, we have a better CA system than another. Cloudscaling has our own ideas on how to do this, and I know other vendors do as well. This is where we're going to be able to differentiate and provide value-add around it.

Question in the back. Not so much. One of the problems is that we are doing this over a messaging bus, right? We're not doing this around... I mean, you have a problem with the model that's proposed, but I don't have the arguments to give you now, and we can take them offline.

I think we're just about at lunchtime, so if there are any last questions.

So yeah, I don't feel that we're reinventing something, because we're using a really simple primitive, RSA. We're just taking simple RSA and putting it into messages. It's actually quite simple. OpenStack is going to have to have configuration, opt-in configuration variables and so forth, to understand this stuff.
So we're still going to have to reinvent those plugins to the system. And we're just using very simple parameters for the signing. I believe the fall team [unclear].

All right, so, okay: if a system is compromised, then the private key can potentially be used for a limited amount of time. It may even be stolen. But if you're using a TPM then the key cannot be stolen. It can only be used, and you can restrict how it's used, or whether it can be used, with things like SELinux and so forth that provide access constraints on any user space. Beyond that, we have key revocation, so you can take this piece out.

I do think we're out of time. I think lunch is behind you. So thank you very much.