Okay, hello, my name is Peter Desnoyers. I'm a professor at Northeastern. With me I've got Jason Hennessey, a scientist at BU, and Brent Holden from Red Hat. Jan Holzer from Red Hat is also involved. And we're here, well, I suppose you could say the title of our talk is "bait and switch": technical directions for the Mass Open Cloud. I'm not really sure about the title it was listed under, but it seems to imply that we've done a bunch of stuff, when we're actually going to be describing the plans for it. So I apologize if that's a surprise for any of you.

Now, first, before we get started, I wanted to ask: how many people here are developers? Okay. And how many people here are actually hands-on operating OpenStack implementations? Okay, that's great. That'll help us adjust this a bit, but it also means that some of this may be very relevant to some of you.

So, basically, if we're here, we're all interested in clouds and cloud computing: public clouds, private clouds, hybrid clouds. Now, a lot of people say that computing is moving towards the public cloud. There's one problem with that. You see it today if you're working with a private cloud: there can be so much more innovation if you actually have control over the software, the hardware stack, and so on. With large public clouds, you have a single closed provider. Companies aren't able to introduce innovations through it. You have little visibility into it, either for performance or for auditing. You have vendor lock-in.

Instead, we have this vision that we wanted to talk to you about, and we wanted to talk about the design for it. Two days from now there's another talk that my colleague Orran will be giving about the vision for this, which probably should have come first, but it doesn't. Our vision is that we'll have an open cloud exchange. So, not just a single-provider public cloud, but a public cloud that has multiple partners that can all be involved in implementing and operating different resources, both hardware and software services, with each of them charging end users through a common mechanism, with as much visibility into operational data as is needed, and with customers able to select and move between services. Basically, we want to take cloud computing one step further: not just multi-tenant, but, you could call it, multi-landlord.

So, we've put together a bunch of people who share this vision. We've got the five largest universities in Massachusetts. We've got the Commonwealth of Massachusetts. We have a bunch of industrial partners. We have some input from Oak Ridge National Labs and the University of Tennessee. And finally, a couple of weeks ago, we announced the Massachusetts Open Cloud project. The Commonwealth of Massachusetts has pledged $3 million. We have a commitment of $16 million from our industry partners. And we're putting together a prototype cloud to be able to demonstrate this vision and see if we can build something with it.

In doing this, we're going to be taking advantage of the MGHPCC, the Mass Green High Performance Computing Center. Basically, it's a data center in Holyoke, a depressed city in the middle of Massachusetts. We've got a 15 megawatt data center with two acres of floor space. As commercial data centers go, it may not be huge, but it's an unprecedented facility for an academic endeavor. It's jointly run by these five universities, and we're going to be using it for staging our early version of the Open Cloud.
So with this, I wanted to, I guess, give the floor to Brent to discuss Red Hat's involvement in this. They're one of our very early industry partners and have been extremely helpful.

All right, great. Thank you, Peter. So, as introduced, my name is Brent Holden. I'm the chief architect for Red Hat's East region, and our involvement with the Massachusetts Open Cloud has been very strategic from the beginning. Really, what we wanted to prove is that, for a lot of companies working with these particular vendors, and Red Hat is not the only OpenStack vendor, collaboration is not just a bullet point on an architecture slide. It is something where we really want to partner with our users to understand how they're using our software, and not just that, but also to help drive newer features and functions, help innovate on the platform, and really understand the use cases. And because of the innovative things that we're developing on the Massachusetts Open Cloud, it's bringing together a wide swath of users who are bringing really, really interesting use cases that we can build and develop upon.

So, with that said, we're also trying to build up our OpenStack expertise within Red Hat. We have several solid groups at Red Hat that can help with design and execution services and training and all the other things that come along with vendor support. But really, what we want is a large environment, which is what the Massachusetts Open Cloud seeks to build: multi-vendor and integrated, where we can get people at Red Hat experience understanding those use cases, understanding how people are using the technology, and then understanding how the underpinnings all fit together. And that's really why Red Hat wanted to get involved from the beginning: to understand how people are using it, to get people trained up, and to help drive the technology in a direction that better suits the larger OpenStack community, rather than having a developer-centric purview, which is to develop what they think is best without understanding how people are actually using their stuff.

So, why did we settle on OpenStack? We settled on OpenStack because of the vendor agnosticism that OpenStack provides. OpenStack is what a lot of people refer to as a cloud platform, a cloud virtualization platform. So it does suit a lot of the use cases that the Massachusetts Open Cloud seeks to provide, but OpenStack is not the only piece of technology within the Massachusetts Open Cloud. There are multiple things that integrate that we have to work with in developing, which Jason will get into in just a couple of minutes. But really, the underpinnings of OpenStack, having that multi-vendor ecosystem that we can plug into, and having all those great things that OpenStack tries to solve in terms of authorization, in terms of the security behind it, and the multi-tenancy: those are all things that the Massachusetts Open Cloud really glommed onto from an early stage. And a lot of that is just driven by the APIs within the architecture. A lot of different vendors are picking up OpenStack technology as an integration point, and so the APIs, and having them be our central integration point, make it a very easy choice to settle on OpenStack.

So the purpose of our talk is to discuss the requirements: how are we going to set about this challenge, and what sort of challenges did we run into from an architecture perspective?
We're going to describe our needs and what it's going to take for us to realize this model, because this model is not as simple as I first thought during some of our early meetings. It's not as simple as just laying down OpenStack on a few machines and then walking away. It's not just multi-tenant on a single platform; it's really a multi-landlord platform. And what that means is that you have these different technology partners that all come to the table bringing their particular technologies, and you talk about how those integrate and how consumers can pick and choose what's best for their use case. So we'll talk about that.

We also want to talk about how we can help people with similar goals. A lot of customers I talk to around OpenStack have this vision of trying to bring in other vendors and being able to offer either tiered services or services for particular use cases. This is something that Red Hat is interested in driving in the community, because it's something I hear about literally from every customer; they're all interested in doing this type of thing. And so we also want to broadcast this to the community, because we want you to collaborate with us. We'll have some information at the end on how you can do that, how you can bring your use cases, and how we can better understand those as well. And that's not just Red Hat centric; that is for the MOC, and also helps drive the larger community.

So, OpenStack: just sitting down and logging into an instance of Horizon in its own particular silo, like I said, dropping off an instance, that's pretty easy. That's a pretty simple problem to solve. But then you talk about these different technology partners dropping in their own technologies, and you have these other OpenStack vendors who are going to want to drop in theirs too, and you might end up with multiple silos. But no infrastructure wants to run as multiple silos. OpenStack runs very well as these individual, particular silos, each running its own technologies and having its own defined set of endpoints, but the integration between them is pretty weak. So what we're trying to solve within the Massachusetts Open Cloud is to create these federated technologies, where you have one consumer endpoint that you can log into, pick and choose what different types of technologies you want to run, and integrate those based on what best suits your purpose.

So with that, I think I'll hand it over to Jason to talk about some of the technical details and architectural challenges.

Thank you. So I think what you're going to notice throughout these slides is that there are a couple of fundamental issues where we're changing how OpenStack has been used historically, and most of that revolves around having multiple instances of services. So the first one is: how do we find these services? If we have lots of different providers of different types of services, where do we go to look for them? How do we buy services from them? How do we negotiate that? In this example, you can see that we have a budget provider on the far left, then a more responsible one in the middle that's probably more suited towards your enterprise workloads, and then you might even have specific hardware that you might use. So you could use any combination of these.
You might use one, or you might use all three, and what you're going to see is that there are different ways to compose these. So we're planning to expose these through a service directory. We can even just take the default Keystone API and expose different aspects of these services: SLAs, different hardware features. And with this you can actually build a UI. So this is a demo UI we put together. You can see that there are different ways to sort it, and there are different ways you can mix and match these services. So imagine now buying Nova compute from one provider and then storing your block storage with another. You can also buy virtual machine images and all sorts of stuff.

Okay, so now that you have multiple service providers, one of the questions is: how do we address these from within particular services? One of the fundamental assumptions of these services has historically been that everything is within one provider. Well, now we're changing that assumption; now you can have multiple providers. So in this example we have a single Nova service that's hosting two different virtual machines backed by two different block storage providers. In the current scheme there's a single UUID, and there's no way to indicate that the UUID actually lives in a different place, because there's always been just one place to look. So what we're proposing is to introduce fully qualified names: not just a UUID, but also a place where that UUID lives, that is, its endpoint. So, for instance, you could differentiate between two different Cinders, and you can have different versions of those endpoints operating at the same time.

Another issue, because we've historically had the same service, is: are there internal APIs that we're using that aren't ready to be exposed outside of a particular OpenStack installation? Here, courtesy of Rackspace, we have an architecture diagram that describes how Nova and Cinder operate together. And as you can see, there might be internal APIs, such as the message bus, that aren't documented like the external APIs. Now, if you're going to separate Cinders and Novas and have them interoperate between providers, we need to document these, make them more accessible, and give them better security, and all of that may not be there right now.

In addition to that, we also have some trust issues. For instance, if I have two different Novas, maybe from different providers, such as the one on the left, which is a budget provider, and a more established one, but I want to point them at the same storage, then in the current authorization scheme, once I give access to one of them, I give access to both of them. There's no way to say that only the red VM has access to the red storage and only the blue VM has access to the blue storage. So it's possible for somebody, now that they've been given access through the user's token, to access all of the storage in there. So the question is: how can we develop a mechanism so that you can give access to the red storage without giving access to the blue? We have some ideas on this, and we'd love to hear from you if you have thoughts, because this is something we're working on currently.
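To make the fully qualified names proposal above a bit more concrete, here is a minimal sketch in Python, assuming a hypothetical "uuid@endpoint" serialization and made-up endpoint URLs; it is not existing OpenStack code or the actual MOC design.

```python
# Toy illustration of a "fully qualified" resource reference: a UUID paired
# with the endpoint of the service that owns it, so a Nova in one provider can
# tell which of several Cinders a volume lives in. The syntax, class name, and
# URLs are invented for this sketch.
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class QualifiedRef:
    endpoint: str   # e.g. "https://budget-blocks.example.net:8776/v2"
    uuid: str       # the resource's UUID within that endpoint

    def __str__(self) -> str:
        return f"{self.uuid}@{self.endpoint}"

    @classmethod
    def parse(cls, ref: str, default_endpoint: str) -> "QualifiedRef":
        # Backwards compatible: a bare UUID is assumed to live at the local
        # (default) endpoint, just as in a single-provider cloud today.
        if "@" not in ref:
            return cls(endpoint=default_endpoint, uuid=ref)
        uuid, endpoint = ref.split("@", 1)
        if not urlparse(endpoint).scheme:
            raise ValueError(f"reference has no usable endpoint: {ref}")
        return cls(endpoint=endpoint, uuid=uuid)


local = "https://cloud.example.edu:8776/v2"
# Two volumes attached to the same Nova, living in two different Cinders:
red = QualifiedRef.parse("8d56c2aa-0001@https://budget-blocks.example.net:8776/v2", local)
blue = QualifiedRef.parse("8d56c2aa-0002", local)  # bare UUID resolves to the local Cinder
```

A bare UUID falls back to the local endpoint, which is one way a change like this could stay backwards compatible with today's single-provider deployments.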
There are also underlying protocols to think about. For instance, there's an assumption in a current OpenStack deployment that you have a storage network. If you have an isolated storage network, that's great, but now what happens when you start going between providers, because you can have a separate storage provider from your Nova provider? What do you do? Most of the protocols that we're aware of are unencrypted. So are there ways to secure this storage traffic between providers? One mechanism could be to somehow encrypt it, or potentially to use VLANs if you're in the same data center. That's one of the things we're looking into right now.

Another issue, now that we have multiple providers, is actually getting L2 networking to work between those providers. We want this to work seamlessly, just like it does in OpenStack today. So how do we do this? Well, there are currently third-party extensions, but one of the things with the open cloud is that we don't want to dictate how people run their services. We want to let them set things up any way they want, really with a lot of freedom, and so we don't want to tell them they have to use this one particular third-party driver. So how can we enable that kind of interoperability? That's one of the good questions that we have today.

Another business model that we're trying to introduce with the open cloud is: how can we share hardware? If you want to set up your own OpenStack provider, the idea would be that you can actually lease some hardware and set up anything you want on it. So if I have, for instance, a reliable guy in yellow, who's reliable because he wears a tie, and a grad student who wants to use some hardware, how can I make it so they can't interact with each other? There are some issues right now if they're all connected to the same network. For instance, if you have PXE booting and you're on the same management network, then anybody can PXE boot anybody else's machines. Same thing with IPMI. How can we come up with ways of isolating these machines so that you can have a secure environment?

So what we're proposing is to introduce network isolation. The idea would be that you can use VLANs or SDN to actually segment these machines off from each other, and that way you can actually share them. Now that you have isolation, you can do things like dynamically growing and shrinking your OpenStack clusters, or maybe you want to use bare metal for HPC jobs. The idea here is that we want to enable basically anything that people want to do. And in order to implement this, we've started a project called Hardware as a Service. The idea is that this is just an allocation and isolation layer. We don't dictate how you do imaging: you can use Ironic if you want to, or you can use whatever you might already have for your imaging solution. The idea is that you can extend the marketplace not just to services at the virtual layer or the storage layer, but also to services operating at the machine layer. This allows you to mix and match experimental workloads with production workloads side by side, without having to worry about them interfering with each other.

So if there's one thing you take away from this presentation, as far as the design of OpenStack goes, it's that we need to figure out ways of making it not only multi-tenant but also multi-landlord, as in you have multiple providers. And that assumption needs to be baked into future decisions that are made.
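As a rough sketch of what an allocation-and-isolation layer in the spirit of Hardware as a Service might look like, here is a toy Python example; the class, node names, and VLAN numbers are invented, and the real project's API may well differ.

```python
# Toy allocation/isolation layer: lease bare-metal nodes out of a shared pool
# and drop each lease onto its own VLAN, without dictating how the tenant
# images the machines. All names and numbers are invented for illustration.

class Pool:
    def __init__(self, nodes, vlans):
        self.free_nodes = set(nodes)       # e.g. {"node01", "node02", ...}
        self.free_vlans = list(vlans)      # e.g. [101, 102, 103]
        self.allocations = {}              # owner -> (nodes, vlan)

    def allocate(self, owner, count):
        """Lease `count` nodes to `owner`, isolated on a dedicated VLAN."""
        if count > len(self.free_nodes) or not self.free_vlans:
            raise RuntimeError("not enough free nodes or VLANs")
        nodes = {self.free_nodes.pop() for _ in range(count)}
        vlan = self.free_vlans.pop()
        # In a real system this is where the switch would be reconfigured so
        # these nodes' PXE/IPMI traffic can only reach the owner's network.
        self.allocations[owner] = (nodes, vlan)
        return nodes, vlan

    def release(self, owner):
        """Return the nodes and VLAN to the pool, e.g. when an experiment ends."""
        nodes, vlan = self.allocations.pop(owner)
        self.free_nodes |= nodes
        self.free_vlans.append(vlan)


pool = Pool(nodes=[f"node{i:02d}" for i in range(1, 9)], vlans=[101, 102, 103])
hpc_nodes, hpc_vlan = pool.allocate("hpc-group", 4)       # production lease
exp_nodes, exp_vlan = pool.allocate("grad-student", 2)    # experimental lease
# The two leases sit on different VLANs, so a PXE or IPMI mistake in one
# cannot touch the machines in the other.
```

The imaging step is deliberately absent: once the nodes are isolated, the tenant can bring Ironic, Fuel, or an existing HPC provisioning system into its own slice.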
Now, if you would like to get in contact with us, we have a website, we have an IRC channel, we have a GitHub account, and with that, we'd love to hear from you with any questions or comments.

Since you're doing high performance computing: I run OpenStack in an HPC environment, and the problem I have is that, since we're using gigabit, our MPI jobs don't scale. So do you have any plans to run it over InfiniBand networking or anything like that?

Currently, that in fact is one of the reasons why we're interested in the Hardware as a Service system we're trying to deploy, because we do have existing HPC infrastructures with InfiniBand, for instance, but they aren't run through OpenStack, and at this point these researchers aren't interested in spending a lot of time rewriting everything to go with OpenStack, or in taking the hit of going to gigabit or even 10 gig Ethernet. So what we'd be doing for much of this in the near term is partitioning off the supercomputing cluster and using other machines for OpenStack. In particular, one model we're very interested in is flexibly moving that boundary back and forth, so that as we have higher OpenStack workloads, perhaps during the day when people are spinning up VMs or something, we can steal cycles from the HPC applications. And then we have physicists who basically could use a thousand machines for a year and that would make them happy. Well, no, it would almost make them happy. So we can always use them to suck up any cycles and boost the efficiency numbers really high when we're not actually using the cloud resources. So I hope that answers some of your question. We're sidestepping the InfiniBand OpenStack issues at first, but for many of our users it will be important as we go forward.

Really good presentation, guys. I came in a little late, so I may have missed this: are you looking at supporting, in your cloud, not just KVM but some of the other hypervisors? And what are some of the considerations you have on that, in terms of KVM versus vSphere and so on?

Most definitely. I mean, we have strong historic links to VMware; in fact, that's where many of us met before being in academia. And you can't be in academic research without stumbling across someone who has an interest in a different hypervisor. At this point, for the most straightforward way to set up computing for people who just want to run Hadoop to get their experiments done, or who just want to run burst-to-cloud instances, again to get their experiments done, KVM seems the most straightforward approach. But we're certainly very interested in experimenting with different ways of setting up this cloud, and that's one of the goals here: to have the ability to do that, to set up one fraction this way and another fraction another way. It may turn out that one hypervisor is better for one workload and a different hypervisor for another, and so that's one of the options you would be choosing.

There's also something to be said that, even if you are using the same hypervisor, the same hypervisor set up differently can provide different scales or different capabilities. So if you think about an HPC workload, particularly as we're set up now, you'd want to look at something like a GPGPU setup, where that's hardware you'd expose up to your application for parallel computing. And so, deploying on top of something like KVM, you can deploy a very simple, stripped-down KVM, or you could have one that specifically exposes GPGPU capabilities, based on what the underlying hardware is.
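The "flexibly moving that boundary" idea mentioned above could look roughly like the following sketch; the thresholds, step size, and node names are invented for illustration and are not part of any MOC code.

```python
# Toy sketch of shifting the bare-metal boundary between an HPC partition and
# an OpenStack partition based on cloud demand. A simple periodic rebalance
# with made-up thresholds; real policy would be far more careful about which
# nodes are idle before moving them.

def rebalance(hpc_nodes, cloud_nodes, cloud_utilization,
              grow_above=0.80, shrink_below=0.30, step=2, cloud_min=4):
    """Return new (hpc_nodes, cloud_nodes) lists after shifting `step` nodes."""
    if cloud_utilization > grow_above and len(hpc_nodes) >= step:
        # Cloud is busy (e.g. daytime VM spikes): borrow nodes from HPC.
        moved, hpc_nodes = hpc_nodes[:step], hpc_nodes[step:]
        cloud_nodes = cloud_nodes + moved
    elif cloud_utilization < shrink_below and len(cloud_nodes) - step >= cloud_min:
        # Cloud is idle: hand nodes back so HPC jobs can soak up the cycles.
        moved, cloud_nodes = cloud_nodes[:step], cloud_nodes[step:]
        hpc_nodes = hpc_nodes + moved
    return hpc_nodes, cloud_nodes


hpc = [f"hpc{i:02d}" for i in range(1, 13)]
cloud = [f"cld{i:02d}" for i in range(1, 7)]
hpc, cloud = rebalance(hpc, cloud, cloud_utilization=0.92)  # daytime: cloud grows
hpc, cloud = rebalance(hpc, cloud, cloud_utilization=0.15)  # overnight: HPC grows
```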
You had a slide where you had a problem with securing access to storage, I believe. Have you considered using ACLs on the storage in that previous slide?

Is it this one? Yes, let's go. So here, perhaps we should, this usually takes a few times to explain, so I apologize, we'll give it a second try. The problem here is basically that today the Cinder at the bottom, Alice's Archives, checks and says, well, Joe is allowed to use the pink volume and Joe is allowed to use the red volume. And it doesn't know any more than that; that is the current way in which those ACLs are structured. Clearly, extensions to what's there today could definitely solve this problem; we're not talking about a major new discovery being needed. But the problem today is, again, that if Bob goes rogue, he can go and see all the other resources owned by Joe. Because today, and there's actually a new product out there, some new research, that may change this, but the virtualization solutions today allow your compute provider to subvert your VM. You can't get around that. And so, you know, you may trust Bob, and you're using Bob for your cheap stuff, but you don't really trust him. Or he's your grad student doing your experimental setup. For that VM, you basically don't trust Bob, and the ACL support today is not good enough for you, it sounds like.

Another thing I'm thinking about is isolation using volume types. I think that could probably help you out a little bit.

Probably. I mean, there's a bunch of possibilities here. One issue, again: the first response many people have is projects, but you end up sort of multiplying projects and having to keep multiple projects in sync with each other. So I don't think we're able to say at this point which will or won't work, just that clearly there is an issue today.

So I'm a little confused about something. You're trying to create this incredibly open world of not only open landlords but open tenants, but yet the whole talk is about how you can close things down and secure it.

The only way that you could have... in the same way that the only way you could have multi-tenants within a cloud environment is with far higher security between them than you have in a more trusting, older enterprise computing environment, say with multiple users on VMS or something. In other words, there are a bunch of things that work very easily when you have one boss, one organization, the ability to fire someone if they do something wrong, one cost center that any mistakes get billed to. The moment you start having multiple owners, they may have a fair amount of trust in each other, but when you come down to it, no one is going to pay for someone else's mistakes. And when you get to some of us doing research, well, at three in the morning we don't trust our grad students. So I think this is something where, to open something up, you need strong interfaces and you need strong security, because the way to open something like this up is to allow people who don't necessarily trust each other to still work together.

One point I would want to make is that the security isolation that's built within OpenStack is well defined within an OpenStack installation, but there's really not an easy way to mix and match different plugins across all these different projects.
There's no isolation structure contained within them as it stands today, and so that's something that we're trying to build out, because, to Peter's point, giving everybody access to everything and allowing people to step on each other's resources would be a major problem, and I know that security is a prime concern for at least a lot of Red Hat's customers.

All right. And if I could just throw in one thought there: in the surveys I've seen, the number one reason that people don't use clouds is security. They don't trust the cloud to hold their data, and we're really hoping that this open cloud concept takes off; we want people to copy it. And if that's going to happen, you have to ensure that security is baked into the layers, because that's the only way that people are going to trust it.

I was curious about hardware as a service. Could you elaborate on that a little bit and how it relates to things like TripleO and Ironic and so on?

Okay, it is very similar to Ironic; we have similar goals. The key difference is that we have, for better or worse, a legacy base. We have a bunch of people who already have deployment mechanisms, and they don't want to go and rewrite their working deployment mechanisms to work with Ironic. And so we believe that by doing, in a sense, a lighter weight version of Ironic, it can cover those applications. You have something where, for instance, you could give someone a pool of hardware, and they could install a Fuel master on one node and then use the Fuel deployment across the other nodes, just the way it normally works on bare machines. If someone has a high performance computing cluster that is already working with its own network boot mechanism, then they can deploy that within the machines they've been allocated. So that's really it, and for people like me, researchers in operating systems, it's also key that we be able to do things like that. In fact, I would like to be able to experiment with different ways of loading up those machines, and so I want to be able to get some machines from this pool and do those low-level experiments. So that's where it differs from Ironic. At the same time, there's a lot of what Ironic is doing where there's crossover, and we're hoping to use some of what they've done and also contribute to what they're doing.

If I could add a thought: Ironic really solves a much bigger problem than hardware as a service. When I talked about being multi-landlord, all these technology partners are going to want to bring in their own specific technologies, and what Ironic is trying to do is subsume not only configuration management but also the provisioning system. A lot of these technology partners are going to retain their own provisioning systems, be it because they want to use them or because they're legacy for them. And so we needed a way to provide a lowest common denominator across all these provisioning systems, and hardware as a service does that.

Okay, well. One more. One more.

Hi, Trevor from RMS. I think this is a really cool idea. I'm curious what your thoughts are around the usability of it. You can go to one extreme where everything's done one way, or you can go to the other, which is really customizable, which is what this sounds like. So I'm wondering, from the end users' perspective, how do you perceive that they'll use this when it comes to selecting all the things that they want?
How's the billing going to look if you can tweak your whole stack, with 20 different metrics and 20 different potential companies billing you 20 different ways, and that sort of thing?

Well, the short answer would be: we're working on that. The longer answer: our goal is for the operator of this, the Mass Open Cloud, to become a separate nonprofit. Currently it's actually being run as a project through Boston University. That operator would in fact be responsible for providing the billing to users, and would be supported by fees from the providers. Clearly there is a tension between usability and how expressive your APIs are underneath, but, you know, we write programs. Actually, Orran has something to say here, I believe.

A part that's not in this talk is that any complex marketplace looks complicated. That was one of our concerns about building a multi-provider model. With Amazon, you get a VM, you know exactly what you got, right? But this becomes complicated: there are all kinds of different offerings with different SLAs, and obviously end users can't consume all of that directly. The idea is that, just like in any other complex marketplace, there are going to be intermediaries, big data platforms for instance, that have the time and investment to pick the right infrastructure among all the different options to meet their needs. In fact, on previous projects we worked on, like vCloud Director, what we ended up building was incredibly complicated constraint engines that handle affinity and anti-affinity and locality to data and regulatory constraints, and the list kind of went on and on, trying to solve the problem at the infrastructure level without any visibility into the applications running on it. And the model here is: let's expose the pools of capacity with their different characteristics, and then let intermediaries go and actually optimize for themselves across it.

Basically, we're programmers, we can write code to make things simpler. And if our code isn't good enough, here someone else can write a better piece of code and you can use theirs instead of ours.

So, that's it. Thank you very much. Thank you.