Hey, folks, welcome once again to the conference auditorium at DevConf.US. Today we're going to have a talk about FLOCX, the first layer of the Open Cloud Exchange. Our presenters are Ali Raza, Tzu-Mainn Chen, Jacob Datesman, and Leo McGann. Please welcome our speakers.

Hi, everyone. I'm Ali, and along with Mainn, Jacob, and Leo, I will be presenting FLOCX, which is a hardware trading system in OpenStack. Before going into the details of FLOCX, let's take a step back to the previous DevConf, where we talked about the grand vision of the Open Cloud Exchange. So what is an open cloud exchange? It's a place where multiple stakeholders come together to build a cloud, and each stakeholder can contribute in a different way. For example, a hardware vendor can donate hardware, and researchers can build new algorithms or products. A developer can build an application, deploy it in the production environment, get insights about their product or approach, and also get real user feedback from these clouds.

The MOC is an active open cloud exchange, where different hardware vendors, like Intel, Red Hat, and Cisco, have contributed in terms of hardware. Then we have researchers from Harvard, MIT, and BU who are building research projects and products in the MOC. We have OpenStack, OpenShift, a Ceph storage system, and also ESI, which I will explain in detail later, all running in production there. We have a lot of hardware: 2,500 Intel cores, 1.5 petabytes of storage, and 45 terabytes of RAM.

OK, so to have a functional open cloud exchange, you need a few components; here we list all of them. First is elastic secure infrastructure. What we mean by elastic secure infrastructure is that if a user needs resources and another user has excess or free resources, the second user can give the first access to those resources. You also need a production OpenStack, Ceph storage, and Kubernetes services available, plus a single sign-on to access all of those services. Since there is resource sharing, you want to incentivize people to give their resources to others for temporary use, so you need pricing and billing, or some kind of credit system. You also need resource federation between different OpenStack deployments. And in the end, you need a system for onboarding and managing users.

Out of these goals, we were able to hit some of them during the summer of 2019. We implemented elastic secure infrastructure, which allows users to give access to their resources to other users who need them. We also built a marketplace where consumers who need resources come and record their requirements as a bid: "I want this many nodes with these hardware specs for this much time." We then match those bids against the offers that come in through ESI and give the consumers access to the matched nodes. We built both of these as OpenStack services, and our OpenStack deployment is live.

Before going into the details of the demo and FLOCX, we want to familiarize you with some of the terms we will be using. A user is any person with appropriate Keystone authentication, whether a FLOCX user or an OpenStack user.
Hardware owners are the people who own nodes in OpenStack. An offer is a record in the FLOCX marketplace, coming in through ESI, that says this hardware is available for this much time at this price. A bid is the requirement coming from a user, a consumer, who wants to use hardware. Contracts are the binding between bidders and offerers. And projects in FLOCX are just Keystone projects, the same multi-tenancy used elsewhere in OpenStack and, in our case, applied to Ironic.

So during the summer we wanted to build FLOCX so that hardware owners can offer access to their nodes and a consumer can come and specify their needs. FLOCX then matches these bids and offers and creates contracts. Once a contract is in place, the person who won the bid can access the nodes using Ironic. We also wanted a web API to make everything usable, and we were able to hit all of these goals during the summer. We have a live marketplace, which we call FLOCX, in the MOC. We use a Horizon web graphical user interface that exposes the FLOCX API to the user. We deployed all of this on standard OpenStack, using the Keystone authentication service and a Horizon UI plugin, and we made access possible using Ironic node properties and Nova filters.

While implementing, we also made some assumptions. We assume there is a single OpenStack instance, that all the hardware is the same, homogeneous pools of bare metal servers, and that all the nodes in the cloud are standard, with standard storage and network facilities.

Here is a high-level view of how FLOCX works. There is a hardware owner; this can be any project owner or anyone who owns some nodes. Let's say they own Server 1, Server 2, and Server 3 in an OpenStack instance, and they want to offer any of these servers to other users. They create an offer, pull the Ironic configuration, attach it to the offer, and send it to the marketplace, where it goes into the offer record. Then a hardware consumer comes in; let's say it's a researcher who wants to run some algorithm on a number of nodes with particular hardware configurations. They specify, "I want this many nodes with this CPU architecture, or this much memory or storage," and that goes in as a bid. Periodically, a service called the manager service runs. It looks at the offers, that is, how many servers are up for grabs, looks at the bids, that is, what people want, and matches them. If there is a match, it creates a contract and puts it in the contract record. Once the contract is in place, the consumer can access those nodes using Ironic.

Here's a more elaborate example of the same thing. Let's say we have a hardware owner who owns a server whose node ID is 456. They come to the FLOCX system through OpenStack Keystone authentication. Let's say they first try to offer some other node that doesn't belong to them. Because this hardware owner owns node 456 but is trying to offer node 123, the service will say: you can't offer this, because you don't own this node.
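To make that vocabulary concrete, here is a rough, illustrative sketch in Python of the kinds of records involved. The field names are assumptions for illustration only, not the actual FLOCX schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List

# Illustrative record shapes only; the real FLOCX database schema may differ.

@dataclass
class Offer:
    node_uuid: str                 # Ironic node being offered
    owner_project_id: str          # Keystone project of the hardware owner
    start: datetime
    end: datetime
    cost: float                    # price for the offered period
    attributes: Dict[str, object] = field(default_factory=dict)  # pulled from Ironic (cpus, memory, disk, ...)

@dataclass
class Bid:
    consumer_project_id: str       # Keystone project of the consumer
    start: datetime
    end: datetime
    server_query: List[str] = field(default_factory=list)  # constraints on node attributes

@dataclass
class Contract:
    offer: Offer
    bid: Bid
    start: datetime
    end: datetime                  # fulfilled by granting the consumer access via Ironic
```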
But if this user tries to offer the node they do own, our provider service pulls the Ironic configuration and creates an offer record. Then let's say the consumer comes in. The consumer also logs in using Keystone authentication and creates a bid: "I want a resource with these configurations." Let's say the configurations the consumer wants match the offer for server 456. In that case a contract is created, and the consumer can access the node through Ironic.

While all of these transactions are happening, offers coming into the FLOCX marketplace, bids being placed, contracts being created, access being granted, FLOCX records all of it. We are working on a user interface that will show this history, with reports that summarize what happened. So that's how FLOCX works and how we implemented it. Now Mainn will talk about how we implemented FLOCX as a combination of two services, the marketplace and ESI.

Thank you, Ali. When we implemented this, we built two FLOCX OpenStack services: a provider service, which offers resources to the marketplace and is also in charge of giving bidders access to contracted resources, and a marketplace service, which receives bids and offers, matches them, and, if there's a match, creates a contract. One natural question might be: why two different services? The answer is that although for the summer we're assuming both of these reside within the same OpenStack, in the longer term our goal is to have individual provider services, each running on a separate instance of OpenStack, all feeding up into a single marketplace service which manages contracts for every provider in the system. That's our long-term goal, and that's what we designed around. One other point I want to highlight is that we are developing these services wholly within the OpenStack ecosystem. We're using Keystone for authentication, this integrates with Ironic, the bare metal service, and we're also using the OpenStack oslo libraries, the generic libraries that OpenStack asks people to use when building OpenStack services.

I'm going to talk about the services in a second, but first I have to talk about one requirement that we have, which is a multi-tenant Ironic. Ironic in OpenStack is not tenant-aware. It doesn't have a concept of "this node belongs to this user and this other node belongs to this other user." As far as Ironic is concerned, everything is accessible to whoever administrates the OpenStack, and that just doesn't work for us. So what we had to do was simulate a multi-tenant Ironic, and fortunately we found a way to do this that did not involve hacking a bunch of internal Ironic code or anything like that. Our solution is something you can do right now with OpenStack without changing any of your internal OpenStack services. We took advantage of the fact that Ironic nodes have a properties attribute that is just a dictionary where you can put anything you want. For us, we put in a project_owner_id, which identifies the owner of that hardware, and a project_id, which is set to whichever consumer currently has access to that node.
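As a rough sketch of what that looks like in practice, here is one way to set those two properties using the openstacksdk bare metal API. The property names follow the talk; the cloud name and project IDs are placeholders, and this is illustrative rather than the team's exact code.

```python
import openstack

# Connect using a cloud entry from clouds.yaml; "moc" is a placeholder name.
conn = openstack.connect(cloud="moc")

node = conn.baremetal.get_node("456")  # the owner's node from the example

# Merge new keys into the free-form properties dict rather than replacing it.
props = dict(node.properties or {})
props["project_owner_id"] = "OWNER_PROJECT_ID"   # who owns the hardware
props["project_id"] = "CONSUMER_PROJECT_ID"      # who currently has access (set when a contract is fulfilled)

conn.baremetal.update_node(node, properties=props)
```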
So on the consumer side, we have a custom Nova filter that controls provisioning access to a node based on project. If you're in project A and you try to provision an instance, Nova will only look at bare metal nodes where project_id is set to A. On the flip side, for hardware owners, we limit Ironic API access through oslo policy files. In short, that means hardware owners cannot use the Ironic API at all. If they do need to administer their nodes, we've created custom Mistral workflows which mirror the Ironic API. They do the exact same thing as the Ironic API, except that they use project_owner_id to control access: they check that the owner's project matches the project set in project_owner_id.

Built on top of that is our provider service. It allows owners to create offers and publishes these offers to the marketplace. There's a web API for operations on offers and contracts, and there's a manager service which runs independently and is in charge of expiring offers and contracts once their end date is reached. It also grabs contracts from the marketplace, and once a contract's start date is reached, it fulfills the contract by setting certain Ironic node properties, namely project_id.

On top of that is the marketplace service, which is where consumers create bids for resources. The marketplace manages the database of offers, bids, and contracts, has a web API for operations on all of that, and it too has a manager service. That service is in charge of expiring offers, bids, and contracts, and there's also a job that matches offers with bids. If an offer and a bid match, it creates a contract.

I'm going to talk in a little more detail about the marketplace matcher. The way the matcher works is that each offer for a node includes information about the node's attributes, like the disk, the memory, and so on. Bids contain a server query, and that server query is an array of JMESPath expressions. You can see two examples there: you can say CPUs has to equal 32, or local disk has to be greater than 512. This is a very flexible system; you can do string matching and a whole bunch of other things, and the matcher will match an offer against a bid and, if there's a match, create a contract. The advantage of this system is that it handles the current node attributes, and if later on the marketplace or a provider needs to add more node attributes, the system handles that too. On the flip side, if you want to limit the choices for a user in how they bid, say you only want a user to be able to specify the exact number of CPUs, you can just do that in the GUI: have the bid form ask for the number of CPUs, and the GUI will create the JMESPath expression for you. So it's a very flexible way of doing things.
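To make the matcher concrete, here is a minimal sketch of how a JMESPath-based match could work in Python. The query format, attribute names, and matching helper are assumptions for illustration, not the actual FLOCX code.

```python
import jmespath  # pip install jmespath

# An offer carries the node's attributes; a bid carries a list of conditions.
offer = {
    "node_uuid": "456",
    "attributes": {"cpus": 32, "memory_mb": 131072, "local_gb": 1024},
}

# Each bid condition is a JMESPath comparison evaluated against the attributes.
bid_query = ["cpus == `32`", "local_gb > `512`"]

def matches(offer, query):
    # Wrap the attributes in a list so each condition can be applied as a
    # JMESPath filter expression; a non-empty result means the condition holds.
    attrs = [offer["attributes"]]
    return all(jmespath.search("[?{}]".format(cond), attrs) for cond in query)

if matches(offer, bid_query):
    print("match -> create a contract")
```

Because the conditions are just expressions over whatever attributes the offer carries, adding new node attributes later does not require changing the matcher itself.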
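Similarly, the custom Nova scheduler filter mentioned at the start of this section might look roughly like the sketch below. BaseHostFilter and host_passes() are the standard Nova filter interface, but the class name and the idea that the Ironic node's project_id property is surfaced via host_state.stats are assumptions for illustration, not the team's exact implementation.

```python
from nova.scheduler import filters


class FlocxProjectFilter(filters.BaseHostFilter):
    """Illustrative filter: only pass hosts assigned to the requesting project.

    Assumes the bare metal node's project_id property has been mirrored into
    host_state.stats; a real deployment may expose it differently.
    """

    def host_passes(self, host_state, spec_obj):
        allowed_project = host_state.stats.get("project_id")
        # spec_obj is the RequestSpec; its project_id is the requesting tenant.
        return allowed_project == spec_obj.project_id
```

A filter like this would be added to Nova's enabled scheduler filters, so requests from other projects simply never see the node, which is the "no valid host" behavior shown later in the demo.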
And now Jacob is going to talk about the GUI that he worked on.

Okay, I worked on the FLOCX user interface, which allows hardware owners and users to create offers on their nodes, create bids, and get access to those Ironic nodes. It was implemented using Horizon, which is another part of OpenStack, which means you can go to the standard Horizon web-based UI and FLOCX will just be there in one of the tabs. You can see the status of all your Ironic nodes. If you're a hardware owner you can create offers, if you're a consumer you can create bids, and when those offers and bids are matched into a contract, there's also a tab that lists the contracts. Two important things to mention: first, it's still under development, so there are going to be a lot more features which we haven't implemented yet, but for the purposes of this demo it has all the functionality required. Second, since there is a provider service and a marketplace service, there will eventually be two plugins, but for the demo they were merged into one.

We also added a reporting tab. As a hardware owner or an admin, you can view how much time you have offered your various nodes for and see how long they were contracted. If that ratio is closer to one, the node was in use for more of the offered time. There's a similar view for consumers, and overall for the marketplace you can see the marketplace ratio, which again tells you that the closer it is to one, the more of the offered time is actually being used.

Looking ahead for FLOCX, we intend to have FLOCX deployed over multiple OpenStack instances. The vision is to have the marketplace exist on one OpenStack instance, and then a hardware owner, on their own OpenStack instance with their own pre-configured Ironic setup, would be able to start up a provider service and connect to the marketplace over an internet connection. The marketplace would then be able to handle contracts spanning multiple provider services. We also intend to expand the matching system and the offer and bid system. We want to allow hardware owners to offer hardware over a periodic interval, say three hours every day from this time to this time; the offer would persist every day, and people could make bids on these periodic offers. We also want the marketplace to support requests for network and storage resources: say a consumer comes on and says their project is going to need this much bandwidth, we want the marketplace to be able to accommodate that. We want to build out the reporting system; the report shown in this demo is only a mock-up, and we want to actually create it. And we need to account for the different errors that might stop a contract from being fulfilled, such as a power outage on the hardware owner's side that prevents a consumer from connecting to the hardware.

The long-term vision is to have this fully deployed on the MOC and the MGHPCC. FLOCX was designed around universities and their scaling needs for hardware: at different times a research group may or may not need a lot of hardware, and we want people to be able to make use of this hardware instead of having it sit around collecting dust. We also want the marketplace to support real financial transactions, where people actually receive some kind of monetary compensation for putting up hardware, and people who make bids pay in order to use it. We want to enable organizations to deploy an agent that monitors the marketplace and acts accordingly, watching how the marketplace is changing and how particular hardware is doing. And we want to create a system for social welfare projects, so that, say, a nonprofit organization would be able to consume and make bids on the marketplace at a reduced price. So be a part of FLOCX.
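As a quick aside on the reporting tab mentioned above, the ratio it displays is just time under contract divided by time offered. A toy illustration (not the actual FLOCX reporting code):

```python
# Illustrative only: utilization = contracted time / offered time.
def utilization(offered_hours: float, contracted_hours: float) -> float:
    return contracted_hours / offered_hours if offered_hours else 0.0

# A node offered for 100 hours and contracted for 80 of them:
print(utilization(100, 80))  # 0.8 -> the closer to 1, the more of the offered time was used
```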
This is a link to the GitHub. It's an open source project; feel free to make a pull request or open an issue and get involved. I'm going to hand it over to Mainn for the demo.

So this is the fun part, where we see if everything actually works or if it will all crash and burn. That's right, a live demo. We're starting on a Horizon login, and I'm going to log in as a consumer, who is, appropriately, named "consumer". I'm going to follow the demo flow we talked about in the presentation, but there will be one or two extra steps, which I'll call out, just to illustrate some additional points. This consumer logs in, and he's an optimist: he knows that this OpenStack instance has bare metal nodes available, and what he's going to try to do is provision a Nova instance on top of one of those bare metal nodes. This is also not quite production hardware, if you can't tell, but it's all live. We're going to create an instance, and the consumer is psychic, so he's going to name the instance "fail". One second, sorry; we won't create a volume. I'm going to set the image and the flavor so it provisions on top of bare metal; you have to select the bare metal flavor, so we select that, launch it, and see what happens. The instance creation is scheduled, and we wait a few seconds. It is scheduling, and it should fail nearly right away. The error is "No valid host was found. There are not enough hosts available." That's the generic Nova error when it tries to schedule an instance and can't find a place to put it. The reason is that we're using FLOCX, and although there are bare metal nodes, none of them have been made available to this user.

So what does this user need to do? They need to wait for an owner to offer up a node. I'm going to log out and log back in as the owner. The owner and the consumer here are both using the same UI; our goal is to eventually separate the two UIs, because the owner mainly interacts with the provider service and the consumer interacts with the marketplace service, so you'll see a mix of functionality that in reality would be separated out. Here is the owner, who goes to the FLOCX tab. You see a whole bunch of nodes. Now, this owner actually only owns one of these nodes, the last one, so normally they probably would not see all of these, but we've put them up here for illustration purposes. What happens if this owner tries to offer, let's say, this top node that they don't own, for, say, 12 to 4, and hits create offer? A whole bunch of nothing happens. What really happened behind the scenes is that the provider threw an error saying that you tried to offer up a node you don't own. Just to prove that happened, I'm going to run a database query, and you can see that no new offer was created. Now this owner is going to offer a node that they do own; I'll specify a fairly short period and create the offer. You can see right away that the offer was created, because the UI updated: the node is now marked as available, meaning there's an offer out for it.

Now that the offer exists, let's go back to the consumer. The consumer goes to the FLOCX tab and then to the bid section, and here you can see the wreckage, the past hopes and dreams, all these expired bids.
But now they're going to create a new bid. Let's make sure it's for a reasonable time, say three to four. Here's where the consumer can specify what sort of node they're looking for: for example, exactly 32 CPUs, and let's say they want their local disk to be over 9,000. Wow. We're going to create this bid and see what happens. The bid has been created and it's available. Behind the scenes, I'm going to show you the log file for the FLOCX manager, and you can see there's a periodic job that matches the bids and offers. I've set it to run every 15 seconds, so we're going to wait a little bit for it to run and see if it matched the bid. It just ran, so let's go back to the dashboard and see what happened. I'm going to reload. The bid is still available; that means it hasn't matched and is still open to be matched with another offer. That's because this user specified over 9,000 disk, which doesn't exist. They'll be sad.

So now let's try again. This time, let's specify something a little more reasonable and create the bid. The bid has been created, right up here. Now we're going to go back and look at the log again and wait for the matcher to run once more. It just ran, so let's see what happened. Now we can see that this bid is marked as busy, meaning it's been matched, and because it's been matched, there's now a contract for it, a contract which matches the start and end times of the bid. What happens now is that the provider manager grabs this contract and makes a copy of it for itself, and that's what you see here; it has already run and created the contract. Then there's another manager job that fulfills the contract once the start date has passed, and because I set the start date in the past, it's already been fulfilled. This contract has been fulfilled, meaning that the relevant Ironic node now has the consumer's project marked in its project_id field, and we should now be able to provision using that node. So this consumer goes back and attempts to launch a successful instance. The instance is launched, and if all goes well, you'll see that Nova has scheduled it and is going to start doing a whole bunch of Nova things, networking and so on. Yep: scheduling, building, and now networking, meaning it has been able to provision on top of that node. So that concludes our demo. If you have any questions, we would be happy to answer them.

I'm passing the mic for the first question, but as we continue, if anyone else has questions, raise your hand and try to move to the end of the aisle if possible, and I'll pass the mic to you.

Very cool. Any thoughts about doing the same type of thing in an OpenShift environment? OpenShift versus OpenStack, containerized versus VMs?

This isn't really about that. OpenShift would be something that might run on top of OpenStack; this isn't covering that. This is completely about the elastic secure infrastructure: being able to manage your own infrastructure, for owners to offer their resources for others to use, and for consumers to be able to grab them and use them for whatever they want.
There is a proposal to expand the project so that it would also be able to allocate containers across multiple nodes in the same kind of way, but we would like the bare metal piece to work first.

Hi, thank you. Is there a plan for this whole bidding and offering to be completely automated? I don't want to sign in somewhere and look at something; I just want some little program to take care of it, a switch: whenever there's space available, offer it up, or something like that.

Yes, this is part of what's meant by a more complex offering and bidding system, and also by the agents that can manage the marketplace. We want to allow offers to be posted periodically, to exist on a periodic interval if you want, and bids to be made on a periodic interval, and you would be able to set up an agent to automate that process for you. So you don't have to manually go in every day and say "I want this available at 3 p.m.", reset it every day, check whether it matched, and take it down; a lot of these processes would be automated, and the marketplace would be able to match and accommodate them as well.

Sorry, if you couldn't tell, I'm also involved in this project. We would actually like to get to even more sophisticated, full-on futures-market kinds of models, but what we were trying to do this summer was lay the groundwork, so that we actually have something to build on. Then autonomous agents start to become more interesting, and general-purpose schedulers that tolerate market fluctuations start getting interesting too, especially if you go to Ali's other talk tomorrow, which is about serverless and trying to do similar things with functions.

So I know that the multi-tenant Ironic piece of this is, I think, partially implemented upstream. Can you describe the state of that in general?

In the latest cycle, they added an owner field to the node, separate from the project_owner_id property we use. They haven't hooked it up to anything yet, so there's interest, but maybe not a ton of interest. This may be something we will have to try to implement ourselves within Ironic to get it officially supported. But I would like to point out that once they do that, the back-end code that needs to change here is not much: instead of setting one property on Ironic, we'll be setting a different property, and things will just continue to move along smoothly. Whatever changes they decide to make, it's very easy for us to adapt on our side.

A quick follow-up on that. Right now, if I bid for a machine or a group of machines like this and I get them, what's actually going on behind the scenes? In order for me, the user, to have the machines and be able to use them for something, what's happening in the background?

What's happening in the background is that a property is set in the properties dictionary of the Ironic node that matches that bidder's project; I'm assuming you mean once a contract is fulfilled, not when the bid is created. Once that project is set, we also have the Nova filter, and when the user tries to provision on top of the bare metal nodes, that filter will filter out any node whose project_id doesn't match that user's project. Not releasing the mic, I'm sorry.
And then Nova at that point proceeds to reboot the machine in order to give it to the user?

Sorry, say that again? Yes, Nova at that point, through Ironic, reboots and provisions the machine in order to hand it off to the user with whatever image they chose; Nova's machinery does the provisioning.

Right, okay. Great project, thank you for doing all of this; I appreciate it. My question really is: if somebody has a cluster of nodes, can they use the FLOCX software to manage them and have an internal bid-and-ask and provisioning setup? Can we just take the software layer of this and use it?

So you're asking, if you have a cluster of nodes, presumably in an OpenStack and understood by Ironic, whether you can use the software?

Just imagine 32 nodes powered up, that's all.

So you're asking if you could do all of this but not necessarily have the cost components or anything like that? There's no reason why you couldn't.

I would just add that the cost-component part you somewhat want for the audit factor, but it doesn't actually have to equal money. The other thing I wanted to comment on, and I don't know if you can open the GitHub repo for the documentation: both of the guys here on the ends talked about potential enhancements. If you could file issues against the project describing where you'd want to see it go, those would be really useful in helping us prioritize, and then we can also get outside feedback, because right now it's just a bunch of us thinking about the problem ourselves. The primary driver here is academic sharing of hardware, but there's a lot more here. I actually think the last questioner's case is very interesting: you have a bunch of folks who say, "I've got six servers in my basement, I want to allocate them, make five bucks here and there, and be able to allocate them to some research project." So I just wanted to make that comment. That's the project; please do file issues.