Hi, my name is Tzu-Mainn Chen, and I'm a Principal Software Engineer at Red Hat. For the past year or so, I've been working on the Elastic Secure Infrastructure project, or ESI, along with other Red Hat engineers, as well as people from the Mass Open Cloud, or MOC. So what is ESI doing? Well, from our README, we want to create a set of services to permit multiple tenants to flexibly allocate bare metal machines from a pool of available hardware, create networks, attach bare metal nodes to those networks, and optionally provision an operating system on those systems. One key thing to note is that ESI does not compel users to adopt a specific service architecture. You can put the pieces together yourself to support your own architectures and project goals. That's our goal, but what is ESI actually doing? To explain that, I like to think of ESI as more of an enablement initiative that aims to fill the gaps in OpenStack in order to fulfill its goals. We work upstream with established OpenStack projects to expand feature sets, we write custom code where needed, and we provide guidance and support to other groups working in this space. The best way to illustrate the sort of work we've been doing is to talk about an ESI project that we're currently working on in the MOC. That project is an internal hardware leasing system. The goal is to replace the MOC's current internal hardware leasing system, which uses complex custom scripts that are powerful but difficult to maintain or extend. The idea behind the ESI version of this hardware leasing system is that users contribute their own hardware to the bare metal pool. Those node owners have exclusive use of their nodes. However, when a node owner is not using a node, they can put it up for lease, allowing a lessee to temporarily claim the node and use it for themselves. At a high level, this is what the hardware leasing system's service architecture looks like.
In the middle of the diagram is the bare metal service, where we keep our inventory of nodes. A leasing workflow allows owners to offer up their nodes while lessees lease nodes. Then a provisioning workflow allows both owners and lessees to perform provisioning actions: they can provision their nodes or configure node networking. Let's dig a little deeper into the hardware leasing system workflows. We'll start with the creation of offers. Let's say that the bare metal inventory has five nodes, A through E, with nodes A, B, and C owned by owner 01, and nodes D and E owned by owner 02. Owner 01 is not using any of their nodes, so they offer them up in the leasing service by creating an offer for each of their nodes. Owner 02 isn't using node D, so they offer it up as well. However, owner 02 is using node E, so they keep it for themselves. Now the lessees come into play. Lessee 01 wants to use two nodes, so they look at the available offers and create a contract for nodes B and D. Lessee 02 only needs one node, so they create a contract for node C. No one creates a contract for node A, so another lessee can take it. And no one can create a contract for node E, since it was never offered up in the leasing service. Once the nodes are leased, the lessees can move into the provisioning workflow. Lessee 01 can use the two nodes they've leased, nodes B and D, in conjunction with the public network and the two private networks that they've created. They can perform a variety of actions: provision a node, attach networks to their nodes, and network actions will be reflected on the switch. Lessee 02 can do similar things, but they're limited to node C and the public network. So that's what we want to do. What steps have we taken to get there? To answer that, let's look at the partial requirements breakdown from the halcyon days back in 2019. It's not a complete requirements list, but it illustrates a lot of the work that we've done. Let's start with the bare metal service, or Ironic.
We need to be able to add and remove nodes from inventory, something Ironic already does very well. We also need the ability to assign owners and lessees to a node, and for those owners and lessees to be able to perform limited API actions. Combined, this translates into node multi-tenancy. It's something that was not supported by Ironic when we started working on the hardware leasing system, and it ended up being something we implemented ourselves. Next are the network requirements. We want users to be able to create their own networks and to potentially share those networks with another project, things that Neutron already supported very well. The MOC also needed the networking-ansible ML2 driver to support the Cisco Nexus switch, and we discovered that was something we needed to code ourselves. Finally, we needed the ability to create and manage trunk ports. After some investigation, we learned that Neutron did most of what we needed, but there were some pieces missing. The next requirement is a bit of a strange one: simplified CLI commands. We'll talk about this more in a moment, but the gist of the issue is that the default OpenStack CLI is comprehensive, but not always easy to use. So we decided we would need to give our users some simplified CLI commands. Finally, there's the leasing service, which we created from scratch. It needed to allow node owners to offer nodes and lessees to contract a node for a given time period. Let's go into more detail about the work we did. We'll start with Ironic node multi-tenancy, which was the key feature we needed above all else. We started by talking with the upstream Ironic team about our requirements, and they were amazing at walking us through our ideas and implementation, helping us refine our spec and giving us feedback about our code. The implementation steps started with adding owner and lessee fields to nodes. Ironic actually already had an owner field for nodes, but it was purely informational.
Next, we exposed those owner and lessee fields to policy. OpenStack policies determine which users have access to which API actions, and they can be specified in an easy-to-update configuration file. Before our work, all Ironic API actions were accessible only to Ironic administrators. We added two new rules, is_node_owner and is_node_lessee, that determine whether the user making an API request is the owner or the lessee of the node. With those in place, we could then update the other policy rules. For example, we can update baremetal:node:update to allow access if the user is an admin or the owner of the node, and we can update baremetal:node:set_power_state to allow access if the user is an admin, the owner of the node, or the lessee of the node. Note that these policy rules are fully customizable by each individual OpenStack installation. There were a few additional minor tweaks necessary, but that was mostly it. We tested our code with single Ironic CLI commands, and then we tried something more complicated using Metalsmith, a client-side Python library for provisioning Ironic nodes. What we discovered was that Metalsmith just worked for node owners and lessees with our code. Owners and lessees could provision nodes that they owned or leased, and could not do so for nodes that they did not own or lease, which was exactly what we wanted. On to networking. We started by adding support for the Cisco Nexus switch to the networking-ansible ML2 driver, with guidance from the upstream networking-ansible team. They were great at helping us understand what we needed to do and helping us merge our code, and that was that. Next, we tested trunk port support. We discovered that Neutron already supported the creation and deletion of trunk ports. However, we couldn't attach a trunk port to a specific NIC, so that's a feature we added, actually in Ironic. Lastly, we found out that updating a trunk port didn't fully work.
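To make the policy changes concrete, here's an illustrative sketch of what such a policy file could look like. This is based on the rules described in the talk and on oslo.policy's usual rule syntax; treat the exact rule and action names as assumptions that may differ in a given Ironic release or deployment:

```json
{
    "is_node_owner": "project_id:%(node.owner)s",
    "is_node_lessee": "project_id:%(node.lessee)s",
    "baremetal:node:update": "rule:is_admin or rule:is_node_owner",
    "baremetal:node:set_power_state": "rule:is_admin or rule:is_node_owner or rule:is_node_lessee"
}
```

Here is_admin is assumed to be a pre-existing admin rule. Because these are ordinary policy rules, each OpenStack installation can tune them however it likes.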
If a trunk port was attached to a NIC and then updated, the change would not be reflected on the switch. So we made plans to implement this feature, and when we got around to working on it, we discovered that someone had beaten us to the punch by a week or two, which was great. Simplified user commands. Why? Well, because if you use the default OpenStack CLI, figuring out how networks are attached to a single node requires 2 + (2 × the number of NICs) commands. So six commands, if your node has two NICs. You also have to use the output from one command as input for the next, copying and pasting UUIDs. Then you still have to collate the results of all those commands. But what we learned was that it's actually really easy to extend the OpenStack CLI. No one on our team had much experience with the CLI code, but within a few days we had developed "openstack esi node network list", a single command that produces the output you see here. We've created additional CLI commands to simplify various operations, and you'll see a lot of them in a moment in our demo. Finally, the leasing service. We started by looking at OpenStack Blazar, which is OpenStack's reservation service. However, at the time we looked, Blazar did not have full support for provisioning Ironic nodes. It was also tied closely to Nova, something we were trying to avoid, as Ironic can provision nodes on its own and we wanted as simple a service architecture as possible. Lastly, Blazar did not support policy-based node API access: you might be able to provision an Ironic node, but you couldn't allow a lessee to power cycle the node. So last summer, with the help of a bunch of Red Hat interns, some of whom are now full-time Red Hat engineers working on ESI, we developed ESI-Leap, which is a pretty simple alternative. It allows resource owners to offer their resources and resource lessees to contract an offered resource for a specified time period.
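As a quick illustration of the command-count arithmetic mentioned above, here's a trivial sketch. The function name is ours, and the breakdown into fixed and per-NIC commands is taken directly from the talk's 2 + 2 × NICs figure:

```python
def base_cli_commands_needed(num_nics: int) -> int:
    """Number of base OpenStack CLI commands needed to figure out how
    networks are attached to a single node: two fixed commands, plus
    two more per NIC (the talk's 2 + 2 * NICs figure)."""
    return 2 + 2 * num_nics

# A node with two NICs needs six base CLI commands, versus a single
# "openstack esi node network list" with the ESI extensions.
print(base_cli_commands_needed(2))  # -> 6
```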
For Ironic nodes, contracting a node just means setting the node's lessee attribute. ESI-Leap supports the leasing of Ironic nodes out of the box, and it can easily be extended to work with other resources. Let's take one more look at the service architecture of our hardware leasing system; hopefully you now have a clearer picture of how all the pieces interact. One thing I really want to emphasize is how disconnected our leasing and provisioning workflows are. Maybe you don't need the concept of lessees, or maybe you're fine with owners assigning lessees themselves. You can do whatever you want there and still take advantage of all our work in the provisioning space, with Ironic node multi-tenancy and simplified CLI commands. Okay, let's move on to our demos. First, we'll see a demo of our leasing service and Ironic's multi-tenant node capabilities. We're going to start with three users, each with a different view into Ironic's node inventory. The admin user sees every node in the inventory. The owner sees the three nodes that they own. And the lessee has not yet leased any nodes, so they see no nodes at all. Now the owner has decided that they'll share Dell 2, so they create an offer for the node by specifying the Ironic node resource type and the Ironic node's UUID. Note that the owner can also specify a start time and an end time for the offer, but we'll just specify an end time, so the offer will be immediately available. The lessee can see the offer and when it's available, and they can now create a contract for that node. After the contract is created, note that the availabilities for the offer are also updated. Other lessees can still contract the node for those times. When we reach the contract start date, the manager service will fulfill the contract by setting the node's lessee field to the lessee's project. When the contract's expiration date is reached, the manager service will unset the node's lessee field and clean the node.
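The contract lifecycle just described can be sketched in a few lines. This is a hypothetical simplification, not ESI-Leap's actual code: the real manager service talks to Ironic's API, while the Node and Contract types here are stand-ins we've invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Node:
    uuid: str
    owner: str
    lessee: Optional[str] = None  # set while a contract is fulfilled


@dataclass
class Contract:
    node: Node
    lessee_project: str
    start: datetime
    end: datetime
    fulfilled: bool = False


def manager_tick(contract: Contract, now: datetime) -> None:
    """One pass of a simplified manager loop: at the contract start
    date, fulfill the contract by setting the node's lessee field; at
    the expiration date, unset the lessee field (the real service
    would also clean the node at that point)."""
    if not contract.fulfilled and contract.start <= now < contract.end:
        contract.node.lessee = contract.lessee_project
        contract.fulfilled = True
    elif contract.fulfilled and now >= contract.end:
        contract.node.lessee = None
```

The point of the sketch is just the two state transitions: the lessee field is what grants policy-based API access in Ironic, so setting and unsetting it is the whole handoff.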
Now that the contract has been fulfilled, the lessee has access to the node. They can also perform specific actions as dictated by Ironic policy. For example, our Ironic policy allows lessees to power nodes on and off, but not to set or unset node attributes. Here are our Ironic policy changes. This is the generic node update rule, which allows API access to only admins and node owners. And here's the set power state rule, which allows API access to admins, node owners, and node lessees. This Ironic policy is restrictive, but it still allows lessees to do pretty complex things, for example, using Metalsmith to provision a node. This operation takes a while, so we'll skip ahead in time. Provisioning is now complete, and the lessee can now log into their provisioned node. That concludes this demo. Next, we're going to look at our custom OpenStack CLI commands. The base OpenStack CLI is powerful and comprehensive, but often requires a user to run multiple commands. When I run an ESI CLI command, I'll also tell you how many base OpenStack commands would have been needed for the same result. We'll start with a node owner who can see the three nodes that they own. Let's look at how Neutron networks are attached to these nodes by looking at the bare metal nodes and ports. Looking at this bare metal port, we can see that it corresponds to this node UUID, which is Dell 2, and there's no information in internal_info, so you can see that this bare metal port is not attached to any network. We can also use an ESI CLI command to get the same sort of result. And here, with one command instead of three, we can instantly see that none of our nodes are attached to any networks yet. Let's verify that there are no networks attached to Dell 2 on the switch, and we can see that there's nothing attached. The next thing we're going to do is create a trunk port. Let's see what networks are available. We're going to use these three test networks.
We'll start by creating a trunk port with the test 1 network as the native network and the test 2 network as a tagged network. This ESI command would take five separate base OpenStack CLI commands to run. We've created the trunk port, and we can see some summary information. This summary information would have taken four base OpenStack CLI commands. Now let's attach this trunk port to Dell 2. I'm going to specify the trunk port, the MAC address, and the node. This command would take seven to ten OpenStack CLI commands for the equivalent, depending on what state the node is in. The network attachment is complete. Let's take a look at our summary node network list. Now we can see the attached networks on Dell 2. We can verify this on the switch as well: VLANs 611 and 612 should be accessible, and we see that that is indeed the case. Next, let's add another tagged network to the trunk. This would take four base OpenStack CLI operations. Let's verify this update with our ESI CLI commands. We can see that the test 3 network has been added as a tagged network right here. And our summary node network list also shows that same information. We can verify this on the switch as well, making sure that VLAN 613 is accessible. And that is indeed the case. Okay, now let's clean things up. We'll start by detaching the network from the node. This would take three base OpenStack CLI commands to execute. Let's verify that this update happened on the switch, and we can see that the switch port is now shut down. Finally, let's delete this trunk port. This command would take five base OpenStack CLI commands for the equivalent. That concludes this demo. What's next? We're working towards a trial implementation of our hardware leasing system in the MOC, with just one key feature missing that we'll talk about in a second. There are additional scenarios where we might use ESI.
A Red Hat colleague working on OpenShift thinks that ESI would be perfect for creating OpenShift bare metal clusters that can expand and shrink on demand. There's also FLOCX, a PhD project at BU. FLOCX is a hardware marketplace where users can buy and sell use of bare metal nodes. There's a talk about FLOCX coming up soon, and I'll have details about that in the next slide. We also have some feature development to do. We're starting to look at node attestation with Keylime. The MOC also needs iSCSI support in OpenStack for its hardware leasing system, and there's an upstream patch that we'll be testing out. Then there are improvements to the leasing service: reporting, reservation limits (so one lessee can't lease all the nodes until the end of time), and a UI. Further information: Red Hat Research Days 2020 is coming up, and there will be a talk about FLOCX titled "Using Elastic Secure Infrastructure: Why and How?" It'll be on September 30th at 12:30 p.m. Eastern, so check it out if you're curious. The Red Hat Research Quarterly has an article about our Ironic multi-tenancy implementation called "Isn't Multi-Tenancy Ironic?", so if you'd like more details about what we did there, read all about it. The ESI Git repository is a good place to look at our code and also our documentation. And finally, you can reach us at any time through IRC, on the Freenode server in the MOC channel. Thank you for listening to this presentation. I hope you enjoyed it. All right, thanks, that was a great presentation, and let me stop sharing. We're on to the Q&A session. There were some comments in the chat window, but I don't see any defined questions. Who said what? It was Manuel, Michael. Folks, if you have any questions, please type them in the chat window. We'll give it probably another couple of minutes, and if we don't have any questions, we will... We have some comments. All right. Okay, there's a question.
So the question is: what was the one feature missing for deploying at the MOC? The MOC has a requirement to work with Cinder volumes through the Ceph iSCSI driver, which is currently not present in Cinder. There's an upstream patch for it that has not yet been merged, but we're going to be testing it next week, actually, to see if it works. And if it works, then we're just going to carry that patch with us and use that, and then we'll be pretty much feature-complete for the trial implementation at the MOC. Do we have any more questions from anyone on this presentation? I guess that concludes our topic at this point. Thank you very much for your presentation. Thanks, guys.