Let's get started. All right, good morning, everybody. Thanks for coming to our talk. It's probably going to be a little rough, it being the first talk of the morning after the Stack City Party last night, so thank you all for being here. I promise we'll make it entertaining and enjoyable. So, enough monkeying around, let's get started.

My name is Will Foster. I'm one of the core operators on TryStack.org, which is an OpenStack Foundation-sponsored project. And this is Kambiz; Kambiz and I are teammates at Red Hat, working on OpenStack DevOps.

So what is TryStack? Just to gauge the audience here: how many people have heard of TryStack before, or used it? Awesome, over half. Sweet, good stuff. We're going to dive a little bit into the history of TryStack, and then we'll get to the fun stuff, which is some of the improvements we're going to be making very soon.

TryStack is a free OpenStack sandbox. It's basically a place for you to try your applications without the complexity of having to deploy OpenStack and manage it and all those other details. It started in 2011, and we revamped and moved the implementation around the 2015 time frame. It is an OpenStack Foundation project, so it's purely volunteer-based. Outside of myself and Kambiz, we also have a few other people who help us with moderating the Facebook group and some of the other elements of the establishment. The hardware and resources are donated by corporate sponsors, and "resources" here also means administration time and expertise. Red Hat, NetApp, Cisco, Dell, Juniper: those are all folks who have been kind enough to grant us hardware that we can provide the service on.

So why is TryStack around? Why do we have this service? It's really about advocacy: providing a very easy way for developers, users, or anyone interested in OpenStack and cloud computing to kick the tires and get situated with what OpenStack can offer. As we mentioned, it removes the complexity of having to deploy OpenStack on your laptop or, heaven forbid, stringing your CIO up and trying to get some money out of him for a cloud computing project. We really just want to increase the adoption and awareness of OpenStack. And lastly, we want to showcase a lot of the new features in OpenStack, so we strive to upgrade every major release to keep things fresh, so that you can very quickly find a place where you can experiment hands-on, see how your application works, or just poke around and see how things go.

One other thing to note is that we stay as vanilla as possible on TryStack. We don't endorse any company or distribution labeling; we use vanilla RDO to run the environment. We offer cloud images of pretty much every distribution out there: openSUSE, Ubuntu, Red Hat, Fedora, CentOS, it's all up there. And you also have the ability to upload your own image and test that out as well.

So in 2015, we had a major revamp of TryStack. It was originally in an Equinix data center on the West Coast, and mysteriously, the entire environment disappeared one day. Completely went off the map; all of a sudden, it just wasn't there. We first thought it was a network issue, so we dove into internet weather reports and started pinging colleagues who were hosting in the facility, and there were no issues there.
After some investigation, after a litany of phone calls and trying to get to the right person, it turned out that ownership of the colo was in the midst of being transferred from Rackspace to the OpenStack Foundation, and the electricity bill didn't get paid. So it was a pretty interesting situation to be in. From our perspective, it turned out to be a lot faster to just redo everything and start from scratch. We hustled pretty hard, and in about a month and a half we had a brand new implementation. We took that opportunity to upgrade to the latest release and also bake in some other automation features and tooling that we'll talk about in a little bit.

So let's talk about where TryStack is, what it looks like at the infrastructure level, and how it's put together today. It's in an East Coast US data center. We currently have a /24 public IP address pool for Neutron floating IP addresses; this is the pool from which users get their public-facing IP address to test their application or to have access to their tenant. We're currently on half a rack of Dell servers, donated by Dell, and we're going to be upgrading all of this soon, which we'll talk about. We also have a NetApp FAS array, a C-mode cluster, that's used for Manila development; a lot of the upstream Manila dev work is being done on TryStack. Ben Swartzlander, who's the PTL for Manila, is one of the folks who helps us shepherd the environment and works behind the scenes there. We're currently on Cisco switches.

And I want to point out that we have a very small number of people and very finite resources, but we serve a very large audience of users and people who are interested. As we go through this talk, the overwhelming theme is that we try very hard to do a lot with a little. This results in some custom tooling that we're going to share with you, and some ways we try to ensure adequate service for everyone who wants to use it. But it's always going to be a recurring challenge for us that we simply don't have enough hardware; there's not enough IP space to service everyone all the time. So we have certain resource-culling rules and things in place to ensure everyone gets a fair shake at the environment.

All right, let's get some pictures. This is the current TryStack data center. There is absolutely nothing out of the ordinary with this picture, nothing at all. It is just a normal data center. These are the data center cats. I think they've always been there; they've been there as long as I've been involved with the environment. And to a certain extent they're multiplying, because more of them keep showing up. But they're benevolent. They just kind of hang out. They sit there. I think this one fixed Neutron once.

So let's talk about how people are using TryStack. What are some of the use cases we see? The main focus is development. This is a sandbox for developers: a place where you can test your application, and it costs you absolutely nothing. We don't care about your personal information; we just want to make sure you're a human if you're using the environment. Later on, if we get in contact with extraterrestrial life, we may change the policy a little bit, depending on how aggressive they might be. But right now, it's for humans only.
And it's for pretty much anyone who's interested in OpenStack. A lot of this is going to be around DevStack. We have folks who might spin up a blog to test functionality, or a web server, or an Etherpad, any manner of application use under the sun. We have some folks who run their CI on the environment as well; the RDO distribution runs a subset of their continuous integration there. And as we mentioned before, the Manila team does some upstream development on TryStack.

That's not all. We've got some other interesting use cases, and this is one I'm really excited about. Over the past year, we've been approached by a few universities that have computer science courses or are teaching courses on Python. They've asked us, "Hey, we love this project. Is it possible for you to set it up so the students in my class can get used to interacting with APIs and writing applications?" Absolutely. So we've had two universities so far that have used the environment. One is in Ljubljana, Slovenia, where the computer science department is teaching a class on Python and cloud computing, and they've been using it for quite a bit. And most recently, the Cork Institute of Technology in Ireland is teaching a Python class and exposing their students to CLI tools and to interacting with APIs. OpenStack provides a very robust set of APIs, so it's a great learning tool for people at the collegiate and university level.

And I want to make the point, too, that for people in universities and colleges studying computer science and technology, it's extremely important to expose them to open source tools, to give them awareness of how the open source model works, and to get them collaborating early on with other people. Then when they actually go out into the industry, they're already prepared. They already have a leg up, and they're not myopic in what they've studied. So we're excited about this use case, and we welcome more universities to reach out to us so we can accommodate them.

A little more about Manila: these are just some milestones from Liberty and Mitaka. I'm not going to go into detail there; if you're interested, you can find us in #trystack and ping Ben Swartzlander.

So let's look at usage and metrics. How many people are actually using the environment? TryStack started in 2011, so this 24,000 number is all of the accounts created since 2011, where the authentication method was the Facebook API. We reset the environment every major OpenStack release, so this roughly 5,600 number is the current number of active users on Kilo. Next week, when we move to Liberty, that's going to be reset. But accounts are auto-created, so if you log in again, you don't need to create a new account; Keystone will just automatically create it for you. There's some glue that's been written between Horizon, Django, and Facebook which makes this interoperability work behind the scenes.
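We don't have time to walk through that glue in detail here, but the core idea is small. Here's a minimal sketch of auto-provisioning on first login; to be clear, the helper name and the "facebook<id>" naming convention are illustrative assumptions rather than our exact code (which is in our public repos), and it uses the Keystone v2.0 client API that was current around Kilo:

```python
# Minimal sketch: auto-create a Keystone tenant + user the first time a
# Facebook-authenticated identity shows up.  Illustrative only; helper
# name and naming convention are hypothetical, not the TryStack glue.
import uuid

from keystoneclient.v2_0 import client as ks_client

def ensure_keystone_account(fb_user_id, display_name, admin_creds):
    """Idempotently provision a tenant and user for a Facebook identity."""
    keystone = ks_client.Client(**admin_creds)   # admin credentials dict

    name = 'facebook%s' % fb_user_id             # assumed naming convention
    tenant = next((t for t in keystone.tenants.list() if t.name == name),
                  None)
    if tenant is None:
        tenant = keystone.tenants.create(tenant_name=name,
                                         description=display_name,
                                         enabled=True)

    user = next((u for u in keystone.users.list() if u.name == name), None)
    if user is None:
        user = keystone.users.create(name=name,
                                     password=uuid.uuid4().hex,
                                     tenant_id=tenant.id)
    return tenant, user
```

The point is that the login path never has to fail on a missing account: if the tenant or user isn't there yet, it gets created on the spot.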
On daily instances, we see about 300 new instances spun up every day, which works out to about 7,500 per month, and we have about 800 to 1,000 active Neutron networks. We're going to drill in a little bit to some of the challenges we've had specifically around Neutron and resource containment: for a while we weren't going back and culling Neutron networks, and we discovered some interesting bugs by having a couple of thousand DHCP ports and a couple of thousand networks that just sat there. But we've also been able to discover some scalability issues with OpenStack that we've addressed.

So we saw the metrics and numbers in numerical form. This is a Grafana dashboard, one of the ops tools we use, which illustrates the same exact thing in visual form. We're pretty big fans of Grafana and Graphite; they allow us to visualize a lot of the usage data, so we can see trends and we can see performance issues. Throughout the talk, you're going to see a lot of references to Grafana, and some specific examples and dashboards that we use to manage the environment.

So, some of the tooling we use to manage the environment and make our lives a little easier. Like I said, we have a very small team servicing a large user base. First and foremost, we want configuration management. We use Puppet and Ansible for that; we're of the persuasion that no single configuration management tool solves every sort of problem, so we pick and choose the best from both worlds to accomplish the goal. Our direction here may change depending on what we want to do, but right now we pick and choose what makes sense for us.

Graphite, Grafana, and collectd we mentioned before: they serve to visualize our data and usage patterns, show problems, show trending, and so on. Plus, managers love graphs. They just love colors and things like that.

Next is the ELK stack: alongside visualizing data, we have log aggregation. We never want to be in the business of SSHing into servers and individually looking at log files. OpenStack is very verbose; there's a lot of information there, some of it useful, some of it not. So we do a lot with the Elasticsearch, Logstash, and Kibana stack to visualize things and put them in one location we can look at. The good thing about the ELK stack is that it's also interchangeable: a lot of folks like Fluentd instead of Logstash. We just went with Logstash because we've got more experience with it.

The last piece is monitoring. There are a lot of monitoring tools out there, including a lot of really good open source ones: Zabbix is popular, Sensu is up and coming. We stuck with Nagios. It's something we've used forever, there's a big community around it, and we write a lot of custom plugins and checks that don't fit in the gamut of what ships out of the box with OpenStack.

So let me dive into an example of that. What you see here is our 1990s-esque, Perl-based Nagios monitoring page. It's not going to win any website design awards, but it does get the job done. What you want to glean from this (it might be hard to see from where you're sitting) is check number nine, which is a floating IP check. This illustrates the flexibility we get from writing our own checks, because you might have services running in OpenStack that show up green from a generic check's perspective but actually don't work on the back end. The floating IP check calls Nova to spin up an instance. Then it contacts Neutron and assigns a floating IP address. It pings that instance and waits for an ICMP response back. Then it SSHes into the instance, runs an arbitrary command, collects and records the result of that command, and then tears everything down. And this runs every 15 minutes.
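To make that concrete, here's a rough sketch of what a check like that can look like. It's not our production plugin (that one is in our public repos), just an illustration of the flow, with placeholder credentials, example image and flavor names, the assumption that key-based SSH to the canary instance is already set up, and the python-novaclient API of the Kilo/Liberty era:

```python
#!/usr/bin/env python
# Simplified sketch of the end-to-end floating IP check.  Placeholder
# credentials and example image/flavor names; key-based SSH assumed.
# Nagios convention: exit 0 = OK, exit 2 = CRITICAL.
import subprocess
import sys
import time

from novaclient import client as nova_client

nova = nova_client.Client('2', 'monitor', 'secret', 'monitoring',
                          'http://keystone.example.com:5000/v2.0')

def main():
    server = fip = None
    try:
        # 1. Ask Nova to boot a tiny canary instance.
        server = nova.servers.create(
            'nagios-canary',
            image=nova.images.find(name='cirros'),
            flavor=nova.flavors.find(name='m1.tiny'))
        for _ in range(60):                        # wait for ACTIVE
            if nova.servers.get(server.id).status == 'ACTIVE':
                break
            time.sleep(5)
        else:
            raise RuntimeError('instance never went ACTIVE')

        # 2. Grab a floating IP from the public pool and attach it.
        fip = nova.floating_ips.create()
        server.add_floating_ip(fip.ip)

        # 3. The instance must answer ICMP on its floating IP.
        if subprocess.call(['ping', '-c', '3', '-W', '5', fip.ip]) != 0:
            raise RuntimeError('no ICMP response from %s' % fip.ip)

        # 4. SSH in, run an arbitrary command, record the result.
        out = subprocess.check_output(
            ['ssh', '-o', 'StrictHostKeyChecking=no',
             'cirros@%s' % fip.ip, 'uname -a'])
        print('OK - canary booted, pinged, ssh says: %s'
              % out.decode().strip())
        return 0
    except Exception as exc:
        print('CRITICAL - %s' % exc)
        return 2
    finally:
        # 5. Tear everything down so the check leaves no residue behind.
        if fip is not None:
            nova.floating_ips.delete(fip)
        if server is not None:
            server.delete()

if __name__ == '__main__':
    sys.exit(main())
```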
And the reason for the full sequence is that if any one of those steps fails anywhere in the process, we get an alert. Those alerts go to IRC, to email, and optionally to SMS. Sometimes you can ping an instance but can't SSH to it; sometimes your metadata service might be down; there's a slew of other things that might not be working on the back end. Unless you have that full gamut of coverage, you're going to think it's working while your users are having issues.

So, housekeeping. We talked a little bit about the sheer demand for TryStack and how small the environment is in terms of resources. We've had to be very creative with some tooling we've authored and some post-cleanup mechanisms we've put together to ensure people get a fair shake at the environment and get their turn to use the service. These numbers may change as the amount of available resources changes, but right now: floating IP addresses are purged every 12 hours (and nothing stops you from simply re-spinning your instance back up, because we provide full API access to the environment), network gateways are cleared every day, Cinder volumes are purged every 48 hours, and instances are deleted every 24 hours.

Again, this is a Grafana graph showing the rise of resource allocation and then the drop-off when our tools kick in and we reclaim resources from the environment. We've gotten very close to red-lining a lot of these resources. We've been over-allocated several times, and we've exhausted the floating IP pool several times, so we've had to go back and rethink the retention rules for how long we let people hold resources. Now, with the upgrades we have in store, which we hope to roll out next week, we're doubling a lot of this footprint: doubling the IP space and doubling the number of compute nodes. So we expect to loosen up a little on these constraints. But for right now, they're there to make sure everyone gets a fair shake at the environment.

Further automation: we're constantly trying to get as close to upstream OpenStack infrastructure as possible. There's been a lot of good work from the upstream infra team around automation, gating, CI, and orchestration in general, so we're slowly moving some of our workflow to align more closely with what upstream infrastructure is doing. The most recent example is the website content: if you go to trystack.org, the actual CSS content lives in upstream openstack-infra. It goes through the same gating process, the same CI, and the same review process you would go through if you wanted to submit a patch upstream for review.

Some further automation: we've tied service alerting into IRC. We use IRC internally at Red Hat, very much upstream, and on Freenode, and there's usually an IRC channel for every major project; TryStack is no different. We have bots that alert us if something happens: if any of that gamut of checks fails, they let us know.
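The bots themselves don't have to be fancy. As a toy illustration (not our actual bot, which sits on a persistent framework and handles the IRC protocol properly), a fire-and-forget notification is only a few lines; the nick and channel names here are examples:

```python
# Toy fire-and-forget IRC notifier.  A real bot would stay connected and
# answer server PINGs; this just announces an alert and leaves.
import socket
import time

def irc_notify(message, server='irc.freenode.net', port=6667,
               nick='trystack-bot', channel='#trystack'):
    sock = socket.create_connection((server, port))
    send = lambda line: sock.sendall((line + '\r\n').encode('utf-8'))
    send('NICK %s' % nick)
    send('USER %s 0 * :TryStack alert bot' % nick)
    time.sleep(2)                     # crude: wait out the server welcome
    send('JOIN %s' % channel)
    send('PRIVMSG %s :%s' % (channel, message))
    send('QUIT :done')
    sock.close()

irc_notify('CRITICAL: floating-ip-check failed, see Nagios for details')
```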
And again, we've touched on a lot of this already, but the major challenges we see are, number one, resource allocation: juggling a large demand against a small amount of finite resources. Security is also something that's interesting; you're always going to have security challenges when you're running any sort of public service. We've seen a few interesting use cases, and I hesitate to call them use cases within TryStack. For instance, one gentleman wanted to torrent Justin Bieber albums. Now, I'm a believer that anyone should be able to listen to whatever music they want to. Just don't use a free public service to do it, because we will ban your account. And we've got some tooling that will go through and figure this out for us if someone's abusing the service. Rationing of resources, again, ties back to availability. And then there's record keeping and auditing.

We've found a lot of gaps between when OpenStack is deployed and when you as an operator are running a public cloud. There are things that just don't ship out of the box, and for a lot of this, you can't ship it, because everybody's environment is different. It's very hard to ascertain how someone is going to use something once it's deployed. OpenStack's strength is that it's very modular, so you may only use a subset of the feature set. You may already have a virtualization fleet you want to plug into Nova; you may already have a storage backend. So it's very hard to figure out what someone needs after they install it. We're going to go into detail about some of the tooling that's been useful to us, and like everything with TryStack, we make everything public. Everything is open source, so we'll be sharing that with you after the talk.

And on that record keeping and auditing point you were making, Will: one of the things we had to do when tracking usage is that we didn't want to pick one time of day to reset the environment. You talked about the reclamation of resources and the resetting of the VMs. This is a global resource, with people from all over the world, so midnight for us might be 8 AM for somebody else. So we wrote some custom tooling that tracks when a user allocated their floating IP or when they set the gateway on their router, because these aren't fields in the default OpenStack database. You can't ask, "OK, when was this gateway set?" You can tell when instances were launched, and we take advantage of that, but you can't tell when a floating IP was allocated to a project or when a router had its gateway set. So we wrote some custom scripts, all in our public repos on GitHub, which track all of that in a separate database. When the clock starts ticking and we're deciding which VMs to delete and which floating IPs to remove from projects, we look in that database at the age of allocation for those resources, and we remove the ones that have aged out.

The other thing we had to write custom tooling for is those interesting TryStack abuse cases: for example, somebody bringing up a torrent for the Justin Bieber album. We don't want that. But what if they've gone away? We culled their instance, and now somebody else, a legitimate TryStack user doing Python development, was unlucky enough to get that floating IP, and meanwhile we get a cease-and-desist from the colo saying, "Hey, you've got to shut this down." So we have tracking information, not built into default OpenStack, layered on top of TryStack, all of which you can also make use of; it's all in our public repos. We can go back and say: last week at 3 PM, who owned this particular IP address? And right now, since users are members of the Facebook group (that's how they get in), we can say, OK, we've got to shut that user down, and we go in and yank their user.
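Conceptually, that side database is very simple. Here's a minimal sketch of the idea using sqlite3, with made-up table and column names (our real scripts are in the TryStack GitHub repos): record a timestamp when a floating IP gets bound, query for anything past the retention window, and answer the "who had this IP at 3 PM last week" audit question:

```python
# Minimal sketch of the side database for resource ages and IP audits.
# Table/column names are illustrative, not the real TryStack schema.
import sqlite3
import time

db = sqlite3.connect('resource_tracking.db')
db.execute("""CREATE TABLE IF NOT EXISTS floating_ips (
                  ip TEXT, tenant_id TEXT,
                  allocated_at REAL, released_at REAL)""")

def record_allocation(ip, tenant_id):
    """Called when we notice a floating IP newly bound to a tenant."""
    db.execute("INSERT INTO floating_ips VALUES (?, ?, ?, NULL)",
               (ip, tenant_id, time.time()))
    db.commit()

def expired(max_age_hours=12):
    """Floating IPs older than the retention window: candidates to cull."""
    cutoff = time.time() - max_age_hours * 3600
    return db.execute("SELECT ip, tenant_id FROM floating_ips "
                      "WHERE released_at IS NULL AND allocated_at < ?",
                      (cutoff,)).fetchall()

def who_had(ip, when):
    """Audit: which tenant held this IP at a given moment in time?"""
    return db.execute("SELECT tenant_id FROM floating_ips "
                      "WHERE ip = ? AND allocated_at <= ? "
                      "AND (released_at IS NULL OR released_at >= ?)",
                      (ip, when, when)).fetchall()
```

The same allocation timestamps drive both jobs: the culling pass uses `expired()`, and abuse reports use `who_had()`.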
So you've got to play nice if you want to play in TryStack. You've got to make sure you're not violating any rules, things of that nature.

The other thing we looked at is an OpenStack management tool called ospurge, which can help you clean out OpenStack tenants and all of the resources they're consuming. Over time, if you're running a cloud, particularly one that's available for free, you need to be able to purge those things out. And the thing we found is that, generally speaking, people are greedy. If you give them something for free and say they have the choice of an m1.large or an m1.small, well, duh, they're going to take the m1.large, because you've given them the choice, and they're going to take as much as they can. So we definitely needed to be very aggressive in cleaning up our resources, and ospurge is one of the things we take advantage of.

But we didn't want to just run ospurge against all tenants, so we tied the two together. We have our own custom scripts that look at the age of VMs; when a VM ages out, it's deleted. That user may have come in, started using TryStack, gotten a feel for it, and gotten what they wanted out of it, and they're not going to bother deleting their own resources. Our culling scripts delete their VMs, and then we look at tenants and say: this is a tenant that doesn't have any VMs, therefore we're going to run ospurge on it. That's where we get the cleanup of the various other resources, like the Neutron resources. Because what we found initially, when we set up our scripts, was that we were doing all this cleanup work but leaving the tenant networks in place. So over time, as people came in and tried out TryStack, suddenly we had 1,000 Neutron networks, or even more, all with DHCP agents, all using up resources on the Neutron networker node. It got slower and slower to spin up tenant networks for new users who wanted to come in and use TryStack, and it became kind of a bad user experience for them. So by tying ospurge together with our own custom scripts, we're able to manage that environment.
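The chaining logic itself is short. Here's a sketch of the idea, with placeholder credentials and endpoint; the `--cleanup-project` flag reflects the ospurge CLI of that era, so verify it against your installed version before relying on this:

```python
# Sketch: chain the VM-culling pass with ospurge.  Any tenant left with
# zero instances gets handed to ospurge to sweep up its leftover
# Cinder/Neutron/etc. resources.  Credentials/endpoint are placeholders.
import subprocess

from keystoneclient.v2_0 import client as ks_client
from novaclient import client as nova_client

AUTH_URL = 'http://keystone.example.com:5000/v2.0'
keystone = ks_client.Client(username='admin', password='secret',
                            tenant_name='admin', auth_url=AUTH_URL)
nova = nova_client.Client('2', 'admin', 'secret', 'admin', AUTH_URL)

# Which tenants still have at least one instance anywhere in the cloud?
servers = nova.servers.list(search_opts={'all_tenants': 1})
busy = set(s.tenant_id for s in servers)

for tenant in keystone.tenants.list():
    if tenant.name in ('admin', 'services'):   # never purge infra tenants
        continue
    if tenant.id not in busy:
        # No VMs left (culling already aged them out), so let ospurge
        # delete the tenant's networks, routers, volumes, and so on.
        subprocess.call(['ospurge', '--cleanup-project', tenant.name])
```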
And actually, one of the previous slides that showed the Grafana graph was from before we started incorporating ospurge. So now I don't think we're going to see that upward trend anymore in the Neutron networker load and the number of networks. The system is getting to the point where we're able to track resources and get rid of things that don't need to be there anymore. It's an evolving thing, too. We're constantly finding areas where, oh, we don't have monitoring for this use case, or we need to be more creative about how we cull a resource, or we need to write a tool to bridge this gap. Some of it may be TryStack-specific; some of it may be specific to public cloud usage in general. But it's a constant learning experience for us as well.

So, we're expanding the environment. One of the things Will mentioned earlier is that we want to stay on top of the current upstream releases. Unfortunately, we weren't able to push Liberty out when we wanted to. We had the environment ready, but we didn't have the colo space we're moving to quite ready yet, so Liberty fell behind. It's actually there, up and running, and we're going to make use of it in another hands-on workshop session tomorrow morning at 9 on microservices.

I don't know if that was on the previous slide. Or, in any case, is that the slide we're on? Yes. So we've got new gear coming in. We've got Liberty basically ready to go out, and as soon as Liberty is out the door, we're going to build a Mitaka installation, probably hot on the heels of Liberty. That's also going to give us additional resources, because we've added more nodes so we can have a staging environment. We'll probably have very little downtime when we flip the switch between releases at this point, because everything's settled down; the dust has settled in the colo. We're going to add OpenStack ID authentication and start to phase out Facebook. We've been wanting to do that for some time. It's not actually a trivial task, but it's in the near future for us. Additionally, we've got more IP space: we've doubled it, so instead of a /24 we have a /23. That means we may be able to relax the times at which floating IPs and gateways are reclaimed. And we've got additional Dell FX2 FC430 nodes, so we have significantly more computing power and should be able to accommodate more usage for longer durations.

We've already talked about these points. Yeah, that was the microservices hands-on workshop; it's tomorrow at 9, and we're going to use the new TryStack Liberty gear with a live demo. There are non-Facebook accounts that have been provisioned, so people in that workshop can hop in and make use of the TryStack platform for the microservices demo.

One of the other tools I want to give a shout-out to is Browbeat. Browbeat is a collection of tools that helps you do validation against an OpenStack deployment. The main author is actually in the audience here: Joe Talerico. One of the things it lets us do is validate against common bugs in a vanilla deployment. If you run Browbeat against your OpenStack, it's going to log into all of your Nova compute nodes, your Neutron node, and your controller, and check against common Bugzillas that have been filed. If you need to make any adjustments to your settings, it's going to tell you what you need to do. It's not going to change them for you, but it gives you all of the references, for example the Bugzilla links, for what you need to adjust. So we've made use of that in TryStack.

We've also made use of some of the Rally workloads that are included as part of Browbeat, so you get a sense for what you can do with Rally. I don't know if you're familiar with running Rally against an OpenStack, but just out of the box it's got some interesting features, and it can visualize how the workload performs against your cloud. This is an example of Rally output from a run against the TryStack environment, the Liberty installation. We launched 60 concurrent VMs, associated floating IPs with those VMs, and logged in over SSH. All of this is automated through Rally; it's not custom code we've written, just something we're utilizing. And on the next slide, I'll show you what Grafana visualized for us while we were burning in the environment. In this case, you see a very cyclical sawtooth pattern, because essentially we were the only users of the new TryStack Liberty platform, and we were sequentially hitting it with the Rally workloads. That gave us a sense that everything is working exactly as we expect.
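For reference, a Rally task for that kind of burn-in is just a small config file. Below is an approximation of the workload we ran, using Rally's stock VMTasks.boot_runcommand_delete scenario, which boots a VM, attaches a floating IP, SSHes in, runs a command, and deletes everything. Exact argument names vary between Rally versions, and the image, flavor, and network names here are examples:

```python
# Approximate Rally task for the burn-in described above, written as a
# Python dict and saved as JSON for `rally task start`.  Argument names
# reflect Rally releases of that period; check `rally plugin show` on
# your version before using.
import json

task = {
    "VMTasks.boot_runcommand_delete": [{
        "args": {
            "image": {"name": "cirros"},           # example image/flavor
            "flavor": {"name": "m1.small"},
            "username": "cirros",
            "floating_network": "public",          # example network name
            "command": {"interpreter": "/bin/sh",
                        "script_inline": "uname -a"},
        },
        "runner": {"type": "constant",
                   "times": 60, "concurrency": 60},  # 60 concurrent VMs
        "context": {"users": {"tenants": 1, "users_per_tenant": 1}},
    }]
}

with open("boot-ssh-delete.json", "w") as f:
    json.dump(task, f, indent=2)
# Then: rally task start boot-ssh-delete.json
```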
We didn't see any failures in those Rally workloads. And the Grafana dashboards we used: the Grafana service itself is easily deployed using Browbeat, and there are Ansible playbooks you can use to deploy it. If you've got infrastructure you want to throw Grafana, collectd, and Graphite on, you can do that. The links at the end of the slides point to the GitHub locations for all the tools we've mentioned in this talk, so you can go out there and take a look, and if you think of improvements or suggestions, we're happy to talk to you.

These are some of the folks who are involved and have been involved in TryStack. Do we have any of the old founders, the alumni, here? Hey, Nachi, how's it going? Absolutely, thanks for creating it. So this is the current team: Kambiz and myself, Ben Swartzlander from NetApp, and we have a new person, David Manchato, who is also helping us out. We're also striving to build more of a volunteer framework, and we want to encourage more people to help us out, either with operations expertise or even just helping moderate some of the public discussions. The Facebook group is still very active, at about 24,000 people, so every day there are questions being asked and answered there, and I'm very appreciative of the folks who have been helping out in that forum. You can find us in #trystack on irc.freenode.net. And here's a link to a lot of, well, actually all of the code we just talked about. There's a TryStack repo here, and there are also links to Browbeat and to ospurge, which powers a lot of the cleaning and culling that we do.

So we're going to open it up for questions. We don't have that much time, but we're happy to spend as long as anyone needs after the talk as well. If you do have questions, I'd ask that you move to one of these two microphones, and you can ask us anything you like. So, well, that's a first: I have a question. Rhode Island is neither a road nor an island, so does anyone know why they called it Rhode Island? I'm not gonna quit my day job, don't worry. Well, thank you guys for coming to our talk. We really appreciate it. And we might have a question. Oh, yeah, good.

Hi, Ian Sullivan, AT&T. If I wanted to set up my own TryStack environment in a lab, to duplicate what you've done for users internally, is that possible using the code out on GitHub? And are there any ties into Active Directory for Windows authentication for users?

So, when we did the deployment, we actually took the simple route of using Packstack, in part because of the limited resources we have. You can certainly deploy your cloud however you want to; the scripts we have are primarily around the management of resources. The only thing you would need to change in the scripts (for deleting VMs, or if you wanted a schedule for when router gateways are cleared and floating IPs purged) is to update your keystonerc_admin file. Our scripts source a keystonerc_admin, and as long as that's updated, and it's external to the scripts, they will definitely work against your cloud distribution as well. Even if you're using HA or whatever, it doesn't really matter. It's all very well documented as to the things you need to change if you're going to consume it for your own use. We usually try to zero out everything that's specific to our environment, so that you know exactly what values to plug in to use it on your own.
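In other words, the pattern is: source your keystonerc_admin in the shell, and the scripts pick the credentials up from the environment. A minimal sketch of that pattern, using the OS_* variables Packstack writes into keystonerc_admin and the python-novaclient of that era:

```python
# Assumes you've run `source ~/keystonerc_admin` first, so the OS_*
# variables are in the environment; nothing site-specific lives in the
# script itself.  Client signature is the Kilo/Liberty-era novaclient:
# (version, username, password, tenant, auth_url).
import os

from novaclient import client as nova_client

nova = nova_client.Client('2',
                          os.environ['OS_USERNAME'],
                          os.environ['OS_PASSWORD'],
                          os.environ['OS_TENANT_NAME'],
                          os.environ['OS_AUTH_URL'])
print('%d instances visible to this account' % len(nova.servers.list()))
```

Point the sourced file at a different cloud, and the same scripts run there unchanged.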
For the Active Directory question: I have not done any testing with Active Directory and Keystone integration. We might have a Keystone or Horizon developer in the audience who could answer that, but if not, I can certainly find the answer for you. Or maybe Graham knows. Graham knows everything.

Does that auto-provision the Keystone tenants and users as well?

Thank you very much, and love the costume. Thank you. These are my normal clothes; I wear this. My wardrobe is nothing but these suits, by the way. Just like Batman, just not as cool.

Yeah, that's kind of been the challenge. We've looked seriously at OpenStack ID and OAuth2 authentication as a different way of creating users and having them use TryStack. The big problem is that the auto-provisioning of Keystone tenants and users, which currently works with the Facebook code, is not something that works right out of the box. We would have to figure out a way of pulling in data from the OpenStack Foundation's OpenStack ID and saying: OK, these users are members of the TryStack group within OpenStack.org, therefore they're allowed to use TryStack; and therefore, if they don't have a Keystone tenant defined, create it. So there's some work that needs to be done before we can make the switch. And to that point, with OpenStack ID we want to add an additional level of verification that the person is indeed a human, so that there's some intervention, with accounts approved by another person, and no automated system can get a token, log in, and start using the system. But we strive to be very vigilant about approving things within 12 hours of when the request comes in. Even if they self-identify as alien. Or monkey, yeah. We'll get there when they show up.

So, any other questions? Okay. Well, thanks again, folks. I appreciate you coming out. Thanks a lot, guys.