Good afternoon, everyone. My name is David Lapsley. I'm the engineering lead at MetaCloud, which is part of Cisco Cloud Services. Today I'd like to talk about some of the work we've been doing recently to enhance platform visibility in Horizon. Before we start, I have a couple of quick questions. Can I have a show of hands? How many people here are OpenStack developers? Okay. How many people here are working with or managing OpenStack clouds in production? Okay, great. Thank you.

At MetaCloud, we provide OpenStack as a service. What this means is that we build, deploy, monitor, and maintain private OpenStack clouds in customer networks, on their hardware, in their data centers.

We have three main goals. The first is to provide a public cloud experience for our customers' users: whether they're using the dashboard, the AWS APIs, or the OpenStack APIs to access the cloud, it should feel as though they're using a public cloud. They should be able to create and manage their cloud resources and see them provisioned instantly. We also want to enable our customers' system administrators to have an application focus. Instead of getting bogged down in managing the infrastructure that underlies their cloud, we want them to focus their time and energy on managing their users, projects, quotas, security, flavors, images, and so on. And ultimately, we want to provide a private cloud that just works.

The way we do this: we have an extremely experienced team of operations folks and engineers. We maintain our own distribution of OpenStack with a number of additions: HA, bug fixes, and enhancements on the networking side and to the dashboard. We upstream as much of this as time allows. Our operations team installs our distribution in our customers' data centers.
Once it's installed, they monitor that infrastructure 24/7. They detect any problems with it, troubleshoot those issues if they can, and if they can't, they alert our customers so that they can take appropriate action. We deliver all of this in a SaaS-based model: our customers subscribe to our service, and once the cloud is set up, they automatically get updates to their infrastructure, automatic software updates that include new features, patches to the underlying platform, and so on. We do this in any data center, on any hardware, provided it meets some minimum requirements.

This is what our full stack looks like, working from the bottom up. You have the hardware at the bottom: servers, storage, and networking hardware, which our customers provide. You have the networking topology. You have hosting, which is the servers, the platform, the operating system, the kernel, and so on that are used to host the OpenStack services. Then you have high-availability service orchestration, which is functionality we add on top. Above that, you have the virtualized cloud resources: compute, networking, storage, and identity. And this is accessible via the dashboard, via the OpenStack CLI, or via the APIs, either the OpenStack APIs or the AWS-compatible APIs.

So if you look at this, from here down is what we manage, and from here up is our customers' view of their private cloud. This model delivers tremendous value to our customers, and they really like it because it allows them to focus on adding value to their business rather than on managing infrastructure. But one challenge is that it's a slightly different model for their sysadmins.
They're normally used to being able to SSH into any box on their network, get sudo access, and then poke around, running different commands to see what's going on. In this environment, it's a little bit different: they don't have that access, and so we need to find a way to provide it for them. The challenge for us is: how do we enable admins to understand what's actually going on in their clouds?

The first step is instrumenting the clouds, and there are a number of ways we do this. This is how we instrument the cloud for live stats. At the top of the diagram, you can see we have redundant controllers. At the bottom, you see we've got hypervisors; we can have up to 500 of these in a particular AZ. Each of these nodes is running a special live-stats daemon that we've written, which provides information on the current state of the server it's running on: things like CPU, disk space, network I/O, and so on. Whenever we need to find out what's going on, we just send a query to the live-stats daemon, and it returns that information. You can see Horizon up there; Horizon is where the request comes in to see the state of a particular server, and Horizon then forwards that request and returns the information back to the user.

We also instrument the cloud for historical metrics. We have collectd running on each of our nodes, which collects a large amount of information about the system. collectd stores all of that data in a centralized Graphite database, and we can then access that data historically, up to a year's worth.

One of the new things we've been working on is service health, and we have a new feature there. I'm just going to show you the prototype; it's not implemented yet. This feature instruments the OpenStack APIs.
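To make the live-stats path a bit more concrete: here is a minimal Python sketch of the kind of point-in-time snapshot such a daemon might return. All names here are hypothetical illustrations built on the standard library, not MetaCloud's actual implementation.

```python
import json
import os
import shutil
import time

def livestats_snapshot():
    """Gather a point-in-time snapshot of host health.

    Hypothetical stand-in for a live-stats daemon payload: load
    average, CPU count, and root-disk usage, all from the stdlib.
    """
    load1, load5, load15 = os.getloadavg()  # Unix only
    disk = shutil.disk_usage("/")
    return {
        "timestamp": time.time(),
        "cpus": os.cpu_count(),
        "load_avg": {"1m": load1, "5m": load5, "15m": load15},
        "disk_root": {
            "total_bytes": disk.total,
            "used_bytes": disk.used,
            "free_bytes": disk.free,
        },
    }

if __name__ == "__main__":
    # A real daemon would serve this over the network on request;
    # here we just print one snapshot.
    print(json.dumps(livestats_snapshot(), indent=2))
```

In the architecture described above, Horizon would send a query to a daemon like this on a specific controller or hypervisor and relay the JSON back to the user.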
So we can see how many requests per second are coming into the APIs, and we can also see the response time for each of the APIs. This is really valuable information. Even just seeing it is valuable, but being able to take action on it, applying dynamic or static thresholding on response times, for example, to give a measure of service health at any particular point in time, is something that's really valuable.

The next step is enhancing Horizon, adding features that let us expose this to our customers in a way that's easy for them to access and easy for them to understand. This diagram shows a number of things. This is what a Horizon stack looks like. At the top, you have the web browser; here you have the Django/Horizon stack; and at the bottom, you have the OpenStack services. All of the blocks in white are functionality delivered by Django: there's functionality for templating, so you can control the view a user sees; functionality for creating views, so you can control the information that gets sent back to the user; and API functionality down here. So all of these white blocks are standard functions that come with Django. All of the green blocks are enhancements made by Horizon to make it easier to create dashboards, panels, and views that display OpenStack information. Over on the far side, you can also see some yellow blocks; these are blocks we've added to Horizon to make it easier to access all of this information.

When a web request comes in from a user's browser, it comes into the URL dispatcher.
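The API instrumentation mentioned a moment ago, request counts plus per-call response times, could be sketched as a WSGI middleware wrapped around an API service. This is an illustrative sketch, not how MetaCloud's prototype is actually built; `TimingMiddleware` and `demo_app` are hypothetical names.

```python
import time
from collections import defaultdict

class TimingMiddleware:
    """Hypothetical WSGI middleware: counts requests per HTTP method
    and records per-request response times, the raw data behind
    requests-per-second and latency graphs."""

    def __init__(self, app):
        self.app = app
        self.counts = defaultdict(int)  # e.g. {"GET": 12, "POST": 3}
        self.latencies = []             # response times in seconds

    def __call__(self, environ, start_response):
        start = time.perf_counter()
        try:
            return self.app(environ, start_response)
        finally:
            self.counts[environ.get("REQUEST_METHOD", "GET")] += 1
            self.latencies.append(time.perf_counter() - start)

def demo_app(environ, start_response):
    # Stand-in for an OpenStack API endpoint.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]

app = TimingMiddleware(demo_app)
```

Aggregating `counts` over a time window gives requests per second broken down by operation, and the `latencies` samples are what a static or dynamic threshold would later be applied to.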
It goes through the view here, which is responsible for rendering, down through the API, and then to the OpenStack services to retrieve information, which then pops all the way back up: the view grabs the template, uses it to render the response, and sends that response back to the web browser. What we've added is this yellow REST API down the bottom here, which extends the current API in Horizon so we can use all of the existing API calls. But we've also added modules that let us access Graphite and also live stats. So we can pull all of this information together and forward it to the front end, where we have AngularJS-based client-side views that give us more interactive, more easily extensible views to present this data to our customers.

We actually have a demo here that is running on my laptop, so it's a little bit slow. What I want to do now is show you what these features look like. This is running on a virtual machine on my laptop, on DevStack, with a single controller and a single hypervisor. If we click on the controllers tab here, you can see our single controller. If we click on the live stats overview tab, you immediately see a bunch of information. You see information about the host: the uptime, and the physical configuration of sockets, cores, threads, and CPUs. You can see the CPU information down here: the load average and the breakdown of the current CPU utilization. You can also see, down the bottom here, memory utilization: the total memory and the breakdown of its utilization. Over there we have high-level information about network I/O: all of the interfaces and their receive and transmit rates. And you can see that it just updated. One of the neat things is that this is updating every five seconds.
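That five-second refresh amounts to a simple poll loop: fetch the latest stats, redraw the view, wait, repeat. A minimal Python sketch (in the dashboard itself this is done client-side in AngularJS; `poll` and its parameters are hypothetical):

```python
import time

def poll(fetch, handle, interval_s=5.0, iterations=None):
    """Hypothetical poll loop: call fetch(), hand the result to
    handle(), then sleep and repeat. iterations=None polls forever."""
    n = 0
    while iterations is None or n < iterations:
        handle(fetch())  # e.g. fetch live stats, then redraw the view
        n += 1
        if iterations is None or n < iterations:
            time.sleep(interval_s)
```

Here `fetch` would query the live-stats endpoint and `handle` would update the page; `interval_s=5.0` mirrors the five-second refresh.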
So you're really seeing a current snapshot of the state of the server. And down the bottom there, you can see disk usage as well. We can also click on this processes tab, and we'll see the top 50 processes currently running on that server, which is really useful if you want to see which processes are using resources. We can click on memory utilization over here, and that will sort by memory usage; it turns out we're not using terribly much memory at the moment. You can click on CPU utilization over here, and you can see it sorted by CPU utilization, so you can see which processes are using the most CPU.

We also have a networking tab, which shows more detail about what's going on on the various network interfaces. You can see receive and transmit packet counts, receive and transmit bytes, errors, and so on. Again, what's neat is that it's updating every five seconds, so you get to see incrementally how things are going. And finally for live stats, we also have more detailed information about the disks: you can see the different mount points, the total amount of space available, how much of that is used and how much is free, and disk I/O statistics as well. So that gives our customers' admins a snapshot of what's going on on any one of the controllers or hypervisors in their cloud.

We also provide them with historical metrics. You can see here, for this controller, if we click on this little drop-down, you'll see historical metrics. There are a number of really useful metrics here: CPU utilization, CPU load, and memory utilization. You can see the network usage in kilobytes per second and packets per second, and you can also see the disk utilization and local disk I/O. And there are some nice interactive features there as well.
As you mouse over these, a little pop-up shows you the value at each data point. You can also click on the different data series to remove some of them, so you can focus on one series or another. And we can look at different time scales: that's what the data looks like over a day, and we can also do a month or a year if we like.

The last thing I want to show you is some work we've been doing that, as I mentioned, is not yet released. These other two features have been in production for quite a while now, and we've had very encouraging feedback from our customers, who really like them. This is the start of a feature we're implementing to give us a sense of service health. If you click on, for example, Nova, this shows you the request rates and the response times for all of the calls going through all of the Nova APIs. This graph here is showing you the breakdown of requests per second. It's a pretty low request rate; I just have a single script running to generate some traffic, so it's not generating that many requests, but you can see it broken down by operation: GET, PUT, POST, and DELETE. Over here you can see the response times.

Even just in this form, it's a really useful thing to be able to see. While I was working on this, I was able to figure out, just from looking at the response times, that there were some issues going on in my stack. In production, when you're looking across all of the APIs, this is going to be tremendously useful. And it's really just the start: capturing the metric is just the start. The next thing is going to be using that metric, thresholding against it, so that we can come up with measures of service health, so that we can see when services in the network are performing well and when they're not.
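The thresholding idea can be sketched in a few lines: combine a static ceiling on response time with a dynamic one relative to the recent baseline. This is an illustrative sketch only; the function name, thresholds, and labels are all hypothetical, not the feature as MetaCloud will ship it.

```python
from statistics import mean

def service_health(latencies_ms, static_ms=500.0, dynamic_factor=3.0):
    """Hypothetical health check over API response-time samples.

    Flags the service as degraded if the latest sample breaches either
    a static threshold (static_ms) or a dynamic one (a multiple of the
    mean of the observed samples)."""
    if not latencies_ms:
        return "unknown"
    latest = latencies_ms[-1]
    baseline = mean(latencies_ms)
    if latest > static_ms or latest > dynamic_factor * baseline:
        return "degraded"
    return "healthy"
```

Fed with the per-API latency samples from the instrumentation, a check like this is what would turn raw response-time graphs into a service-health signal.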
So we've really only just scratched the surface. Even these two features have made a tremendous difference in terms of the visibility we give our customers' admins into the platform, but there's a lot more work that we're going to be doing. Unfortunately, we haven't had much time to upstream much of this work, but as we continue to grow the team, we're certainly going to spend more time doing that so other folks can share in it.

Which brings me to another point I wanted to make: if any of this sounds interesting, we are hiring at the moment. We're looking for platform and UI engineers, and we would love to hear from you. We have a great team, we have a lot of fun, and we do a lot of really cool stuff. All of our features go into production, so you get to see that, which is really neat. Go to jobs.metacloud.com; you can see all the listings there. Also feel free to send me an email or talk to folks at the booth. There are my contact details; if you're interested in talking about this stuff, please feel free to reach out.

Thank you very much. Do we have any questions from anybody in the audience? No? Okay, great. Thanks again.