Hello, thank you for joining us. My name is Travis Newhouse. I'm the chief architect at AppFormix, and I'm joined by my colleague, Harshit Chitalia, who leads the integration of our product with OpenStack and Docker. AppFormix provides software to control and monitor resource usage in your infrastructure. Today I'm going to talk about what happens after you've set up your infrastructure and deployed your applications: as an application owner, how do you ensure the expected performance of your applications? And as an infrastructure operator, how do you maximize the value of your infrastructure?

Once you have OpenStack or Docker Engine up and running, you can create an instance and start an application inside your infrastructure with a few mouse clicks or a few keystrokes. That's awesome, because it's very simple: you spawn things off and you're off and running. As soon as people in your organization find out about this, they're very happy. They come to you and say, "I need an instance." You spawn one, give them a virtual machine or a container, and they deploy their application to it. Everyone's very happy to start running applications.

But where are these applications running? On shared infrastructure. Every time you spawn a new instance, the slice of the pie shrinks for each application. In some cases that's fine, and it's even what you want: you want to maximize the utilization of the hardware you've invested in. But in other cases it can lead to applications not receiving the resources they expect, and so not achieving the performance they desire.

In the simplest case, there are just too many applications running on the same host, so the resources available to any particular application are too small, and the user experience degrades. The application developer has tested his code, implemented it in a scalable fashion, and done his load testing. But when it gets into production on shared infrastructure, all bets are off, because he doesn't know how many resources he's going to get to run his application.

The second kind of problem is what I would call dynamic demand for resources. On shared infrastructure, demand is not constant. Workloads come and go, requests from users come and go, and the amount of resources a particular application uses changes over time. As it changes, the availability of resources changes for all the virtual machines or containers running on that host. What that leads to is unpredictable performance. So in contrast with not being able to get the absolute amount you need to run acceptably, you can run into a different problem where the unpredictability itself is what you get complaints about: sometimes the application runs great, and sometimes the user experience is poor. How do you deal with that unpredictability?

I'm going to give a little example here. This user interface shows the view of a single host that is hosting six virtual machines, and we're displaying the CPU, the memory, and the network I/O in and out of the physical server associated with each virtual machine on that box.
Now, when the application owner gets a complaint that performance is not meeting expectations, he's going to start asking why it's happening. The infrastructure operator will typically look at connectivity: he'll check that the packet and byte counters are going up, he'll see that the application is getting scheduled and running, and he'll say, "From my end, it looks fine. The infrastructure is doing what it's supposed to do; your application is running." But the application owner knows he's not getting the user experience he expects. We need monitoring tools to help identify where the bottleneck is happening. In this case, with the right tools, we can see that as the number of instances grew, the application in yellow at the bottom kept shrinking, to the point where it's below an acceptable threshold of resources. Or, as demand varies — because the blue application has a varying load — sometimes the application in yellow is doing well and sometimes it's not.

This leads to the first thing that we at AppFormix provide, which is real-time monitoring. It has a few aspects. The first is simply to give visibility into what the problem is, in real time. We want to give you a fine-grained view of your resources over small time scales, so that you can see bursts. We also want to give you visibility into all the resources an application might be using: the CPU, the memory, the disk, and the network. That's important because with that information you can correlate problems. If you see a spike in CPU at the same time as a spike in network, maybe that lets you identify what kind of problem your application is experiencing; maybe latency is going up as the CPU spikes because of the CPU demand of a different application running on the same host.

The third thing we require in real-time monitoring is the ability to see how resources are used across the different layers of your infrastructure. You want to know how much is being used at the physical host — how much total CPU and total network — so you can see how much capacity there is. You also want to know how much a particular VM or container is using, which tells you whether that application is getting what it needs to meet the performance requirements you have for it.

Finally, we also provide application-level metrics. As an example, look at HTTP: so many applications these days are built with a service architecture, using REST APIs to access the application. So it's very interesting to see the request rate — how many GETs versus PUTs versus POSTs are being made to an application. That request rate helps the application developer profile the application and understand why it might be performing one way or another. And if the application owner doesn't have his own counters built into the application, he may be curious to know which endpoints those requests are being made to — which URLs those REST calls, the GETs and PUTs and POSTs, are targeting. The agent that AppFormix deploys for monitoring can provide that level of insight into the application protocol and give you the ability to measure what's going on inside the application-level protocol.
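To make that concrete, here is a minimal sketch of what pulling such per-instance, multi-layer metrics over REST might look like. The endpoint path, port, and field names are hypothetical — the talk doesn't show the actual AppFormix API — but it illustrates the kind of data being described:

```sh
# Hypothetical endpoint and field names, for illustration only; the real
# AppFormix API paths are not shown in this talk.
curl -s http://appformix-controller:9000/api/v1/instances/instance-1/metrics

# Illustrative response, combining VM-level resource usage with
# application-level HTTP metrics:
# {
#   "cpu_percent": 42.0,
#   "memory_mb": 1875,
#   "network_rx_mbps": 310.5,
#   "network_tx_mbps": 129.8,
#   "http": { "get_rate": 120, "put_rate": 4, "post_rate": 17,
#             "top_endpoint": "/api/v1/orders" }
# }
```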
Another interesting metric is the time to first byte. For a web application — a web server — you might want to know how long it takes from the time a request arrives at the physical host, makes its way up through the virtualization layers to the application, the application processes the request, and the first byte of the response goes back out. That's a very useful metric, because so many things happen between the physical network and the application. You have to worry about the scheduler; you have to worry about the packets and data getting through to the application; you have to wait for the application to actually do its work; and then the response has to be sent back out. Any of those pieces can become a bottleneck, and it's important to be able to identify which one. Having the time to first byte lets you correlate: compare the RTT, the round-trip time, with the time to first byte, and identify whether the bottleneck is in the network, because the round-trip time is slow, or in the application, because the network is quick and the data is arriving at the application, but it's taking a long time for the application to respond.
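As an aside, you can get a client-side approximation of these two numbers with plain curl's built-in timing variables. This is standard curl, not the AppFormix agent (which observes these passively on the host), but it shows the same comparison:

```sh
# time_connect approximates the network RTT (TCP handshake time);
# time_starttransfer is the time to first byte of the response.
curl -o /dev/null -s \
     -w 'connect (~RTT): %{time_connect}s\nTTFB: %{time_starttransfer}s\n' \
     http://app.example.com/
```

If time_starttransfer is large while time_connect is small, the time is being spent in the application stack rather than the network — exactly the distinction described above.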
So far I've been talking about contention at a single physical host. In the example I gave earlier, six virtual machines — or six containers, or hundreds of containers — running on a single host are all fighting for the CPU on that host, and all fighting for the network bandwidth on its physical interfaces. But as we move toward architectures built around services inside our data centers — shared storage, a shared database, a shared identity service — the contention moves outside the physical host. Or rather, in addition to the contention at a physical host, you can now have contention at these infrastructure services. At the storage service, you may have requests coming from virtual instances running on many physical hosts, all asking the storage for data or sending their writes to that storage service. The bottleneck can now be at a single point in your data center: the storage service, or the database service. That's why, as I mentioned earlier, the ability to see across all the hosts and all the VMs is an important aspect of monitoring.

So I'm going to show a slightly different example here. What we're seeing is a project-oriented view. Earlier I showed a single host, and it's easy to conceptualize many virtual machines running on a single host. Here we have a more logical view: a tenant of an OpenStack cluster with a project containing a few machines. Instances one, two, and three are all running on a single host — maybe that's where they got scheduled — and client one is running on a different host. But they're all accessing the same storage server.

What can happen now is that if client one suddenly has a burst in demand — it starts running some really large workload, reading lots and lots of data from the storage server — then suddenly instances one, two, and three get a much smaller share of the storage I/O bandwidth. If you were only looking at the host that shows you instances one, two, and three, all you would see is that they suddenly have reduced storage I/O. You wouldn't be able to figure out why: did the application's demand go down, or was it unable to get the resources? But when you take a view that spans more than one host — where you can see what's going on across your data center, maybe at the storage server itself — then you can see that when client one spiked, there's a correlation there. That spike is probably what's driving the problem of instances one, two, and three not receiving the storage bandwidth I wanted them to get.

So with proper monitoring, you can pinpoint the bottleneck: when there's a performance problem, where is it happening, and why? You need these kinds of monitoring tools to answer the questions that arise in your applications and your infrastructure. But how do you solve the problem? If a user comes to you and says, "I'm not getting enough storage I/O; my application needs a minimum amount in order to perform at an acceptable level," what you would like to do is apply a policy that assigns that application enough resources to get its job done at an acceptable level of performance.

That is the second thing AppFormix provides: programmable control. We have an API-driven system in which you can specify the amount of resources to assign to an application, a virtual machine, or a container. The reason we break it down that way is that sometimes you have more than one application running inside a virtual machine, and you may want to prioritize, even within a single virtual machine, how its resources get split up among its applications. Maybe the virtual machine is running both a web server and a database, and you think the database has higher priority for getting its data out to a storage server or an object store. So you prioritize the bandwidth for the database, and you give a little less to the client-facing web server that isn't a super-high-priority application.

The API we provide lets you do two things. You can configure it in real time: if you find a problem, you can go solve it. With a simple REST API call, you can dynamically change the amount of resources given to a particular virtual machine instance, application, what have you, running inside your infrastructure. And you can integrate these APIs into your orchestration, so that when you're creating a new virtual machine or spawning a new container, you can assign at that time the amount of resources that application requires. You can configure that policy, and we integrate with OpenStack.
For instance, right now you have flavors of VMs — small, medium, large. What we allow you to do is say: for small VMs, I want this much network I/O; for large VMs, I want such-and-such network I/O and such-and-such CPU. That control lets you preemptively assign the resources you think an application is going to need, while you still have the ability at runtime to say, "I'm seeing a problem over here; I want to reconfigure my policy, I want to change it, I want to fix the problem as it's happening."

Application I/O is such an important part of applications these days, and it's predominantly over the network now. We see local disk being used much less frequently — maybe as scratch space, but data that matters typically goes out onto the network. And applications are interacting with multiple services over the network: you talk to your MongoDB, you talk to your Redis server and your RabbitMQ message bus. That's how applications and services are being composed these days. So today, in terms of resource control, I want to focus on how you can configure network resources to prioritize applications.

To give you an example of how simple our APIs look: at runtime, with a curl command, I can configure the network resource limit for an instance. Take the example where four virtual machines were all accessing a shared storage server, and the client one in purple started consuming a lot of storage I/O. We want to cut that down — let's say to 200 megabits per second. There's a simple curl command you can issue to our controller that says: for that virtual interface, this is how much bandwidth it can use. Similarly, if you want to do it during orchestration, at the time you're bringing up a container, we've extended the Docker Compose YAML file so that when you specify the container's attributes, you can set a network limit for that container. Both are sketched below.
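Here is a sketch of what those two paths might look like. The URL, port, and key names are hypothetical, since the slide's exact API isn't reproduced in this transcript; first the runtime curl call, then the Docker Compose extension:

```sh
# Hypothetical controller endpoint: cap client one's virtual interface
# at 200 Mbit/s at runtime.
curl -X PUT http://appformix-controller:9000/api/v1/interfaces/client1-eth0/limit \
     -H 'Content-Type: application/json' \
     -d '{"network_limit_mbps": 200}'
```

```yaml
# Hypothetical Docker Compose extension: the same limit applied at
# container-creation time (the key name is illustrative).
client1:
  image: storage-client        # hypothetical image name
  network_limit_mbps: 200      # enforced by the agent on the host
```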
This video continues the example from earlier. The curl command was issued, and the bandwidth that the purple client one can achieve to the storage server is now limited to 200 megabits per second. What's interesting to note is that the total throughput in the system is not changing. What we're doing is reallocating the resources our infrastructure does provide, to meet the requirements of our applications and to prioritize the applications that are important. The aggregate throughput is the same; we've just reassigned it.

A second type of control we provide is what we call a reservation. This carves out a certain amount of resources for a virtual machine, or for a virtual interface on a container, and makes it unavailable to the other instances running on that host. Recall the example from the beginning, where there was varying load and performance was unpredictable because the demand on that host was always changing: sometimes a machine was able to get a lot of bandwidth, and sometimes an instance was not able to get enough. If you have a high-priority application, you can solve that by giving it a reservation. You say, "I want to carve out this much for this application and make sure it receives a minimum amount of network I/O." Again, it's API-driven, and we can do it in real time.

Here the yellow application is running, and we've set a 500-megabit reservation for the yellow instance six. What we'll see is that as the load in blue comes and goes, as long as there's enough demand from the yellow application, it's going to get the network I/O it needs to meet its performance requirements. The blue is varying; the purple and pink go up and down because they're subject to what's available — they haven't been given any limit or any guarantee. But yellow has been given a reservation, so it has guaranteed access to those resources on that host. That's how we can prioritize that application and make sure it has predictable performance.
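A reservation call might look like the following sketch — again, the endpoint and field names are hypothetical, standing in for the API shown on the slide:

```sh
# Hypothetical endpoint and payload: reserve 500 Mbit/s for instance six,
# carving it out so other instances on the host cannot consume it.
curl -X PUT http://appformix-controller:9000/api/v1/instances/instance-6/reservation \
     -H 'Content-Type: application/json' \
     -d '{"network_reservation_mbps": 500}'
```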
So I just want to bring it back here. At AppFormix, our goal is to optimize cloud operations with a software tool that provides real-time monitoring and control. On the monitoring side, I showed a few of the things we can do: we can look at the resources a physical host is using, we can look at the resources a virtual machine is using, and we can inspect what an application is doing at the protocol level. We have a driver for HTTP today, and one for iSCSI that we're working on. Those cover a large majority of the use cases we're seeing in customer environments — storage being a big thing out on the network, and web services the other — something like 95% of what people want to monitor at the application level. With that monitoring, we can help application owners and infrastructure operators identify where the bottlenecks are in their infrastructure, so they can find out whether it's the application that's performing poorly or the network. Because you have a holistic view across your whole infrastructure, you can correlate and pinpoint.

We feed that data up: the UI I was showing you earlier is our user interface, and it's completely REST-based in how it controls and interacts with our software. The software publishes events up to the user interface, and we make that data available to users as well. So if there are custom things you'd like to do with the data, or certain monitoring you want to do, that data can be consumed in-house by custom applications you write — to solve the kinds of problems you might be facing, or to gain insight into what's going on in your infrastructure.

We do a nice thing with that data, because there is so much of it as you scale up — data centers are growing and growing. Our architecture is distributed, such that the data fed up is only the data you're really interested in. If you were to push it all up and process it after it arrives at some central location, you would have so much data that you couldn't keep up with it; you'd have to do a kind of offline processing, and we want to be real time. We want to be able to show you what's going on in your infrastructure right now — what problems you're facing right now. So we flip it on its head: instead of processing the data after it arrives in some central place, we process it on the fly, in a distributed fashion. If you're interested in certain kinds of data or events you want to subscribe to — say you want to know any time the time to first byte is spiking — you can push a query down into our system that says: when you see the time to first byte exceed one standard deviation above the norm, generate an event. Then I can be alerted and make a decision, or I could have a reactive system that automatically reconfigures the policy to keep the time to first byte within the requirements of the application.
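A push-down query of that kind might be expressed roughly as follows. The subscription API shape is hypothetical; what's grounded in the talk is the idea that the condition is evaluated at the agent and only the resulting event travels upward:

```sh
# Hypothetical subscription API sketching the push-down query: the
# condition is evaluated distributedly at each compute node's agent, and
# an event is emitted only when TTFB exceeds one standard deviation
# above its running mean.
curl -X POST http://appformix-controller:9000/api/v1/alarms \
     -H 'Content-Type: application/json' \
     -d '{
           "metric": "http.time_to_first_byte",
           "condition": "above_mean",
           "stddev_threshold": 1,
           "notify_url": "http://reactor.example.com/events"
         }'
```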
The second thing we do is control. As I mentioned, it's all API-driven and real time: you can change the policy. We provide it for network I/O, CPU, and memory. As I showed for the network, we give you controls to set limits and to set reservations, and the rules can be applied across different layers of the infrastructure. You can say, "I want this virtual machine's or this container's interface to have a certain absolute limit or reservation." Or you can do it on an application-by-application basis: this virtual machine is running a few applications, but I want the web traffic to have this policy; or, when this virtual machine connects to this storage server, I want a minimum reservation so that it has the right access to the storage I/O.

With the monitoring and the control in conjunction, you're able to ensure that you achieve the performance you want for your applications. You can monitor and detect whether your applications are getting the performance they need, find out when they're not, and configure policy to ensure that they do.

Right now our product is in production with a few early adopters, and we're expanding the program to new early adopters. So if you have use cases you're interested in talking about, or performance problems you're seeing, we would love to hear about them. Come visit us at our booth — we're at booth S4 — and we can talk about your scenarios and use cases. If you're interested, maybe we can sign you up for the early adopter program: we deploy the software in your infrastructure, and you get some insight and visibility into what's happening. And we're a growing company, so if this is an area you'd like to work in, definitely give us a ring. I'll open it up to questions now — could you speak at the mic, please?

Q: Where does your controller run? Do you have to run it on one of the compute nodes, or is it independent of the compute nodes? Are there any restrictions on that?

A: The controller runs outside of the compute nodes. You can just run it on a virtual machine — consider it a management machine that you put in your infrastructure. And then we have the agent, which collects data, running on each compute node.

Q: You're all about monitoring in real time, but have you seen in your use cases so far any value in collecting the aggregate data and providing dashboards and reports long term? Or is that not your forte at this point?

A: We do that. We are focused on real time, but since we publish this data, the UI is only one consumer of it. We also currently have a listener that will store the data in MongoDB, and we can expand that to other kinds of storage if someone wants to use Cassandra or something. The amount of data you want to store is purely up to you. If you have a really large infrastructure and want to store days' worth of data, then you size a MongoDB deployment that's appropriate for that data rate and volume, and we push the data out to it.

Q: A quick one: what is the granularity of your timing window for real-time monitoring and control — the sampling window size?

A: It varies depending on the resource. For some resources we have a finer granularity than others, and that's to make sure we don't disrupt what's going on on the compute node. We don't want to impose high overhead, so we have to balance precision against the overhead we impose on the system. For things like network, we're below one-second granularity, because we're able to filter out what's important and push that up. We can, for instance, identify a burst happening at sub-second granularity, even if we're not sending samples up to the user interface at that rate. The ability to do that data analysis distributed at the compute node itself is what allows us to be both efficient and precise. CPU is one where, right now, we use a somewhat coarser granularity, around one second, because it's harder to get that information without putting too much load on the system.

Q: Are you using the OpenStack APIs, like Ceilometer?

A: We don't use Ceilometer. We have our own agent, which collects all the metrics we show and pushes them out to the dashboard. But we have an OpenStack plugin to discover all your infrastructure. Once you have OpenStack deployed and then you deploy AppFormix, we go and talk to the Nova client and the Neutron client and figure out your infrastructure: what VMs are running, what compute nodes you have. The discovery plugin fetches all that information and populates our controller and our database, so you can see the charts. And after the agent does the discovery, it continues to listen on the OpenStack message bus, so as new instances come up, we react to that: we show them and start monitoring them. And, as I mentioned earlier, if you want to assign a certain amount of network resources to each flavor, that can be configured automatically at the time the instance is created.
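For reference, the kind of inventory that discovery step gathers can be approximated against the public Nova REST API. These are real endpoints; the token, host, and port are deployment-specific:

```sh
# List server instances and hypervisors via the Nova API. $TOKEN is a
# Keystone auth token; the controller address varies per deployment.
curl -s -H "X-Auth-Token: $TOKEN" http://controller:8774/v2.1/servers
curl -s -H "X-Auth-Token: $TOKEN" http://controller:8774/v2.1/os-hypervisors
```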
Q: How do you enforce the controls — how do you control the network traffic, and are you using shares on the scheduler?

A: For the network bandwidth, we take an end-to-end approach where we make sure the sender is not sending more than the allowed capacity. We don't have to control the application — that's a little bit of our secret sauce — and we don't modify the guest or the application at all. And we do not drop packets, either. For CPU, we rely on the scheduler in the hypervisor itself. The hypervisor is always scheduling virtual machines, and by assigning shares you can control how much a virtual machine gets scheduled. So we're not stopping it ourselves; we're letting the interface the hypervisor already provides take care of that for us.

Any other questions? All right, thank you very much. I'd be happy to talk with anyone afterwards.