Hi, everybody. So today we're going to talk about network visibility. For those of you who attended our previous talk, this is the second in a series of three talks we are presenting here on OpenStack operational visibility, and this one focuses on the network. In the previous talk we presented the overall theme of where we want to head in terms of OpenStack operational monitoring and visibility, and that is the theme we're working within.

To begin with: where are we right now with network visibility? What do we have today? For a user or an admin to figure out what's happening in the cluster, what's happening in the network, what kind of network load the various hosts and instances are under, you may have to go through various logs and various APIs and do a bunch of calculations yourself. It's very complicated. And if something goes down or goes bad, troubleshooting is a hugely complicated task: you have to dig through the logs to figure out what went wrong. In terms of topology, all you can currently see in Horizon is a bare-minimum topology. So figuring out how the network is performing, what's going on, where the hotspots are, is a really hard task. It's a big headache.

So where do we want to get to from here? As we said in our first talk, we want to get to a stage where we have a really good visualization tool, plus various data analysis tools, that help users easily figure out the status of the network, and likewise of compute, storage, and everything else. We want interfaces and tools that easily show you the current state of your infrastructure.
As part of this talk, we are going to present some of these frameworks and the overall architecture of where we want to head. As a first step, we want to show you the visualization and present the whole picture of the network. I'll hand it over to the team now to go into the further details of the talk.

Excuse me, can you hear me? Okay, cool. So, regarding the network part: what exactly you want to see, what you want to know about what's going on in your cluster's network, really depends on who you are, on your role in this game. On one side, if you're an operator, a service provider, you have total control of your cluster. The first thing you'll want to know is the physical status of your devices: switches, clusters, racks. Then, based on that, you'll want to know where the hotspots in the cluster are, which makes it easier to decide what you should take care of. And at the upper level, you'll want to know how people are really using your cluster: are they using it heavily, are they running heavy jobs, are they generating huge traffic? On the other side, if you're a user of the cluster, you get a virtual cluster, a bunch of resources, and the first thing you want to know is the resource status in real time. Then, based on that, you'll want to know about performance. For example, you might get a bunch of virtual networks in Neutron, but you don't yet know how much bandwidth you can get from each virtual network, or for a particular virtual machine.
And on top of that, if you have a job running and it slows down or just stops, you'll want to know why; you'll want to keep track of what's going on in your virtual cluster. So we believe there are three steps we're going through: the first is data gathering, the second is data storage and statistics, and the last is data analysis and visualization.

Here we present our architecture for these three steps. On the left is data gathering, which we'll expand on a little in the next slide. We do the data gathering through Ceilometer as well as through log-gathering agents, and we store the data in both the Ceilometer collector and the log collector. We also push the Ceilometer data into the log collector so that we have one centralized database of what's going on. Then we feed that data into the analysis, both to get a better understanding and to use it for later visualization.

Okay, so regarding the first step, data gathering, we made some tweaks to Ceilometer to get better information. In a non-customized OpenStack, the Ceilometer agents on the network side only give you the outgoing and incoming packets or bytes of each machine. That's far from enough if you really want to know what's going on in the cluster. In the graph you can see the normal path of the traffic: going from one machine to the switches and then to the destination machine. What we want is the end-to-end traffic: the VM-to-VM traffic flow for each pair of VMs.
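The talk describes pushing the Ceilometer measurements into the log collector so that metrics and logs live in one centralized store. A minimal sketch of that normalization step might look like the following; the field names are illustrative, not Ceilometer's exact sample schema.

```python
import json
import time

def sample_to_log_record(sample):
    """Flatten a Ceilometer-style sample into a JSON record so it can
    sit in the same store as ordinary log lines. Field names here are
    assumptions for illustration, not the real Ceilometer schema."""
    return json.dumps({
        "timestamp": sample.get("timestamp", time.time()),
        "source": "ceilometer",
        "resource_id": sample["resource_id"],   # e.g. the instance UUID
        "meter": sample["counter_name"],        # e.g. "network.outgoing.bytes"
        "value": sample["counter_volume"],
        "unit": sample["counter_unit"],
    })

record = sample_to_log_record({
    "resource_id": "vm-42",
    "counter_name": "network.outgoing.bytes",
    "counter_volume": 123456,
    "counter_unit": "B",
    "timestamp": 1400000000,
})
```

With metrics serialized this way, the same query interface that searches log lines can also search meter values by resource and timestamp.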
To do that, we wrote our own Ceilometer agent that collects this information, and then we correlate it with the Neutron database so that we ultimately get virtual-machine-to-virtual-machine traffic. As for log collection, besides the OpenStack logs, for example from Neutron, Cinder, and Nova, we also collect the servers' syslogs as well as the virtual machine logs. For example, if you were running a Hadoop job on top of OpenStack, in our environment we collect the Hadoop logs too, which helps you get a fuller picture.

So based on the data we gather, we present to you AVOS: analysis and visualization on OpenStack. I'll give you a demo of this video. Sorry, that doesn't look right, I'm not sure why it shows like that; the color is really weird. I think it's good now. Okay, let me, one second. Okay, cool.

This is the AVOS main visualization. Here, each circle is one of the virtual clusters you actually have, and the heat map is currently based on the outgoing traffic of each virtual machine, but you can configure it to represent CPU or any other heat metric you want. If you click, you get some detailed information, and here we present more detail on virtual-machine-to-virtual-machine traffic. If you go to the network part, sorry, one second. Okay, in this network view you can see the virtual machines you have in one of the virtual networks arranged in a circle, and each line in the graph is a flow from or to one of the virtual machines.
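The step just described, correlating raw flow records with the Neutron database to attribute each flow to a VM pair, can be sketched roughly as follows. This is an assumption-laden illustration, not the actual agent: the port records mimic the shape of Neutron port entries (a `device_id` naming the VM, plus fixed IPs), and the flow tuples stand in for whatever per-flow counters the agent captures.

```python
def build_ip_to_vm(ports):
    """Build an IP -> VM lookup from Neutron-port-style records.
    Each record is assumed to carry device_id (the VM) and fixed_ips."""
    mapping = {}
    for port in ports:
        for ip in port["fixed_ips"]:
            mapping[ip] = port["device_id"]
    return mapping

def flows_to_vm_pairs(flows, ip_to_vm):
    """Aggregate raw (src_ip, dst_ip, bytes) flow records into
    VM-to-VM byte counts, dropping flows we cannot attribute."""
    pairs = {}
    for src_ip, dst_ip, nbytes in flows:
        src, dst = ip_to_vm.get(src_ip), ip_to_vm.get(dst_ip)
        if src is None or dst is None:
            continue  # e.g. traffic to an external address
        pairs[(src, dst)] = pairs.get((src, dst), 0) + nbytes
    return pairs

ip_to_vm = build_ip_to_vm([
    {"device_id": "vm-a", "fixed_ips": ["10.0.0.2"]},
    {"device_id": "vm-b", "fixed_ips": ["10.0.0.3"]},
])
pairs = flows_to_vm_pairs(
    [("10.0.0.2", "10.0.0.3", 1000),
     ("10.0.0.2", "10.0.0.3", 500),
     ("10.0.0.2", "192.0.2.9", 700)],  # external flow, not attributed
    ip_to_vm,
)
# pairs == {("vm-a", "vm-b"): 1500}
```

The resulting VM-pair byte counts are exactly what the circle visualization below draws as lines between machines.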
So you can see there are four lines here representing traffic among five nodes, which means these five nodes form one of your clusters. Just by looking at the pattern you can see that one virtual machine on the right stands out, because it is communicating with all the others. You might not even know this is a Hadoop cluster, but you can tell that the machine on the right is probably the boss of the whole cluster. And that's exactly right: it's the master of one of the Hadoop jobs. If you hover over a virtual machine, you get all the flows coming out of it, so you can take a better look at the pairwise communication.

Now let's look at a heavier situation. For example, here we are running many jobs, many Hadoop clusters, in one virtual network. It looks a little messier, but if you look into it there are still patterns inside, and again, if you hover over one of the virtual machines you get its pattern. One interesting thing here is that we place and group these virtual machines by their host: the outside of the ring represents the different hosts, so you can get a general idea of what is going on host to host, not just virtual machine to virtual machine. It's a built-in aggregation. In this way you can see, for example, which host is overloaded on the network side. And note that these are live data, changing over time, so it's a very good way to monitor what is going on. The color right now represents the bytes per second of each flow, so you can see whether there's a hotspot in your cluster. Okay, that's it for the demo.
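The host-level view just described is a roll-up of the VM-to-VM flows: group each VM by its hypervisor and sum the bytes per host pair. A small sketch, with hypothetical names, of that aggregation:

```python
def host_to_host(vm_flows, vm_host):
    """Roll VM-to-VM byte counts up to host-to-host totals,
    mirroring how the ring view groups VMs by their host.
    vm_flows: {(src_vm, dst_vm): bytes}; vm_host: {vm: host}."""
    totals = {}
    for (src_vm, dst_vm), nbytes in vm_flows.items():
        key = (vm_host[src_vm], vm_host[dst_vm])
        totals[key] = totals.get(key, 0) + nbytes
    return totals

vm_host = {"vm-a": "host-1", "vm-b": "host-1", "vm-c": "host-2"}
totals = host_to_host(
    {("vm-a", "vm-c"): 800, ("vm-b", "vm-c"): 200, ("vm-a", "vm-b"): 50},
    vm_host,
)
# totals == {("host-1", "host-2"): 1000, ("host-1", "host-1"): 50}
```

A large `("host-1", "host-2")` total is the kind of signal the demo points at when it says one host looks overloaded on the network side.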
We were expecting something like this from you, but if not, that's fine. So, okay. What we believe is that the data you get directly from either Ceilometer or the logs is like fresh juice: you squeeze it straight from the fruit. It's fresh, it's live, and it's tasty, but sometimes you want something even better; maybe you want to turn the juice into wine. How are we going to do that? With analysis. And here is Xinyuan to give you a taste of the wine.

Okay, thank you. So we have seen that real-time network visualization gives us an efficient way of monitoring the status of cloud operations. Now I'm going to use some examples to show how we can revisit and analyze the historical data, both to get more insight into what is actually going on in our cloud and to troubleshoot more efficiently when something goes wrong. We have collected the network traffic between each pair of virtual machines in Hadoop clusters running on top of OpenStack, and this chart shows the distribution of each virtual machine's network flows over time. We can find some interesting patterns in that traffic. For example, you can see horizontal trajectories, which indicate the communication between Hadoop master nodes and slave nodes, and you can also see vertical stripes, which show the communication among slave nodes; these features indicate active Hadoop jobs. If you take an even closer look at these charts, you will probably find more interesting patterns and features.

So how can we make use of this information? For example, suppose you are running Hadoop services on the cloud and your users complain that their Hadoop jobs are going slow. What are you going to do?
Maybe you go into thousands of lines of Hadoop logs, OpenStack logs, or system logs, trying to guess what's happening in the system and trying things out one by one. Or you can just look at the network traffic, and you can immediately tell that at some point the traffic of a specific virtual machine stopped, and a couple of seconds later there's a delay in the Hadoop traffic, so something is wrong there. Now if you go into the logs and search for that specific virtual machine and the specific timestamp at which you observed the problem, you get just a couple of lines of logs that are highly related to the problem you want to solve. I believe this kind of process is much more efficient than the traditional way of troubleshooting.

Here is another example. In this example we make use not only of the network metrics but also of the other kinds of metrics we collect from our system, such as CPU utilization, disk input/output, network information, and everything else. If we analyze all of those metrics in an integrated way, we can find interesting features that tell us when something is going wrong, when some abnormal event happens. For example, you can see the red color on the left graph: that marks some network anomalies, and we can find different features in the abnormal events versus the normal Hadoop workflows. And if we do some more mathematical analysis here, we can also find that some abnormal events in the network show very distinctive patterns in their distribution, and we can use that to classify the status of our cloud.

So, this is pretty much all of the research that we have done, plus some plans for the future. We are planning to make use of this network flow information to help estimate network distance.
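The troubleshooting workflow just described, spot the moment a VM's traffic stops, then pull only the log lines for that VM around that timestamp, can be sketched very simply. This is a toy illustration under assumed data shapes, not the speakers' actual tooling.

```python
def find_stalls(series):
    """series: list of (timestamp, bytes_per_sec) samples for one VM.
    Return timestamps where traffic drops to zero after being active,
    a crude version of 'the traffic of this VM stopped'."""
    stalls = []
    for i in range(1, len(series)):
        _, prev_v = series[i - 1]
        ts, v = series[i]
        if prev_v > 0 and v == 0:
            stalls.append(ts)
    return stalls

def logs_near(logs, vm, ts, slack=30):
    """Pull only the log lines for this VM within `slack` seconds of
    the stall time. logs: list of (timestamp, vm, line)."""
    return [line for (log_ts, log_vm, line) in logs
            if log_vm == vm and abs(log_ts - ts) <= slack]

series = [(0, 10), (10, 12), (20, 0), (30, 0)]
stalls = find_stalls(series)  # the stall begins at t=20

logs = [(18, "vm-x", "datanode heartbeat lost"),
        (500, "vm-x", "unrelated event"),
        (19, "vm-y", "other vm noise")]
relevant = logs_near(logs, "vm-x", stalls[0])
```

Instead of reading thousands of log lines, `relevant` contains only the handful of lines for the right VM in the right time window.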
Here we define network distance as a kind of relationship between two network entities, used especially for optimization purposes. We are also planning to go deeper into data-driven diagnosis, with more metrics collected from OpenStack, from bare metal, from everywhere, and to do more integrated analysis of all those metrics. So, that's it. Thank you very much. And we would appreciate it if you could fill out the survey and tell us some of your requirements, what you would like to see in the future. Thank you very much. Any questions?

Great job, that was really neat. For your analysis piece and the visualization piece, could you go over what tools you used and how much effort went into getting some of those visualizations?

These are standard machine learning tools; we can use the existing algorithms out there in the machine learning literature, and we are using some of them in our architecture roadmap. This was a teaser for a project that we have just started, and as part of the roadmap we want to do more extensive analysis. But currently it's mainly standard machine learning tools, classification algorithms, and some basic plotting tools that we have used so far.

I had pretty much the same question. Is this open source overall? What are your plans?

Right now we are in the phase where we want to get a feel for how the community reacts to this, and we want to gather requirements; we are eventually going to open source it. We may plan it out as an open source project with the help of the community. Right now we want to see who is interested, who would like to collaborate, and figure out the requirements. Absolutely.

I just want to congratulate you, this is awesome work. Thank you. Thank you.

So, the NetFlow that is being generated, what format is that in?
Is it the standard IPFIX format? Can any NetFlow collector consume that data?

So, the question is: the flow information, is that in a standard format or a proprietary one? Yes, we're using a standard format for the flows. So it doesn't really matter what your network configuration is; you can always find the flow information at some point. Okay, thanks.

Any questions? Any other questions? And like I said, if you would answer some of the questions in the survey, that would help us gather the requirements, what you would expect to see in this project, and that would help us define the roadmap. We would actually like to take this into the OpenStack community and try to take it to the next level. Thank you. Thank you.