So let's get started. Thank you, everyone, for coming. I think this is probably one of the last sessions of the main conference at the summit, so thanks a lot. My name is Fawad Khaliq. I am part of the TaaS team and I work for PLUMgrid. I have here with me Anil, who is from Gigamon and is leading the effort on TaaS, Reedip, who is with NEC, and Takashi, who is from Midokura. We also have a couple of team members sitting over here from Fujitsu, Soichi and Kaz, who contribute to TaaS, Tap as a Service.

Today's presentation is about Tap as a Service. The background is that this was introduced a while ago, I think a couple of summits ago. You guys must have gone to the presentation at Vancouver, where Anil was presenting with Vinay and an initial overview was given. Some progress has been made since then, so the idea for this presentation is to talk about the progress that has been made in the OpenStack community around this, because there have been lots of requests from operators who really need this feature in their cloud deployments.

The agenda for today's presentation is as follows. We'll go over an introduction to what Tap as a Service is. Then we'll talk about why we need tap, and about some of the problems that are not addressed by the current feature set or advanced services, where you might otherwise have to deploy tools that are outside of OpenStack.
This is where it comes in. Then we'll cover the progress that has been made so far. We'll also go into a bit of technical detail on what the object model looks like and how the API is supposed to behave, and then Anil will give you a nice demo of the current implementation. At the end we'll discuss the next steps and the things we are looking to add, because there are some features that are not implemented yet but are on the roadmap, so that you are aware of those. Finally, we'll open the floor for Q&A for any questions you have.

So, what is Tap as a Service? It is an advanced services extension in Neutron that works just like firewall or load balancer: you interact with the Neutron APIs, and it provides port mirroring. If you have a bunch of Neutron ports and you want to monitor traffic on them, you can use this Neutron extension to copy packets from those ports onto a destination port that you define. We'll go into the details of how that works. This is for operators or tenants who want to do network monitoring on the traffic they capture. Behind those ports you might be running a VM, or you might have a physical box behind your VTEP gateway running some traffic monitoring software. There are different use cases that Anil will touch upon, and based on those you can of course build your own use cases on your OpenStack deployments using this tap.
This gives you the ability to send all the traffic from those source ports to the destination ports. We are also defining a model with some improvements, which is something we'll discuss in the roadmap for tap.

Here is a bit of a visual on how it looks. If you're a user of Neutron, you interact with the Neutron API, and in this case TaaS is an extension to the Neutron API. The way it works is that you have your virtual machines, or let's say containers, provisioned on these hypervisors, or compute nodes. You want the traffic from, let's say, port one and port two, call them source port one and source port two, to be copied over to another place, a VM where you can run tcpdump or some other service to monitor what exactly is going on. You can define ingress, egress, and so on. You'd then have a Tap as a Service process running on these nodes, which captures this traffic and sends it either on the same host or across hosts, depending on your use case. This is very high level, and we'll jump into the details as well. At this point I would like to hand over to Anil to talk about why we need a tap.

Thank you for that. Before we go into the nitty-gritty details of how this service actually functions, we'll take a step back and try to find out why there is a need for a service like this in the first place. So let's take a quick look at a conceptual traffic monitoring setup or process. What does this involve?
It essentially involves placing tap devices at appropriate locations in your network, and here we are talking about a virtual network infrastructure, and then attaching traffic analyzers to those probes. Once they are attached in this fashion, the analyzers should be able to see the same traffic that the endpoints were originally seeing, and it would look as if the analyzers were actually inline.

Now, there are different types of tap devices. A physical tap device is something that you just attach to a wire. A logical tap device, on the other hand, can be constructed using the port mirroring capability of a modern switch. These modern switches allow the traffic on some ports to be captured, replicated, and delivered to a designated destination port. Given that most modern virtual and physical switches do support port mirroring, this raises the question: why is it not possible to monitor traffic in OpenStack virtual networks today? What is really stopping us?

The answer lies in understanding certain architectural characteristics of cloud platforms. The two we have named here, which are the most interesting for us, are multi-tenancy and location independence. What is multi-tenancy? We all know that in a shared platform like a cloud environment, you want to partition the resources so that different groups of users can use their resources without knowing about the existence of others. This sharing is done in such a well-isolated fashion that most tenants are oblivious to the fact that others exist alongside them on the platform. Multi-tenancy has some very interesting benefits, and one of the most important is delegation of control. If you have worked with Neutron networks, you will have seen that tenants are able to create their own private virtual networks, setting their own IP subnets in there.
They have the freedom to connect these networks using routers and so on, while making sure that they are in no way going to disrupt the traffic of another tenant sitting on the same cloud platform.

On the right-hand side of this table, there are certain characteristics listed for location independence. This is mostly concerned with hiding the identity of the physical components that are actually hosting the virtualized workloads in the cloud. The one immediate benefit that everybody knows and appreciates is the ability to migrate VMs from one host to another. The main reason this is possible is that the VM is not intimately tied to the hardware. There are some other benefits of location independence. One of the lesser-known ones is that it allows for more efficient resource allocation: VM placement on hosts, storage placement on these systems, and so on are all made possible by this concept of location independence.

Given these two architectural characteristics of a cloud platform, we can make certain observations. The first is that tenants are typically unaware of where their VMs reside in a cloud. The second is that VMs belonging to different tenants may be co-located on the same host. And finally, tenant virtual networks often span multiple hosts. Given that this is the nature of the environment we are operating in, it makes sense that a tenant is typically not allowed to access the controls of the underlying switch fabric. Whether these are host-level virtual switches or top-of-rack physical switches, you typically wouldn't allow a tenant to directly access the controls of those switches. Unfortunately, this means that the port mirroring we talked about a few moments ago is not available to a tenant. That is the actual problem space we are trying to solve here.

So what would one desire in order to monitor traffic? You would essentially want a tapping service that allows the tenant and/or the cloud administrator to safely monitor Neutron ports. We want to make sure that tenant isolation boundaries are not broken or compromised in any fashion. Secondly, because these virtual networks span multiple hosts, we need the port mirror sessions to also be able to span multiple hosts. In other words, remote port mirroring is of very high importance to this service.

Tap as a Service is the platform-oriented solution that we as a team have proposed and implemented, and we feel that it satisfies the needs listed here. What it has essentially done is virtualize port mirroring, which used to be a switch-level function, and bring this facility into the hands of a tenant. TaaS will serve as a basic building block on top of which more complex traffic visibility solutions can be engineered, and in the demo we'll go over a couple of them to see how TaaS can be used to solve real-world problems. With this I'll hand over to Reedip, who will be talking about the progress made so far in the project.

Thank you, Anil. Hi, everyone. I am Reedip Banerjee from NEC Technologies India, and I'm here to present the progress of Tap as a Service. You will see in the demo that the first version of Tap as a Service has already been implemented, and it has been implemented on OVS. We are also on GitHub.
So anybody who wants to can clone the code, work on it, experiment with it, and let us know if they face any issues. We have also applied for official inclusion in OpenStack governance, so that Tap as a Service is treated as an official project, and we are in conversation with the Neutron cores and the Neutron PTL so that it can be included in the Neutron Stadium. With the help of Soichi and Kaz, we have also implemented Tap as a Service as a Horizon dashboard. However, this dashboard, which you will see in the demo, currently works on Kilo; work is in progress to move it to Horizon master. Tap as a Service is also available as a CLI. When you check out the Tap as a Service code and restart DevStack, you can easily see how Tap as a Service works with Neutron. It has been implemented here, and you can see the CLIs: the neutron tap service and neutron tap flow commands. As part of governance and as part of implementing the patches, we have also included Tempest jobs and gate jobs. Now, for the object model, I would like to invite Yamamoto Takashi-san from Midokura.
Thank you. So we have two kinds of resources. One is tap service and the other is tap flow. A tap service specifies the destination of mirrored traffic, and a tap flow specifies the source port of port mirroring. Let's see how it works. Consider that there are two Neutron ports and their associated instances. An instance here is usually a Nova VM or some kind of container. We want to mirror traffic on the right VM to the left instance. First, create a tap service to specify the destination, then create a tap flow to specify the source port and associate it with the tap service. Traffic on the right instance is then mirrored like this. We have a field in the tap flow resource to specify the direction of traffic to mirror. By default it mirrors traffic in both directions, like this. Now a slightly more complicated case: we can have multiple tap flows associated with a tap service, so traffic on multiple Neutron ports can be monitored by a single instance. Now back to you, Reedip.

Thank you. As I explained on the progress slide, we have successfully integrated with OVS. What you see here is something we have already implemented and which is working with OVS. This is an agent-based implementation. We have a plugin service which communicates with the TaaS agent. The TaaS agent runs on the compute node, and the communication is via RPC. The TaaS agent then communicates with the driver using a driver mechanism, and we have created a driver for OVS which goes on to the switching element and communicates with it. Another option, which we are proposing right now and which is still in the works, is a controller-based implementation. In this case the TaaS plugin service communicates directly with your SDN controller, and the process can go forward without an in-between agent. But as I said earlier, this is still a work in progress and has not been implemented yet. So for the demo I would like to call Anil Rao.

Thanks, Reedip. So we've had a little bit of an introduction to the object model, as well as two different ways of implementing the back end for this service. What we're going to do next is a demo. It's a recorded demo, but it's a real system that we put together a few weeks ago. In this demo, we'll show how a tap service can be created and how tap flows can be attached to it. We will be looking at two particular use cases. One of them is web traffic analysis, and the second is how to build a centralized intrusion detection system on top of Tap as a Service. The demo will use the Horizon dashboard. We have TaaS integration with it right now, so we're going to go completely through that GUI workflow, and then we will show the use cases.

Let me talk a little bit about the environment before we start the demo. It's a multi-node DevStack cloud with a few separate nodes. We have one controller node, one dedicated network node, and two compute nodes. On the compute nodes we are hosting a bunch of VMs, and we also have one special VM, which will be our monitor VM. This VM has been populated with certain traffic analysis tools. In addition to these VMs, we have three systems sitting outside this cloud. These are three Ubuntu Linux desktop systems that are used to mimic the behavior of three end users interacting with this small cloud. With this, we will flip over to the demo setup.

As you can see here, we have a multi-node DevStack environment running with the controller, the network node, and two compute nodes, and on the left-hand side we have three desktops. What we'll do in the beginning is, from desktop one, connect to the Horizon dashboard, and we do that by accessing the URL of the controller. We are logging in as the demo user, and once we are in the system, we'll notice that we have our four VMs. Three of them are running web servers, VMs 1, 2, and 3, and then there is the monitoring VM, which is the VM in which we'll be conducting the traffic analysis. These three VMs are sitting in a virtual tenant network with the subnet 10.0.0.0/24, and we are using external IPs so that the desktops can access these VMs in the cloud. We'll take a look at the topology view of this setup. We can see clearly that the tenant has one demo network on which these VMs reside, and this demo network is connected via a virtual router to the public network. Our desktops are sitting on the other side of the public network.

We'll begin by creating a tap service, as Takashi just described, by clicking on this tab in the right-hand corner. You need to specify a name for the tap service; the description is optional. The next thing is to select a port, which is the destination port of this service. In this case, we'll pick the port of our monitoring VM, which is the VM where we will be depositing the mirrored traffic frames. Once the service has been successfully created, you will notice that the monitoring VM is highlighted in light blue. Our next task is to create tap flows so that we can start monitoring source ports. We can do it right from this picture by clicking on this Create Tap Flow tab. Once again, we give a name for the tap flow; the description is optional. With respect to the monitoring direction, we have three choices.
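The two resources used in this workflow, a tap service that names a destination port and tap flows that each name a source port and a direction, can be sketched as a small data model. This is only an illustrative sketch, not the actual TaaS code: the class names, field names, port identifiers, and the direction labels are assumptions made for the example. The only behavior taken from the talk is that a flow mirrors both directions by default and that several flows can attach to one service.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Direction(Enum):
    # Hypothetical labels for the three monitoring directions.
    INGRESS = "ingress"
    EGRESS = "egress"
    BOTH = "both"


@dataclass
class TapFlow:
    name: str
    source_port_id: str                    # Neutron port being monitored
    direction: Direction = Direction.BOTH  # both directions by default
    description: str = ""                  # optional, as in the dashboard


@dataclass
class TapService:
    name: str
    destination_port_id: str               # Neutron port receiving the copies
    description: str = ""
    flows: List[TapFlow] = field(default_factory=list)

    def add_flow(self, flow: TapFlow) -> None:
        """Associate one more source port with this service."""
        self.flows.append(flow)

    def mirrors(self, port_id: str, direction: Direction) -> bool:
        """Return True if traffic seen on port_id in the given direction
        (INGRESS or EGRESS) is covered by one of the attached flows."""
        return any(
            f.source_port_id == port_id
            and f.direction in (direction, Direction.BOTH)
            for f in self.flows
        )


# Made-up identifiers standing in for the demo's monitoring VM, VM 2, and VM 3.
service = TapService("web-monitor", destination_port_id="monitor-vm-port")
service.add_flow(TapFlow("vm2-flow", source_port_id="vm2-port"))
service.add_flow(TapFlow("vm3-flow", "vm3-port", Direction.INGRESS))
```

With this model, a port that has no flow attached, such as VM 1's port in the demo, is simply never reported as mirrored.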
You could do it in ingress, egress, or bidirectional mode. Then we pick the port; if that VM had more than one vNIC, you would have a list of ports to pick from. A VM that is attached by a tap flow to a service gets colored a little differently to show that it is now being monitored.

We will pick another VM for our exercise. Let's pick VM 3, but we'll do it in a slightly different fashion, not using the topology view. We'll go into the Tap Services tab and create the flow from there. We have a very similar workflow here, with one slight difference: because we are doing it from this tabular view, we need to identify the port that we want, so we're picking the port for VM 3. The successful creation shows that this tap service now has two flows. Going back to the topology view for a moment, you will notice that both VMs 2 and 3 are shown in a different color to indicate that they are now being monitored. This concludes the configuration section.

Before we jump to actually sending some traffic through this tap service, we'll take a quick refresher on the different IP addresses we are using. VMs 1, 2, and 3 are on the 10.0.0.0/24 subnet, and we have three external IPs to reach them. So let's now go in via desktop 2. The top window here is showing a shell in the monitoring VM, which is the VM serving as the destination side of the tap service. Now, from another window on this desktop, we will try to send some traffic towards VM 1. If you remember, VM 1 is not part of this tap service, because we are not monitoring it at this point in time. So when we run tcpdump there, the monitoring VM doesn't see this ICMP traffic towards that VM. However, if you send some traffic to VM 2, you will notice that the monitoring VM is able to see that traffic flow. The same is the case with VM 3. What this demonstrates is that the basic tap service is operational and our tap flows are correctly capturing traffic and sending it to the destination.

So let's say that we have this service in place. The next question we want to answer is: what can somebody do with something like this? We'll pick one use case, which is the analysis of web traffic. Our three VMs, VMs 1, 2, and 3, are hosting web servers. What we're going to do is log in from desktop 2 using a web browser and start navigating one of the websites. Remember that VM 2 is being monitored, so whatever traffic is generated by this web access is captured and delivered to the monitoring VM. In the monitoring VM, we will run a small web traffic analyzer. We are using an open source one, which is httpry, and we have configured it using a small shell script. The website in particular is a binary search tree tutorial. This is just an example.
It could be a much more complex website. So we will run the traffic analyzer on the monitoring VM and then wait and see what happens as this user navigates the website. As we start clicking on the links in that website, if you watch the box on the top left, you will see that the activity is being examined and broken out so that one can figure out what somebody is doing on that website. As the user traverses this binary search tree and goes down the different links, you have a full breakdown: the time of access; the source, where 172.16.2.1 is desktop 2; the destination VM; the HTTP method; the URL; and the response code that the server is sending back to the client. By the way, this is a very small set of data that we are showing here just for the sake of this demo. This information would typically be logged and analyzed later in a more substantial and extensive fashion.

What we'll do now is go to desktop 3 and try to access that same website. As a user from desktop 3 traverses that website, since we are also monitoring this one, our monitoring VM should be able to recognize the activity of this new user. So let's go back to the monitoring VM, and you'll find that desktop 3, which is 172.16.3.1, has also been completely tracked. This is a good way for site owners to find out where the users of a particular web service are coming from, at what time of day they are accessing the site, or which parts of the site they are interested in or spending time in.

Having gone through this exercise, we'll take a second application, which is more security oriented. We will show how a centralized intrusion detection system can be built on top of this service. In a web environment, or in any kind of cloud environment where, let's say, we are running web servers, it's very typical for users to disable other forms of access to the system.
So in this example, playing it safe, we have disabled both the telnet and SSH daemons on our three VMs, VMs 1, 2, and 3. But it might still be interesting for a security analyst to find out whether somebody is attempting to access these servers and, if so, where these access attempts are coming from. We're using Snort, which is another open source IDS, and it has been installed in the monitoring VM. We have some rules here, as you can see, to capture TCP traffic directed towards ports 22 and 23 on these three VMs. These correspond to SSH and telnet traffic.

If we attempt to access VM 1, since we are not monitoring this VM, what we will see once we turn on the Snort IDS is that it won't detect this access attempt. So let's give it a run. We'll turn on Snort here and it goes through its initialization phase. Once it has settled down, we'll try an access attempt to VM 1. Obviously our SSH daemon has been disabled, so that access will be refused. But when we try it for VMs 2 and 3, you will find that our IDS is able to detect those access attempts. Now if you try this again with telnet, we'll find that telnet attempts to VMs 2 and 3 are also detected. If you look at the output of this Snort alert, we find the time of access. We have information like what type of access it was, SSH or telnet. We know where that access came from: what was the source machine that generated that access attempt?
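The kind of rule described here, alerting on TCP traffic directed at ports 22 and 23 on the monitored VMs, can be sketched in a few lines. This is not Snort's rule language or its engine, only a simplified stand-in for the matching logic. The 10.0.0.0/24 subnet and the desktop address 172.16.2.1 come from the demo, while the class names, alert messages, and the VM address 10.0.0.5 are assumptions made for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    protocol: str
    dst_port: int


@dataclass(frozen=True)
class Rule:
    message: str
    protocol: str
    dst_net: str    # destination network in CIDR notation
    dst_port: int

    def matches(self, pkt: Packet) -> bool:
        # A rule fires when protocol, destination port, and destination
        # network all match the mirrored packet.
        return (
            pkt.protocol == self.protocol
            and pkt.dst_port == self.dst_port
            and ip_address(pkt.dst_ip) in ip_network(self.dst_net)
        )


# Two illustrative rules: SSH (22) and telnet (23) towards the tenant subnet.
RULES = [
    Rule("SSH access attempt", "tcp", "10.0.0.0/24", 22),
    Rule("Telnet access attempt", "tcp", "10.0.0.0/24", 23),
]


def alerts_for(pkt: Packet, rules: List[Rule] = RULES) -> List[str]:
    """Return one alert line per rule that matches the packet."""
    return [
        f"{r.message}: {pkt.src_ip} -> {pkt.dst_ip}:{pkt.dst_port}"
        for r in rules
        if r.matches(pkt)
    ]


ssh_probe = Packet("172.16.2.1", "10.0.0.5", "tcp", 22)
web_request = Packet("172.16.2.1", "10.0.0.5", "tcp", 80)
print(alerts_for(ssh_probe))
print(alerts_for(web_request))
```

Note that the rules themselves cover all three VMs. In the demo, attempts against VM 1 go undetected only because no tap flow mirrors that VM's traffic to the monitoring VM, so the IDS never sees those packets.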
And we also have it for the telnet case. We have shown a very simple use of Snort here. In addition to just logging and putting up alerts on the screen, you could send out email reports. You could send notifications to applications that can take corrective action. For example, if you find that a particular VM is being hammered a lot, you might want to take some corrective action in that case. What we have shown here is that by using Tap as a Service you can carry out this kind of activity from a remote, centralized location. In none of the cases have we put any monitoring or analysis tools in those VMs themselves. All of our tools, like Snort and httpry, were located in a remote VM, which was our monitoring VM, so the original VMs that we are observing are untouched. This is the beauty of this system: from a centralized location, a security analyst or a data analyst can carry out lots of interesting analysis of network activity. With this we will return to the presentation, and Fawad will talk a little bit about our next steps. Thanks.

That was a nice demo. Now, we are at the OpenStack Summit, and I think we have all been going to some interesting sessions where we talk about scale and performance and all of those things. I know some of you must have been thinking: well, port mirroring is happening, but what happens with scale?
So many things are going on: you have several ports, and you are sending traffic from all of these ports to all of your destinations, and there are several of those. So how does it fit with where OpenStack is going today, given that we are at the point where we are discussing things like performance, scale, and high availability? To take such aspects into account, we are considering adding a notion of policy-based tap, which is in discussion and which we are considering upstreaming as part of the Tap as a Service API. If anybody would like to see a demonstration, I have it up and running and will be happy to show you how it works. The idea is that you can define your own policies: say, whatever HTTP traffic matches this rule, send it to that particular destination. That's the idea behind it.

Another sort of use case in the tap world is this. In the demo we saw that you are able to define destinations which are your VMs, or virtual Neutron ports. However, Neutron also supports L2 Gateway, which allows you to connect to physical switches and access boxes behind those. You should be able to define those as destinations, because in the end they are Neutron ports, and that should allow Tap as a Service to send your packets over the physical switch to those boxes.

Another thing we are planning to add is quotas, because some operators might have use cases where they only want port mirroring, or things like policy-based tap, and that's where you can define your limits with quotas to make sure that there are no scale issues in that area. Then there might be some overlap between QoS and TaaS. It might just be a matter of testing rather than any code changes, but it is something we plan to make sure is also covered.
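The policy-based tap idea mentioned above, where only traffic matching a policy is copied to a given destination, might look roughly like the sketch below. The talk does not describe the actual API, so everything here is an assumption made for illustration: the `TapPolicy` fields, the matching on protocol and destination port, and the port identifiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TapPolicy:
    name: str
    protocol: str
    dst_port: int
    destination_port_id: str   # where matching traffic should be mirrored


def destinations_for(protocol: str, dst_port: int,
                     policies: List[TapPolicy]) -> List[str]:
    """Return the destination ports that should receive a copy of a packet.

    A packet that matches no policy is not mirrored at all, which is how a
    policy keeps the volume of tapped traffic down.
    """
    return [
        p.destination_port_id
        for p in policies
        if p.protocol == protocol and p.dst_port == dst_port
    ]


# Hypothetical policies: HTTP goes to a web analyzer, SSH goes to an IDS.
POLICIES = [
    TapPolicy("http-to-analyzer", "tcp", 80, "web-analyzer-port"),
    TapPolicy("ssh-to-ids", "tcp", 22, "ids-port"),
]

print(destinations_for("tcp", 80, POLICIES))
print(destinations_for("udp", 53, POLICIES))
```

Compared with the demo, where every packet on a tapped port was sent to the monitoring VM, a filter of this kind trades completeness for a smaller mirrored volume.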
Tempest testing is something that will be enhanced; there is some existing coverage. Rally will be added as well, as these are becoming the standard for testing in OpenStack, and the CES support. We are also trying to see where Tap as a Service falls in the governance of OpenStack Neutron. Based on that, it might be a separate repository, or it might be known as an advanced service, or something else. This is still under review.

So, just to summarize the session for everyone: at this point we have covered how and where Tap as a Service is today, what we are planning to add in the future, and some of the work that has already been done between myself and all the team members. Takashi has been doing some work as well, as we plan to upstream. One thing I would like to correct here is that the controller-based model that we have really does work. It just has to be upstreamed, and the demo that I have of policy-based tap actually uses it. This is something we'll upstream as part of the API.

I think this is where I'll open the floor for questions. One more thing I forgot to add: this is the information if anybody would like to contribute to the Tap as a Service project. We have added the links here. We have the GitHub repository there, and bugs and blueprints are filed on Launchpad. We have a weekly IRC meeting on Tuesdays, Pacific time; I don't know what the UTC conversion is. If you want to find us on IRC, please use the OpenStack Neutron channel. You have all of our information in the presentation, and we'll post it on SlideShare after this session so you'll be able to get it. With that, I think we can wrap up. Any questions?
Okay, we have questions. Yes?

In the tcpdump output, it seemed to show the raw packet, but in the discussion you mentioned timestamps. Is that just the time the packet arrived from the point of view of the monitoring VM, or is the packet encapsulated with some metadata to provide that information?

Yeah, the timestamp that we saw was provided by those analysis tools. That was the time when the analysis tool saw the packet.

Okay, so there's no metadata that then...

We don't have that yet.

Okay. Is it tunneled when it goes, for example, off-box, as you mentioned? Is it tunneled to...

Yeah, so our service is using the underlying Neutron tunnels that are there between hosts. We are kind of overloading them to carry this non-production traffic.

Okay, thank you.

You mentioned that you've got the L2 Gateway feature in your roadmap. But in its current implementation, can I use a service tenant or an admin tenant to source from the ports of our user tenants, pull them all into a single box, and use that as a monitoring box for the whole environment?

It's a very good question. This is actually one of the common use cases, because there is the tenant aspect, but then there is also an operator aspect. One of the usages I've seen is that you have your virtual machines, and every machine has two ports, and that's kind of a back door for you to figure out what's going on. In this case, there are capabilities depending on how you use Neutron, maybe using provider networks to do this multi-tenant connectivity. If you do that, or use shared networks, you should be able to send traffic across. That should be possible.

Nice talk and nice feature. If you have multiple VMs mirroring traffic to another VM, how do you distinguish which VM a particular packet came from when it turns up on the monitoring VM?
I mean, you can tell from the IPs in some cases. But if you're using, say, allowed address pairs or something to have a floating IP of some kind, then how can you know which particular one received that packet?

In this case, take the demo that we showed. You're sending traffic, let's say, within the same domain, so you have whatever information is part of the packet. You're able to see MAC addresses and IP addresses. If you're going across domains, you will still have the information about your destination IP address, and that should tell you, because it is unique in your L3 segment.

Hello, very nice presentation, thank you. You've demoed network intrusion detection. In network intrusion detection, it's very important that we're able to port mirror all the traffic. Now, of course, that's not always possible, especially with high-throughput pipes. Do you have performance charts that indicate how well Tap as a Service is working today?

Yeah, we haven't carried out performance analysis at this point in time. But I can answer that question nevertheless by saying that there is a substantial cost to both capturing this traffic and backhauling it to the monitoring VM. But as Fawad just mentioned in the next steps section, we are looking at some techniques to reduce that volume of traffic. So although you might have hundreds or thousands of VMs in the cloud, by using a policy-oriented tapping mechanism we can reduce that to just the VMs or NICs that you're interested in monitoring. For example, you could have database servers that you're more concerned about than a website holding some product catalog. So different types of VMs can be differentiated using policies, so that the tapping will be a lot more fine-grained and not exactly what we showed here, where we were pretty much sending everything out.

Just one more question. What type of resources are actually used by Tap as a Service?
So, in the implementation that you saw, we are introducing certain traffic flows inside Open vSwitch. We are obviously using some Open vSwitch features: we are using some tables in there and adding flow entries to them. The switch itself does the packet duplication for us. With respect to tunnel usage, we are at the moment using the existing tunnels in the production network. We use specific tunnel IDs and VLAN IDs to segregate this tap service traffic from the production traffic. So we aren't really that heavy in terms of resource usage. But obviously there is CPU consumption, because there's more work happening on these nodes, and secondly, bandwidth is consumed because mirrored traffic is now also flowing in these overlays.

Thank you very much.

One more optimization that operators do in large deployments is to define affinity for your destinations. In that case, you don't have your 200 servers sending all their traffic to one guy. Maybe you segment it so that it is kind of distributed, and from there you can take it to your central point. This is an approach that is often used.

Sort of related to the same point: is there any control if all the tenants on one physical server start a tap session and it saturates the link? Does the provider have control to prevent that?
Yeah, there are two ways in which we are trying to address that issue. One is what Fawad mentioned with respect to introducing quotas. The quotas would allow us to limit the number of tap services and tap flows that a particular tenant is permitted to instantiate. In addition to that, using QoS policies, we can control the rate at which these mirrored packets flow in the network. And then there is activity outside of Tap as a Service, from programs like the Group-Based Policy project, where they are investigating ways to decide which tenants get the ability to do tap operations.

What about something like provider control over the aggregate amount of tap traffic sourced by a server? The provider's ability to say, I'm going to limit the aggregate tap traffic on the server to 1 gig, or 10% of the NIC, or whatever, so that no matter what combination of tenants, it will be capped at 1 gig and doesn't impact the link.

You could use a heavy fabric for that. And being the operator, you should be aware of what you expect of that, and that should give you some data points. But using a heavy fabric should be a reasonable implementation. We can discuss more.

Guys, if there are any more questions, we'll have to take them offline. It's 4:51. Thanks. Bye.