Thank you, everyone. Thank you for being here. I am Deepa Karnadurka. I'm a developer, and I have worked on network technologies for over 10 years. Co-presenting with me today is Ramya Bola. She is also a developer; she works at VMware as a software programmer, and I work at Ericsson. OFConnect is a library that is open sourced by CodeChix, a nonprofit organization run mostly by volunteering engineers, whose core focus is to provide opportunities for technical growth for women in engineering. Ramya and I have been on the core team of contributors for this project, and we're very happy to present it to you today. Once I go over what OFConnect is and what it does, we will go quite deep into the design, show you some code, and then cover the other aspects that came together as part of this project: first, the unit test automation; second, how we integrated our library into a real-world controller to prototype it as a proof of concept and show that it works; and finally, the SDN setup that came together for our testing. It's a small, minimal setup, but we'll show you what we have used. If we have some time we will also demo, and since we are still in the development phase, if anybody is interested in joining in and contributing, we'll have some pointers on how to get started. Everyone must be familiar with this, but for completeness I still want to go over a quick introduction to software-defined networking. A key concept here is the separation between the control plane and the forwarding plane. This is the control layer right here, which is usually implemented in an SDN controller, and the forwarding plane is essentially a network of switches. 
OpenFlow is the predominant, and one of the earliest, SDN protocols. Our core focus is also OpenFlow, but even more specifically the communication aspects of OpenFlow, which are defined as a channel spec within the OpenFlow protocol; that's what defines how a controller and a switch talk to each other. Here I will give you some detail on the problem we saw. We have a controller and a switch talking over OpenFlow channels, and when we looked at several implementations, some glaring problems came through. Every open-source implementation had its own source code for achieving this communication, so we saw a lot of duplication even in the open-source universe. The second issue we noticed was very high coupling between the low-level communication handling and the rest of the OpenFlow implementation. That leads directly into problems like handling multiple versions every time a new OpenFlow spec comes out: there are new transport protocols, we have to make sure there's version matching between the two sides, and so on. It gets harder when the code is so highly coupled and not modular enough, and maintaining it over a period of time gets very difficult. Our solution was to abstract all the communication, the channel aspects, out into its own library; that's the blue box you see on the slide. Some other things we thought were cool were to have this library support both the controller and the switch, so it could be used on either side of the communication channel; to package it as a shared library that can be very easily compiled in; and finally to publish it as free software. This leads us to the set of requirements we placed on our library. The very first is to hide all the wire-protocol details, low-level socket calls, and so on. The second is to manage connections and the channel setup as much as possible. 
We have had a good start. Right now our library recognizes the OpenFlow identifiers for every channel, and in the next phase we intend to include a lot more of the channel setup in the library itself, along with health monitoring of the connection and so on. We really want this library to be very well suited to OpenFlow. Another requirement we placed on it was multi-language bindings. Although the library can be used on either the controller or the switch, for the purpose of this talk I will stick to the controller, also because that's where we have done most of our testing. Multi-language support comes in much more strongly in the controller world, because there are three predominant languages used for controllers: C, Java, and Python. We have implemented this library in C and provided bindings in Java and Python, so that's covered. The next is platform independence. I think this is a big open area; we have made some strides by using the GLib library, which provides platform-independent system calls. We can do a lot more, so it's work in progress, but we have made a good start and have compiled it on different platforms. I'll be talking about some of our design goals for this project. One of our main design goals was to handle performance under scale, specifically 256k simultaneous connections. We also wanted our library to be capable of supporting the multiple transport protocols that OpenFlow supports, like TCP, UDP, and TLS, and of adding new transport protocols as and when OpenFlow supports them. Another goal was to provide a common API, so that we can integrate our library with either a controller or a switch. The problem we wanted to tackle in our design is best described by this use case here: a controller connected to a switch has one main OpenFlow channel, which is the datapath channel. 
It has a DPID of 1 and an AUX ID of 0, and a controller can also have up to 256 auxiliary channels with a switch; each of these channels is identified by the same DPID but a different AUX ID. So on the whole we can have up to about 256 channels between a controller and a single switch, and typically a controller can support up to 1,000 switches, which is normal in the real world today. That's where we get our scaling number, 256k simultaneous connections, and that's why scalability is an important design goal for our project. This is a block diagram representing the various components we have in our library. As Deepa mentioned, a controller or a switch can be written in multiple languages like C, Java, or Python. Since our API is written in C, we have a small layer of SWIG bindings that export our API to Java and Python. Apart from that, we have three main components: the OFConnect API layer itself; net services, the component responsible for encapsulating the transport protocols; and the elastic pool of threads, the component we built to address the scalability issue. I'll talk about our API next. This is not an exhaustive set of the API we have, but some of the important functions are listed here. We have some functions used for initialization, some functions used to handle the channels themselves, and a third block of functions needed to actually send and receive OpenFlow messages. One of our goals, as I mentioned earlier, was to design a common API for both switch and controller. So we have a lib-init function with which a controller or a switch can register itself with the library as a switch or a controller, and it can provide the IP address and the L4 port on which it is running using the device-register function. The channel functions, create-channel and destroy-channel, are the ones used to create new OpenFlow channels or destroy them. 
The ones below, the accept-channel and delete-channel functions, are the callbacks that any controller needs to implement in order for us to notify it as and when new OpenFlow channels come up or channels get deleted. This is a sequence diagram that depicts the OF channel setup process. As you can see here, a controller initializes the library and registers itself with it. Once that is done, the controller doesn't have to worry about the underlying socket function calls; the library hides them from the controller. Once the socket connections are taken care of and the OpenFlow channel is set up, this is the stage when the controller and the switch begin to exchange OpenFlow messages: hello messages, echo messages, and other things like features request and features reply. Eventually we plan to handle these messages in the library itself, so the burden on the controller will be reduced further. Once the OpenFlow channel is set up, that's when the controller and the switch can actually exchange messages using our send and receive packet APIs. This is a set of logs we captured during the OpenFlow channel setup process: a controller initializes the library and then registers itself with a particular IP address and L4 port tuple, and during registration the controller also has to provide the callback functions it has implemented. As you can see here, once a new OpenFlow channel comes up, we notify the controller, using the callback it has provided, that a new OpenFlow channel is ready to be accepted. And as you can see, we have integrated our library with the Mul controller and tested it out with it. This is the next component in our library, the net services component, which is responsible for encapsulating the transport protocols. 
Our main idea behind designing this component was to provide a generic framework, so that we can support multiple protocols simultaneously and also add new protocols as and when necessary. We've provided this net-services data structure with a set of callbacks; the example on the right is for TCP. If you want to add TCP support to the library, you need to implement and define this set of callbacks, and similarly we can add any new protocol later on. I'll give you a simple example of how this can be done. These are the basic three steps needed to add any new L4 protocol: enumerate the new protocol, define the net-services callback functions for that particular protocol, and define the poll-in and poll-out callback functions that are responsible for handling OpenFlow messages as they come in or go out. These are some logs from our net services component. When a new switch initiates a connection with our library, the callback for our listen fd is invoked, and this callback in turn triggers the TCP accept function, which creates a new fd and adds it to our poll-in thread manager. From this time on, the new fd is ready to receive and transmit packets: every time there is a poll-in on this new fd, we call the poll-in callback function that handles that message, and every time we need to transmit an OpenFlow message out of this fd, we call the poll-out callback function that the protocol has given us. 
The elastic pool is at the very heart of the infrastructure we have built, and if you ask for one good reason why you should use this library versus another, I would say this is it. There were two key factors. One: this is basically a pool of threads that we are managing, but we manage it in such a way that we can have as few as one socket fd or scale up to 256k, without over-provisioning or under-provisioning resources. The second aspect is to be really aware of the difference between SDN channels and, say, web-server connections: once you set up a channel between a controller and a switch, it is quite permanent; it is not in much flux, and you don't expect it to go up and down frequently. Keeping that in mind, instead of having a pool of threads that are load-balanced in round-robin fashion, what we have done is keep as few threads as possible and maximize the utilization of the capacity of each thread we already have. Until our first thread is filled to capacity, we do not provision resources on the next thread. It's a very simple algorithm, and I'll go into the details of how we manage it, but I'll start with the individual thread block. The blue box you see is the thread itself, which has a polling loop in it; I'll give a few more details on how we set up the polling thread on the next slide. The yellow box is the interface that allows you to add or delete socket or pipe fds in and out of the polling loop. The polling-thread utility, the block at the very top, is what manages our global data structures and makes sure all the counters are up to date.

Inside an individual thread we have a polling loop, but while we set up the thread we also create two administrative pipes. One is for adding or deleting fds, which the net-services callbacks use; it is a control pipe, and the interface is provided by the thread manager. The second is used when we want to send an OpenFlow packet out onto the wire: the API can send that packet directly into this polling loop, and it gets processed and pushed out on the wire, which I'll get to in a bit. Next, for every fd we need to register poll-in and poll-out callback functions, and that's done by the net-services callbacks; that was point number three we had to introduce whenever you add a new L4 protocol: define these two callbacks, poll-in and poll-out.

Next I'll show you how a packet is sent out. When an OpenFlow packet arrives at the polling loop, a poll-in occurs on the pipe fd. At that point we put the packet into a hash table and set the poll-out flag on the corresponding outgoing socket fd. The next thing that happens is the poll-out event: the packet is pulled out of the hash table, pushed out on the wire, and the poll-out flag is reset. That's essentially how a packet is sent out. The in path is actually very simple: the poll-in flags are always set for all fds, an event occurs, the net-services poll-in callbacks kick in, and the packet is sent directly up through the API to the controller or the switch.

On to the pool: at some point we end up with a bunch of threads with different spare capacities of sockets they can take on, that is, some holes in every thread. The way we organize it, and I think the next slide has more information, is to keep a global list of all the threads sorted by decreasing number of available socket slots, so that if one thread has five slots available and another has three, the first in the list will be the one with five, the next the one with three, and the others are filled to capacity. Any time we want to add a new fd to a polling loop, we need to find the specific thread that can best accept it, and the find-or-create function gives us that: it iterates to the last thread with availability and returns the thread that can accept the new fd. These are the two add and delete functions. The first does exactly that: it finds the best thread to add the socket fd to, using the function I just spoke of. The second is the delete, which removes the socket from a thread; now we have a hole, that is, spare capacity, in that thread, and the list is re-sorted.

These are the poll-in and poll-out logs. I'll start by showing you the fds being polled: fd 8 is the control pipe, fd 10 is the data pipe, and 31 and 32 are two sockets being polled. We get a poll-in on fd 32; this is the incoming path of a packet: a packet is received and, as you have seen earlier, we trigger the callback into the controller. For the outgoing path we have two polling events. First is the poll-in on fd 10, the data pipe: the packet arrives, we put it in the hash table and set the poll-out flag. The next thing that happens is the poll-out on the outgoing fd: we send the packet out and reset the poll-out flag.

On to unit test automation: we have used GLib very heavily for this; it has a great framework for it. We have two ways to run our tests, either through make or through gtester, once the code is compiled. We focused most of our coverage on the infrastructure itself, so we started with the poll thread and got it fully tested, then went on to the utilities, which exercise some of the elastic pool. Bear in mind that we are still in development and have not yet scaled to 256k, but we have still tested that infrastructure for a few sessions. We have also covered some net 
services as part of our tests. Now, this is the full list of unit tests we have, with the results put up here. TC3, for example, just tests the poll-in callback function on the poll thread, and the utility TC2 does more of an end-to-end run for a single session by using the controller's port itself.

Integrating our library with an existing controller was an interesting challenge, and there are some very simple steps to follow if you would like to integrate our library with a controller. These are the basic three steps: the controller has to implement these three basic callbacks; once that is done, it has to initialize and register itself with the library; and it also has to take care of the freeing part, to free all the resources the library has used.

We've come up with a basic SDN setup in order to test our library, and this picture represents the whole setup we used for testing. We've integrated our OpenFlow driver with the existing Mul controller, an open-source controller written in C, and we have deployed Mininet, which is a framework for testing SDN; the Open vSwitch is an integrated part of Mininet. We deployed Mininet with a minimal topology, so we have a single switch and two hosts associated with that switch. As you can see here, we've specified that the controller is remote, so the OpenFlow switch in Mininet will try to connect to the remote controller, which is our Mul controller with the OFConnect library integrated as part of it. That's our whole test setup; we brought it up and tested how the flows are exchanged back and forth between the switch and the controller. This is a small snapshot of our Wireshark capture. Once we bring up the whole test setup, the Mul controller, which has our OFConnect library integrated in it, begins to exchange OpenFlow messages with the OVS switch. As you can see here, there are some hello messages sent during the initiation of the OpenFlow channels, followed by the OpenFlow features request and features reply; this is where the DPID and AUX ID are exchanged and the two ends determine the version of the protocol to be used, which is negotiated using these messages. We also have the echo request and reply messages; these are heartbeat messages between the switch and the controller to make sure the OpenFlow channels between them are up and running.

So that was the design and some information on what we do and how we do it. What's next? Here are a few tracks we have thought of, but we are open to more ideas. There are some very obvious gaps we need to fill: TLS and IPv6 are the more obvious ones. Greater OF awareness is really about where we want to draw the boundary of how much OpenFlow channel work the library will absorb, and there is a lot of scope there. Scaling to 256k: we have designed for it but not yet tested it, so I'm sure we'll find some interesting problems along the way, which is very exciting, personally, for me. The last one is benchmarking: we have put a lot of thought into making this design extensible and very SDN-specific, and this is where the rubber meets the road, so I'm very curious to actually benchmark it and see how it performs against other, more well-known implementations.

To summarize: we are simple, in that our API is a small set, easy to extend, and easy to integrate; we've shown you the steps, and it's literally one, two, three. It's powerful, because we think the thread model we've used is pretty unique; we haven't seen it implemented in many other places, in fact I don't remember seeing it any other place, so we think it is well suited to SDN, and that's where the power of this library is. It is true OpenFlow: we will be implementing the OpenFlow headers as defined in the spec. Easy to integrate goes with simple, and it is published under a very well-known open-source license.

As for what motivated us to launch this project: it was to actually be able to learn SDN using the building blocks we understand best, which is network plumbing. So we started with the very simple, very well-known socket kind of connection, built on top of it, got SDN actually working on top of it, and now we are playing with an SDN setup that has our own code integrated into it, which is quite a thrill for us. I think that is a very powerful way of learning a new technology, and it has helped the core team, the five of us who started this, to really get knee-deep into SDN and enjoy it. So if anybody else shares that and wants to contribute or give ideas, we have a whole bunch of information up on our GitHub wiki. We have two repos: one for the OFConnect library itself, and the second for the integrated Mul controller, which is also open sourced. Feel free to email us; drop a line at organizers@codechix.org, and we'll be very happy to have you join our team. This is the team that made it possible; we are toasting to our first completion of the project. On your rightmost is Rupa, the chair, who is the founder of CodeChix; next are Swapna Ire and Kajal Bhakar, who work at Cisco; and then there is Ramya, working at VMware, and I work at Ericsson. So we all came together, our companies agreed to donate the code to CodeChix, and this is what we have at the end of it.

We have a quick demo that we could show you before taking some questions; it will take a couple of minutes. We'll be giving a very short demo of what we have done so far. Basically, I've set up Wireshark so that it can capture packets on the loopback interface. Currently we are running the 
switch and the controller on the same host, but they can be run remotely. This is the Mul controller, which has our library integrated in it, and I'm going to run the controller now. As you can see, it does start, and it is waiting to accept new connections from any switches. Next I'll start Mininet. We have a minimal topology deployed in Mininet, with one switch and two hosts, and you can see that there is a single switch connected to two hosts here. Let's look at the Wireshark captures now. Basically, we have our controller running on the local host on port 6633, and you can see they are continuously exchanging OpenFlow messages now. The OpenFlow hello messages are what initiate the OpenFlow connection, followed by (let me just stop the capture for a bit) the OpenFlow features request and reply, where they negotiate which version of the protocol both of them should be running; they also exchange the DPID and AUX ID of the main channel at this point. So the main OpenFlow channel is fully established, and you can see they also have these heartbeat messages, echo request and echo reply, being constantly exchanged between them in order to check the health of the OpenFlow channels. That's where we are today, but we plan to add support for more protocols like TLS and IPv6 soon, and we also plan to profile and benchmark our library against some other libraries out there. Thank you.

Thank you, that looks really interesting. Can you talk a little bit about what happens if one of the callback functions blocks for some reason and doesn't return immediately? How does the library handle that, or what would the effects be?

Do you mean the socket calls blocking?

Not so much the socket calls down into the Linux side, but the callback into the application to say we've received a packet, for instance. If it doesn't just put the packet in a buffer but instead does something that takes a long time, the packet that has to go up to the controller...

We would basically have to debug it, I guess. We don't have any handling to specifically take care of that, to time it and see if it's still hanging, or to do exception handling. What we have developed so far is basically for the good case, so I think we have to start looking at the negative cases. That's a very good point; thanks for the idea, and I think we'll look into it.

It's an issue I found with the Ryu handling, which also uses a similar event model: if your event thread blocks for some reason, then a whole lot of things can end up being hung, because your OpenFlow messages are not going back and forth.

Okay, thank you.

Have you thought at all about ways to make this resilient, or highly available? You've talked a little bit about scaling and how you can support a lot of switches, but one outstanding issue for a lot of controllers is how you actually make the controller highly available. Have you thought at all about how your library might help enable that, the high availability of the controller?

We have had some early discussions on it, but I think these are some great ideas; we should follow up, and we should probably have a discussion with you after this to get more ideas.

If I wanted to play with this (I haven't looked at your wiki) does it tell me how I can get started playing with it?

Yes, absolutely. We have a lot of instructions, including on the GitHub workflow that we recommend, and the repository is also online. It's on my laptop; just give me one second and I'll quickly show you. This is what we have: information on the sources, how to install it, how to compile it, our workflow, how to commit a patch, and the test setup, with instructions on how to bring up your environment and get started testing with our Mul controller. So we have a lot of instruction online which you could use to get started.

My question relates to the number of threads you've got, particularly regarding Python. Your heavy emphasis on threads implies to me that there's a lot of CPU activity you expect on each thread, otherwise you wouldn't bother. But Python doesn't support threads very well, does it? It has this global interpreter lock. How does this interact with your model? Does this mean that in general, for Python usage, it devolves down to one thread?

We have Python bindings, so what we are proposing is to keep this library in C and have a Python binding that exposes just the API as a pluggable module to Python. We basically want to exploit the threading capabilities of C; that's one of the reasons we even chose the language. Thank you. One more question?

I was actually quite happy to see that it was written in C with the other languages as bindings, because what I found in doing a lot of this in Python, particularly with Ryu, is that you very quickly run out of steam the more OpenFlow operations per second you do. So having it in C and being able to use threading in C would be a good advantage. Well done.

Thanks, thank you. 

I have a couple of quick questions. While you were talking about your scaling test, I just 
wanted to know what your test setup is like: how are you testing that scalability?

For the scaling, I think initially we'll want to use virtual environments as much as possible. Once we get the basic bugs out, then we want to try something bigger: essentially have the controller on one host, have the network of switches on a different one, and exercise it a little more. But I think we will take baby steps; we'll start with the least we can set up so we can get through our first set of bugs, and then go on from there.

Another quick question: while you were describing your elastic pool, you mentioned the blocks. What does each of those blocks mean? Is it a single connection between the controller and the switch, or is each a block in the pool itself?

They are threads. Let me get the individual blocks right: each is one unit that manages one thread. The blue box is the thread itself, but the rest of the code runs in the context of the main library thread. So we have individual threads for each piece of the socket handling (we have a pool of threads), but the library itself runs in the context of the controller, and that's where the thread manager and the poll-thread utility live. We need all of this functioning to get one block of pool thread working.

I know you didn't do any performance measurements yet, any metrics, but what are your anticipations, your expectations for performance levels, in bandwidth or packets per second or whatever?

I don't have numbers at this moment, but I think we want to start with what's already benchmarked for Floodlight and other controllers; there are a lot of numbers already published, and our first step would be to get there and then see how we can exploit our own design beyond that. But sorry, I don't have the numbers right now.

I have time for some more questions, if anyone has questions. I'll repeat the question: the question is about integrating OFConnect on the switch side; we have mostly talked about the controller. That is true; the API and all of the design are geared to that. In fact, there is some API, the create-channel call, that only the switch can really use, because we are not expecting the controller to start a channel. So we have designed for it, but that's again a test effort that has to be planned going forward. Just to point out: we are volunteering engineers, and we do this in our spare time, so we do the best we can given the resources we have. There is a lot of work to be done, we admit, but I think we're very proud of where we even got it to, and that it has a good foundation to start off on. Sorry, there was another question, I think.

Do you require contributors to sign a contributor license agreement?

Right now, what we have encountered is that we are all employed, and we need our employers to release the intellectual property rights to our contributions. So there will be some level of that, depending on what kind of day job you may have, or whether you are self-employed, or other things. But it will go through CodeChix, so once you contact the organizers, I think we should be able to put it together.

I was wondering about your motivations for doing this work. There are some other projects, other controllers; what motivated you? Was it deficiencies in the others, or an eagerness to explore this area, a desire to be creative?

I think it started with the latter: to explore SDN and to learn as much as we can, from the perspective that we understand. As we were designing, we were constantly thinking of how we could make ours different and what the technologies out there are, so that was certainly part of the process, but our initial motivation was to just learn SDN, to be students, put together something from scratch, make it work, and see how far it goes. I think that's where it started, and it has been quite a ride.

We have time for one more question. Okay. Please thank our presenters.