Good morning. My name is Boris Astardzhiev and I work for Smartcom Bulgaria, in the R&D department. Today I'm going to present our control plane software, which is a customized version of FreeBSD.

A few words about the company. Smartcom Bulgaria has been on the scene since 1991 and we have approximately 110 employees at present. We're split into three main departments: integration, microelectronics, and research and development. My colleagues in integration deploy appliances for different projects, mainly Juniper, Infinera, AudioCodes, etc. Microelectronics creates, or actually invents, techniques for making chips, mainly MEMS-related. And the R&D department is where I am. We are about 15 people and we try to create network appliances for different ISPs, so that we can give them the ability to deliver the triple play service to their clients: data, voice and video.

So how did we start? In the middle of 2007 we had this picture here in Bulgaria. This is how ISPs delivered their services and, as you may presume, from time to time there were no services at all, due to the weather conditions and the obviously bad infrastructure. We had to tackle this issue, hence our first manageable switch. It has protection against lightning, and we have certified that. It is called Smart Switch Pro 800. It has a Motorola CPU, based on Realtek; it has eight 10/100 Mbps Ethernet copper ports and it's managed through a GUI.

We wanted to push things further, and here's our next family of switches, called EGS v1.

Come on. Sorry. Okay. The low battery... Low battery, sorry, it doesn't work right now, I think. But you basically can hear me, so I'll continue. The stream... well, that's a pity.

So this is how we wanted to push things further. Hence we created our second family of switches, called EGS v1.
They had an ARM9 CPU, were based on Marvell chipsets, and had either 24 or 8 10/100 Mbps Ethernet copper ports, or two 1-gig ports. They were Linux-based, and the main focus was that they were able to deliver the triple play service. But in the middle of 2010 we had plenty of issues finding components and parts for manufacturing these switches.

Can we disable it? Well, I disabled it, but obviously the problem is still there, in my opinion.

In the meanwhile we had a new requirement: whether we could create a layer 3 hardware switch, obviously a router. We started contacting several vendors, for instance Broadcom, MediaTek (formerly Ralink, I think), Realtek as well, and Marvell. Only Marvell gave us a chance. We signed an NDA so that they could hand us their documentation, and they proposed these two systems-on-chip. The great thing about them was that this is how we could address the customer's requirements and redesign EGS v1. And the great thing, in my opinion, was that they have identical registers, so we could develop a single system, just trim some of the features, and actually have two different switches.

So the new appliances are EGS-R, a layer 3 distribution switch, and EGS v2, an access switch substituting EGS v1. All the switches mentioned were designed from the ground up in Smartcom Bulgaria, both hardware and software.

So, a few words about EGS-R's hardware.
It is a Marvell system-on-chip: an ARMv5 CPU with a single core, 800 MHz clock speed, 512 megabytes of memory and a 512 [megabyte] USB flash disk. It has a modular, hot-swap architecture, 24 1-gig ports and up to four 10-gig ports.

A few words about the layer 2 and layer 3 features. We support up to 16,000 MAC addresses in hardware. We support jumbo frames, VLANs, obviously, and stacking. For layer 3 there are 30,000 TCAM entries for routing, four thousand of which are for ARPs. We support ACL-based routing and ACLs, QoS, IP multicast, storm controls and more.

A few words about EGS v2's hardware. It has a similar system-on-chip in it. We put a little less memory there, 128 [megabytes]. As for the interfaces: 24 RJ45 ports [speed unclear in the recording] and four 1-gig combo ports. So this is actually a trimmed version of the EGS-R.

[Brand names unclear in the recording.] Again, the main focus was to give our customers the ability to deliver the triple play service to their customers.

So, enough with the introduction. The software choice: we chose FreeBSD. Why? Of course due to the BSD license; it's more commercial-friendly. The Marvell systems-on-chip had support at that time in the 8 branch, and this is what we use. Obviously it's not that easy to jump to a newer version, but I will say a few words about that later. We were very inspired by netgraph at that time. And to me, without starting a flame war here, FreeBSD has the biggest BSD community, and this is great. NetBSD had support for our chips as well, but they lacked netgraph in the mainline, and this was an argument for us. OpenBSD did not support our chip.
So the argument in the decision was quite political, and we had to start immediately so that we could meet the deadline, which was a couple of months. We finally ended up with six or seven months, I think, but I'll skip that. I will not step any further.

So where did we start from? To be honest, in terms of chronology, hardware and software went in parallel. It's not just "okay, here's the hardware". The guys in the hardware team had to design it and so forth, so we ordered a demonstration board so that we could start from somewhere with our software. From now on we'll take it for granted that we have the hardware.

We use U-Boot to initialize the hardware. We use its API so that we can connect it to the FreeBSD loader. The main idea is that we export a callback to the FreeBSD loader so that we can read from our USB disk; hence we can read our kernel. We use the U-Boot loader for that. Great project.

Now that we could read the kernel, we started reading images, but they were actually broken, because we had to tweak the API a little bit. Hence I wrote a simple feature in the loader that calculates the checksum of the image I have read, to be sure that I have read it without erroneous bits, so that on booting I would not experience, let's say, crashes and hangs.

Let's move to the kernel and say a few words about its design. First of all, let's split it between hardware and software. Here we have the Marvell MAC with all of its properties and features, and the ports, obviously physical ports. In kernel space we have a CPU port; this is the place where we receive the intercepted packets. In the control plane we need to intercept packets so that we can use some pieces of information from them to configure features of the hardware. Then we have written a hardware library.
Obviously we were given such a library, but the license was not quite good, so we had to write our own. It's basically a kernel object with tons of interfaces and methods in it, and its main idea is that we can control and tweak the features of the hardware.

Then, we have 28 physical ports. Think of them as 28 Ethernet cards. We wanted to create a logical representation in the kernel for that, and that would be this part here. So we created 28 ports that can easily be viewed by, for instance, typing ifconfig, and these are our objects that represent the physical part down there.

In user space we use plenty of userland daemons, tools, facilities, etc. They communicate with the kernel, of course, through various interfaces. I split out ifconfig on purpose, because this was our initial idea: it was the main configuration facility, since we had to start from something. We couldn't just say "okay, here's the CLI"; in the beginning we didn't have a CLI. So the main idea of the ifconfig facility was that we could control basic features of the hardware.

So, the network stack. This is our proposal for the network stack.
We have plenty of ports, as I said. They can either be members of aggregations or they can be assigned a VLAN trunk. Here each unit represents a single VLAN for that port. They have plenty of properties, for instance PVID, which is generic here, and some of them inherit properties of the interface that is below them.

For instance, let's say that I want to bridge two interfaces, for instance a port and a VLAN, in one and the same bridge. Sometimes in the control plane you have to do that; you have to somehow unite them. Hence here's the bridge part. It again has the property of a VLAN, and it has to be the same as that of its unit members.

Then we have an interface called "interface". Not quite a wise name, but never mind. It serves as a demultiplexer for different families. On top of it there are sub-interfaces, and each sub-interface carries the property of a family: IPv4, IPv6, MPLS, etc. This is the main idea. And here's the router part. It operates on these layer 3 interfaces.

I must admit that we were very inspired by netgraph for this whole infrastructure, and we ended up creating our own infrastructure so that we can manage the features of the switch.

I'll give you an example of a single relation here. Let's say I have a port's ifnet structure. It has a pointer, if_vlantrunk, here. That's NULL, which basically means that I don't have a unit on top of it.
The if_lagg here is another pointer, and it points to the softc of an aggregation. This is how we say that there is an aggregation on top of us and that this port is a member of an aggregation.

Then, I think you know the if_input pointer here. It is usually assigned the ether_input function, and this is quite a busy function. We wanted to skip that so that we could gain a certain amount of optimization, so we assign a callback procedure from the module that's on top of us. In our case here the lagg module assigns an input procedure for its parent. As I said, we gained a certain amount of optimization, but I can't give you performance results, which is bad.

So how do we traverse the network stack here? Basically, we receive an interrupt here and fetch the frame. Then we end up in the CPU port's input call with the mbuf. Now we know the source physical port and the source VLAN, so we can send it to our module here. Then we check the glues, and if they are not NULL we can pass further. This is how we traverse the stack, instead of calling ether_input, which is quite busy and would check all of these glues at the same time.

For egress, sooner or later we reach the ether_output function and the if_transmit part, where we hand off and the mbuf is enqueued. Then, in the if_start procedure in our module, we dequeue it and hand it to our CPU port module, whose job is to compile the frame so that it can address the device.

So, a few words about the unicast router. There are initial obstacles regarding the hardware here. I mean the TCAM updates.
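
The receive path described above, where the CPU port knows the source port and VLAN and each port checks its glue pointers before handing the frame to the module on top, can be modeled roughly as follows. This is a minimal sketch in Python rather than kernel C; the class names, the `path` field and the order in which the glues are checked are assumptions made for illustration, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    src_port: str   # source physical port, known at the CPU port
    vlan: int       # source VLAN, known at the CPU port
    path: List[str] = field(default_factory=list)  # modules visited (illustration only)


@dataclass
class UpperModule:
    """Hypothetical module stacked on a port (an aggregation or a VLAN trunk)."""
    name: str

    def input(self, frame: Frame) -> None:
        frame.path.append(self.name)


@dataclass
class Port:
    name: str
    lagg: Optional[UpperModule] = None       # glue pointer; None plays the role of NULL
    vlantrunk: Optional[UpperModule] = None  # glue pointer; None plays the role of NULL

    def input(self, frame: Frame) -> None:
        frame.path.append(self.name)
        # Check the glues in turn and pass the frame to the module on top.
        # Checking the aggregation first is an assumption of this sketch.
        if self.lagg is not None:
            self.lagg.input(frame)
        elif self.vlantrunk is not None:
            self.vlantrunk.input(frame)


def cpu_port_input(ports: Dict[str, Port], frame: Frame) -> Frame:
    """Entry point after the interrupt: dispatch on the source port."""
    ports[frame.src_port].input(frame)
    return frame


ports = {
    "port3/1": Port("port3/1", lagg=UpperModule("lagg0")),
    "port1/2": Port("port1/2", vlantrunk=UpperModule("vlan-trunk")),
}
print(cpu_port_input(ports, Frame("port3/1", 10)).path)
```

The point of the design is that each port only follows the pointers that are actually set, rather than going through one generic input routine that tests every possible upper layer.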
The TCAM updates are quite tricky. You must sustain the longest prefix match there at all times, so that it is completely consistent with the FreeBSD forwarding table. And sometimes, in order to write an entry there, you may need to fetch a certain block, populate your entry in it, and then write the whole block back. If you don't do that, you will end up with inconsistencies, and this will result in, let's say, software routing, which is bad for an 800 MHz CPU.

So how do we populate our hardware? We need to intercept traffic for some time. I mean, we need to trigger ARPs when two directly connected hosts want to reach each other from different ports and in different VLANs. We need to eavesdrop on this communication with the ARPs, so we have placed a hook in in_arpinput.

In order to intercept traffic we also need to intervene in the routing messages system, so that we can update network prefixes in the TCAM; sooner or later they will give us some of the traffic so that we can control the TCAM. We have placed a hook into rt_dispatch so that when there is a bundle of routing messages we can immediately go and populate the TCAM.

The multicast router. We use FreeBSD's implementation, options MROUTING. To me it kind of works well in our application. Again we need to intercept multicast data traffic in the CPU so that we know the source IPs of certain multicast streams. This is how we trigger MFC updates and upcalls to us, so that we can populate the cache there. We have placed hooks here and here, and this is how we can track the TCAM activity.

The hardware gives us this ability, and we need it because from time to time we are, for instance, a first-hop router and also the rendezvous point.
Then we are obviously flooded with tons of multicast streams. So we intercept traffic for a second and then we insert, for instance, a drop rule, and we track the TCAM activity so that we can sustain the cache there. The MFC upcalls are handled by a daemon in user space, and its main job is to populate the cache in the kernel.

A few words about useful tools and implementation. We use plenty of facilities here: BPF for intercepting packets in raw mode, callouts for repetitive actions, event handlers for syncing different properties synchronously, ioctls, kernel objects, locks for making things atomic, sockets, sysctl, syscalls, etc. In user space we used, in the beginning, lots of ioctl set operations in ifconfig, route and sh. And we wrote a tool that is able to read all the registers of the hardware so that we can track down bugs and misconfigured features.

Then a few words about the layer 2 features. I've tried to generalize them here in terms of the control plane. The first part is mainly interface-property related. I will enumerate some: VLAN tagging, Q-in-Q, auto-learning, link transition dampening, static MAC addresses. They look very different, but at the same time they're just properties of certain objects in our network stack, and the main idea is simple. Okay.
We want to create a VLAN. We issue an ioctl, "create a VLAN", and it creates an object that represents the physical part. Then the VLAN interface is responsible for contacting our hardware library so that we can program the controller. Another example: we want to intercept IGMP packets. Good. Then we say it to the VLAN, and its job is again to contact the library, and this is how we will intercept IGMP packets in the control plane.

The second category is packet-interception oriented: LACP, RSTP, IGMP snooping, where we have to process certain group memberships, and DHCP snooping, where we need to track certain states and insert option 82. And a great feature here: we are able to insert an allow rule in the ACL so that only the ISP-authorized persons may have internet access.

Another example. Now that I can intercept IGMP packets, I can create a daemon, igmpd. Its main idea is to intercept IGMP packets for slot 3, port 1, VLAN 10, this, let's say, pair. As soon as I receive certain membership requests, I can issue an ioctl to program the hardware so that it will duplicate the traffic instead of doing it in software.

The layer 3 features are again mainly packet-interception oriented: unicast routing, inter-VLAN routing, multicast routing, policy-based routing. For SNMP we can easily use bsnmpd and take it for granted. PIM sparse mode. For BGP we can use OpenBGPD; we can use Quagga as well. Of course it's a little bit tricky, because we had to tweak it with the routing messages and our infrastructure. DHCP relay is obviously a layer 3 feature. I will not step any further.

Another example here: I have a PIM daemon. I set some options so that I can add the bundle of interfaces in ip_mroute, then I program the hardware through the library.
I intercept PIM and IGMP packets and multicast frames, and for the configured interfaces, sooner or later, ip_input will be called, so that it will end up in MFC cache-miss upcalls, and this is how we will populate the cache.

An example of features that are not packet-interception oriented: routing preferences. FreeBSD wouldn't give us a forwarding table that is aware of routing preferences, so we had to write a daemon that could handle identical routes with different preferences and choose the appropriate one.

The quality of service. We have plenty of rate limiters, storm controls, queues, etc. But in terms of the control plane we had to configure our CPU so that we could split the management traffic and the intercepted traffic into different queues, because if we don't do that we may end up unable to manage our switch, and this is bad.

Now that we have packed all of this stuff, how do we upgrade it? How do we hand it to our customers? We use a modified version of NanoBSD; to me, a great project. It gives redundancy. We made it have four slices. We have two root file systems, one of which is active. We have a config slice for holding configurations, and we have a miscellaneous slice, for testing purposes for instance.

Sometimes it's not quite good to upgrade a whole NanoBSD image, because it's slow. It will not spoil the service, but it's slow, and sometimes it's not necessary. So I've ported the ports collection to our needs. It's a pretty customized one, it is focused on partial upgrades, and it may cross-build certain facilities of our software. Obviously there will be little or no service disruption.
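
The route-preference daemon mentioned earlier has to keep every candidate for a prefix and hand only the preferred one to the forwarding table. A minimal sketch of that selection logic is shown here, in Python for brevity. The class name, the source labels and the numeric preference values are hypothetical, and the convention that a lower number wins is an assumption; the talk does not say which convention is used.

```python
import ipaddress


class PreferenceTable:
    """Keeps all candidate routes per prefix and selects one by preference."""

    def __init__(self):
        # prefix -> {source: (preference, nexthop)}
        self._candidates = {}

    def add(self, prefix, source, preference, nexthop):
        net = ipaddress.ip_network(prefix)
        self._candidates.setdefault(net, {})[source] = (preference, nexthop)
        return self.best(prefix)

    def remove(self, prefix, source):
        net = ipaddress.ip_network(prefix)
        cands = self._candidates.get(net, {})
        cands.pop(source, None)
        if net in self._candidates and not cands:
            del self._candidates[net]
        return self.best(prefix)

    def best(self, prefix):
        """Return (source, preference, nexthop) of the preferred route, or None."""
        net = ipaddress.ip_network(prefix)
        cands = self._candidates.get(net)
        if not cands:
            return None
        # Assumption: lower preference value wins; ties broken by source name.
        source, (pref, nexthop) = min(
            cands.items(), key=lambda item: (item[1][0], item[0])
        )
        return (source, pref, nexthop)


table = PreferenceTable()
table.add("10.0.0.0/24", "ospf", 10, "192.0.2.1")
table.add("10.0.0.0/24", "static", 5, "192.0.2.254")
print(table.best("10.0.0.0/24"))      # the static route is preferred
table.remove("10.0.0.0/24", "static")
print(table.best("10.0.0.0/24"))      # falls back to the OSPF route
```

In a real daemon the result of `best` would be what gets installed into the kernel forwarding table, so the kernel itself never needs to know about preferences.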
If we upgrade the kernel, we will have service disruption. But if I want to update a certain tool that reads certain counters, and I tend to update its source quite frequently, then handing a single package to the customer is obviously the right approach. So, as I said, it's convenient for partial upgrades.

Now a few words about the CLI. Nowadays we have a CLI. It's based on Klish and it's a Cisco-like CLI. It's hierarchical, we use Lua and shell scripting in it, and we use SQLite 3. The interesting thing here is that it's commit-oriented instead of enter-and-shoot, like the Juniper CLI; that's Juniper's approach. Now it's the desired way of configuring the device, instead of just issuing ifconfig, shell scripts, etc.

A few words about the development issues. ARM debugging, both in kernel space and in user space. In kernel space we may use a JTAG and we can trace the kernel at some point, but in user space it's quite hard in the 8 branch, so we have to cope with that. We of course do crash inspections. We use classic dumps to a swap partition, or we use netdump, great project. Backtraces and traces are hard as well. But to me the toughest part is that sometimes I have to import certain patches from FreeBSD and use them, and that's hard because the 8 branch is lagging behind 9 and 10. So the toughest part is tracking the latest versions of FreeBSD.

A few words about the quality assurance for all this stuff. Of course we do plenty of black-box testing: equivalence partitioning, boundary value analysis. We stress-test our software, we do exploratory tests and interoperability tests with Juniper and Cisco, for instance, and Extreme. Of course we test in a real topology, and we automate that. Hence we have regression tests through the CLI and through the SNMP protocol, and we use Tcl for that.

A few words about the future development here.
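
The commit-oriented behaviour of the CLI described above, as opposed to applying each command the moment it is entered, can be illustrated with a candidate and a running configuration. This is a minimal sketch in Python under stated assumptions: the key strings, the `apply` callback and the `discard` operation are hypothetical, and the SQLite storage mentioned in the talk is not modeled.

```python
class ConfigSession:
    """Edits go to a candidate config; nothing is applied until commit."""

    def __init__(self, running=None):
        self.running = dict(running or {})
        self.candidate = dict(self.running)

    def set(self, key, value):
        self.candidate[key] = value

    def delete(self, key):
        self.candidate.pop(key, None)

    def diff(self):
        """List (key, old, new) for every key that differs; None means absent."""
        changes = []
        for key in sorted(set(self.running) | set(self.candidate)):
            old = self.running.get(key)
            new = self.candidate.get(key)
            if old != new:
                changes.append((key, old, new))
        return changes

    def commit(self, apply):
        """Apply all pending changes, then make the candidate the running config."""
        changes = self.diff()
        for key, old, new in changes:
            apply(key, old, new)  # e.g. would program the device here
        self.running = dict(self.candidate)
        return changes

    def discard(self):
        """Throw away uncommitted edits (hypothetical convenience)."""
        self.candidate = dict(self.running)


session = ConfigSession({"hostname": "switch1"})
session.set("vlan 599 name", "iptv")
session.delete("hostname")
print(session.diff())                     # pending, device untouched so far
session.commit(lambda key, old, new: None)
print(session.running)
```

With this shape a half-typed configuration never reaches the device, which is the main practical difference from issuing ifconfig commands one by one.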
As for future development, obviously we have to work on IPv6; we support only IPv4 now, but I think it's quite okay for now. We maybe need to focus on supporting VRFs; the hardware gives us this ability, but we have to enhance the software. Stacking: maybe it's a good idea to make a bundle of switches act as a single one, so that they can easily be configured via a single CLI. And of course we have to optimize the code, and redesign and re-implement all those bloated parts.

I think I'll end here. So thank you. Do I have time? Yes? Then some demonstration here.

So here you have plenty of interfaces. We have port 3/1, which is a 10-gig port. It is a member of an aggregation. Here is our CLI, with plenty of configuration here; we have lots of VLANs. What else? I think that currently there is a TV connected to this switch, so I will check its memberships. I have this group now, and it is delivered to a listener in VLAN 599. I think that's Sky Sports HD, I don't know. I'll stop here. There are plenty more demonstrations I could make, but I will leave some time for questions.

Do I understand right that a large part of the traffic flows in the hardware, without entering the FreeBSD kernel?

Yes.

But can you monitor the amount of traffic at runtime?

I can, okay. I can run tcpdump on our CPU port and I can easily view the number of packets per second, for instance.

[Partly inaudible exchange.]

No, no, it will only show you the traffic that is to be intercepted, I mean the traffic that is designated for the control plane. The hardware traffic, no. I mean, the hardware routing or, for instance, layer 2 switching will not end up in the CPU. You will only see the control plane traffic, of course. But here it is: I have plenty of counters. They are exported via the SNMP protocol.
We can plot tons of MRTG and RRDtool stuff there, and we can clear them so that they start counting again.

Give me the dead microphone, so that my voice won't go through. I will pretend to speak into the microphone. So, two questions. One is: are the chips you're using still supported under FreeBSD 10, or only under 8?

I think that they are supported in 10.

Okay, so there's no reason you can't go to 10, other than the fact that you have legacy code?

Well, there is a reason. We have tested the 8 branch extensively and we haven't encountered crashes and such things there. If we jump to another branch, we must test it again extensively so that we discover the undiscovered bugs there.

Okay, so it's testing.

It's kind of risky. We may focus on doing that. We may need to make our software a module so that it is detached from FreeBSD to some extent, but it's kind of tricky.

Okay. And the hardware library: is there a well-defined API, or is it just DMA commands to control the hardware?

We use the kernel object infrastructure, where you create a .m file and describe the methods there, and this is what we use.

Okay. And is any part of this committable back to FreeBSD?

Well, I have given some, let's say, patches or bugs, but not the whole stuff.

Not the whole thing. Yes, somebody else? Anybody? Well, let's thank our speaker then.
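
As a closing illustration of the hardware library discussed in the talk and in the last answers: the idea of one interface with many methods, which objects such as a VLAN call into, can be sketched as below. This is Python standing in for the kernel object (kobj) mechanism, purely to show the shape; every method name here (`vlan_create`, `vlan_add_port`, `trap_to_cpu`) is hypothetical and not taken from the actual library.

```python
from abc import ABC, abstractmethod


class HardwareLibrary(ABC):
    """Interface in the spirit of a kobj .m file: a set of declared methods."""

    @abstractmethod
    def vlan_create(self, vlan_id): ...

    @abstractmethod
    def vlan_add_port(self, vlan_id, port, tagged): ...

    @abstractmethod
    def trap_to_cpu(self, vlan_id, protocol): ...


class RecordingLibrary(HardwareLibrary):
    """Stand-in backend that records calls instead of touching registers."""

    def __init__(self):
        self.calls = []

    def vlan_create(self, vlan_id):
        self.calls.append(("vlan_create", vlan_id))

    def vlan_add_port(self, vlan_id, port, tagged):
        self.calls.append(("vlan_add_port", vlan_id, port, tagged))

    def trap_to_cpu(self, vlan_id, protocol):
        self.calls.append(("trap_to_cpu", vlan_id, protocol))


class VlanInterface:
    """Logical VLAN object; it is the one that contacts the library."""

    def __init__(self, hw, vlan_id):
        self.hw = hw
        self.vlan_id = vlan_id
        hw.vlan_create(vlan_id)

    def add_port(self, port, tagged=True):
        self.hw.vlan_add_port(self.vlan_id, port, tagged)

    def intercept(self, protocol):
        # Ask the hardware to send this protocol's packets to the CPU port.
        self.hw.trap_to_cpu(self.vlan_id, protocol)


hw = RecordingLibrary()
vlan = VlanInterface(hw, 10)
vlan.add_port("3/1")
vlan.intercept("igmp")
print(hw.calls)
```

Keeping the callers behind one interface is also what would let the backend be swapped, for example for a recording stub like the one above during testing.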