Okay, hello everyone. Thanks for attending this talk about highly available wide area networks on OpenBSD. I will start with a few words about me. My name is Marko Tsupac, I live in Belgrade, Serbia, and I work as a lead system administrator at Kappa Star LTD; you can read more about the company at the following link. I take care of a wide area network of around 30 nodes which is completely based on OpenBSD routers. I also take care of the Juniper switches at these sites. Additionally, at Kappa Star we self-host all essential internet services by means of FreeBSD jails: DNS, web, database, email, LDAP, instant messaging and so on.

Here are some general requirements for a highly available WAN. As I said, this network is tested in my company; I implemented it there some three years ago and I am presenting it to you now. We need hosts on all of the current 30-plus sites to communicate. The solution has to be secure and confidential: no data sniffing by third parties while in transit, and we don't want any unwanted traffic. It has to be scalable: we need to be able to add more sites reasonably easily, and we should be able to change ISPs as we please. It has to be manageable: we don't want to be doing constant reconfiguration and troubleshooting, or caring too much about the maintenance and management of this network. And it has to work without interruptions. We don't want to be constantly answering users asking when the internet will be back. We don't want to be telling ISPs "site X is down, we have no idea if it's on us or on you, can you please check?" and then hearing the usual lies from the ISPs. I don't know how other ISPs are, but here they mostly say "yeah, we checked, everything is fine on our side", and then we have a technician drive to the site to see if our router is down, and it usually isn't: usually our router is up and the ISP's router is down.
We don't want to be postponing OS upgrades on the routers or doing them at some off-peak hours, and we don't want to worry about what would happen if an ISP doesn't fix an important link in a short or longer time: X minutes, hours, days.

So on my network I opted for a hub-and-spoke topology, and I implemented CARP and pfsync at the hub site. As you see, we have four GRE tunnels for each spoke. They are protected by transport mode IPsec, and the keying mechanism is isakmpd. I know iked is newer, and there are even newer things, but never mind: isakmpd is proven, it works and it's mature technology, so for me it's just fine. We have two tunnels per ISP, one going to the CARP master and one to the CARP backup, so four in total. We route by means of OSPF over the GRE tunnels, and the hub redistributes the default route. The GRE interfaces on the hub depend on CARP, so active routes go through the CARP master, and if the CARP status changes we have seamless failover to the other CARP member. Seamless means that no sessions should be interrupted: all downloads, video or audio streams, whatever, should continue. The spokes also fail over between their ISPs: you see the primary links over the blue ISP and the secondary links over the green ISP, which means that when the primary ISP goes down, the secondary seamlessly takes over. I was also considering load balancing between ISPs on the spokes, but so far I don't have two links of good quality on the spoke side, so for now it is failover; maybe it will be load balancing somewhere in the future.

So this is standard operation: we have a designated CARP master which acts as the hub, the center of the star, and the spokes communicate over their primary ISP. Here we see what happens if the designated CARP master goes down, be it a crash or maintenance: the designated CARP backup becomes master, the spokes seamlessly fail over to the new CARP master, all stateful sessions survive, and the spokes still communicate over their primary ISP.
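To make the tunnel count and the failover behaviour concrete, here is a small Python sketch. It assumes the GRE interface naming scheme the speaker describes later in the talk, read as `gre<spoke ISP><two-digit spoke number><NAT member>`, and it models path selection as a simple rule (follow the CARP master on the hub side, prefer ISP1 on the spoke side), whereas the real network gets this behaviour from CARP and OSPF metrics. The names `Tunnel`, `tunnels_for` and `active_tunnel` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tunnel:
    """One GRE-over-IPsec tunnel between a spoke and a hub (NAT) router."""
    spoke: int      # spoke number, 1..99 (hypothetical numbering)
    spoke_isp: int  # 1 = spoke's primary ISP, 2 = secondary ISP
    hub: int        # 1 = NAT1 (designated CARP master), 2 = NAT2 (backup)

    @property
    def ifname(self) -> str:
        # Assumed layout: gre<spoke ISP><two-digit spoke><NAT member>
        return f"gre{self.spoke_isp}{self.spoke:02d}{self.hub}"


def tunnels_for(spokes: int) -> list[Tunnel]:
    """Two spoke ISPs x two hub CARP members = four tunnels per spoke."""
    if not 1 <= spokes <= 99:
        raise ValueError("the two-digit scheme only covers spokes 1..99")
    return [Tunnel(s, isp, hub)
            for s in range(1, spokes + 1)
            for isp in (1, 2)
            for hub in (1, 2)]


def active_tunnel(spoke: int, carp_master: int = 1,
                  primary_isp_up: bool = True) -> Tunnel:
    """Simplified model of which tunnel carries a spoke's traffic.

    Hub side: follows whichever NAT router is currently CARP master.
    Spoke side: primary ISP while it is up, otherwise the secondary.
    (In the real setup, CARP and OSPF metrics decide this.)
    """
    return Tunnel(spoke, 1 if primary_isp_up else 2, carp_master)


if __name__ == "__main__":
    print(len(tunnels_for(30)))                # 120 tunnels for 30 spokes
    print([t.ifname for t in tunnels_for(1)])  # the four tunnels of spoke 01
    # NAT2 is master and spoke 03 lost its primary ISP:
    print(active_tunnel(3, carp_master=2, primary_isp_up=False).ifname)
```

The last line shows why such a scheme helps at this scale: the interface name alone tells you which spoke, which spoke ISP and which hub member a tunnel belongs to.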
Now let's see what happens with failover on the spoke side. Let's say that on the third spoke here the blue link went down for some reason: traffic fails over seamlessly to the backup ISP, and again all stateful sessions survive. I have depicted only five of the spokes here; there are actually around 30 of them at this moment, and the number is still growing.

So it all looks really nice in the logical topology, but let's not forget that these two hub routers are also reachable over some ISP, and we don't want them to be reachable over just one ISP. From now on I will call those two routers, the two in the upper corner of the slide, NAT routers, because they NAT for the headquarters LAN. We need to introduce another CARP pair, the pair of routers in the center of the slide, and we will call these BGP routers because they talk BGP to two upstream ISPs. Each BGP router is connected to both ISPs over a /29 network, which gives six usable addresses, and we need four of them: one for the physical interface of the first BGP router, one for the physical interface of the second BGP router, a third for the CARP address, and a fourth, of course, for the ISP's router. For this setup we bought our own autonomous system and a public IPv4 /24. We announce this autonomous system and the public IPv4 network to the upstream ISPs over BGP, and one of the most important parts of this setup is that we instruct the BGP peers to send us traffic via the CARP interface instead of the physical interface, which allows us to fail over seamlessly. In normal operation, the designated BGP CARP master routes between the DMZ and the internet. Interestingly, these BGP routers have no default route: the best route to each internet prefix is determined by BGP, so some traffic goes through ISP1 and other traffic through ISP2.
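As a quick check of the /29 arithmetic above, this Python sketch uses the standard `ipaddress` module to list the six usable hosts of an uplink /29 and assign the four roles. The role-to-address mapping is an assumption: .4, .5 and .6 follow the example used later in the talk, putting the ISP's router on the first host (.1) is purely illustrative, and 198.51.100.0/29 is a documentation prefix. The function name `allocate_uplink` is hypothetical.

```python
import ipaddress


def allocate_uplink(prefix: str) -> dict:
    """Assign the four addresses needed on one ISP uplink /29."""
    net = ipaddress.ip_network(prefix)
    if net.version != 4 or net.prefixlen != 29:
        raise ValueError(f"expected an IPv4 /29, got {prefix}")
    hosts = list(net.hosts())          # a /29 has 8 addresses, 6 usable
    return {
        "isp_router": hosts[0],        # assumption: ISP takes the first host
        "bgp1_physical": hosts[3],     # .4 in the talk's example
        "bgp2_physical": hosts[4],     # .5
        "carp_shared": hosts[5],       # .6: the next hop the ISP should use
    }


if __name__ == "__main__":
    for role, addr in allocate_uplink("198.51.100.0/29").items():
        print(f"{role:14} {addr}")
```

Whatever mapping you pick, the property the talk relies on is that the shared CARP address, not either router's physical address, is the next hop the ISP sends traffic to.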
Here we see BGP CARP failover: the designated BGP CARP master goes down, through a crash or maintenance, and the designated BGP CARP backup becomes master. Both ISPs seamlessly continue sending traffic to the new CARP master and all stateful sessions survive, which means no interrupted downloads, no interrupted video or audio conferences, Icecast streams, whatever; it should go completely unnoticed by any user on our network. Not only can we survive the crash or reboot of one of our BGP routers, we can also survive the loss of either of the two ISPs at our headquarters. As you see, ISP1 went down: no problem, because ISP2 is still up, so the DMZ is still reachable from the internet, the internet is reachable from the DMZ, and our headquarters LAN can still reach the DMZ and the internet by means of NAT. I see you are asking some questions, thank you; I will answer them after I finish the presentation.

Having our own public IP space and routing with BGP has a lot of benefits, not only the possibility of this nice highly available network setup. Here are some of them. The DMZ stays reachable even if one of the two ISPs goes down. We have our own public /24 pool of IP addresses; it is our own piece of the internet. We can change ISPs as we like without changing our public IP addresses, as long as they give us a /29 and talk BGP to us. For people not really familiar with BGP and autonomous systems, you can think of it like owning your own domain name: if I have my domain mimar.rs, I can move it to whatever provider I want just by changing DNS records. It's the same here: we have our own public /24, and we can announce it to upstream BGP providers from wherever we want, as long as they give us a /29 and talk BGP to us. From a network point of view, we can offer highly available public internet services from our DMZ on our own IP address space: first and foremost DNS servers, because the reverse zone is required for BGP, but also web, database, IM, mail, VPN, whatever. So, highly available services on our own
IP address space, which means they should always be on, and no ISP can at some point just say "I am changing your IP addresses, so reconfigure all your infrastructure".

Okay, that was the theoretical concept, and now we will move to the actual config files. I will start with the configuration of the BGP pair of routers, the external routers that connect us to the upstream ISPs at the hub location. I usually start with the sysctls because I have forgotten them so many times: I would type a really complicated setup, and in the end something wouldn't work because I forgot two simple sysctls. So here it goes, and it's the same on both routers: we just need to enable IP forwarding in order to turn these OpenBSD boxes into routers, and because we will be using CARP, the Common Address Redundancy Protocol, on these routers, we also need to enable CARP preemption. As you see, it's the same on both routers.

Now we will configure the physical interfaces. I tried to color-code these slides to make them more readable, and I also highlighted the differences. In my particular case these two firewalls are actually HP ProLiant DL360s, which come with bge interfaces. On both routers we use bge0 as the physical interface for pfsync. These routers are usually connected with a crossover cable in order to achieve maximum availability of this interface, and we put it on some private /30, so you see the only difference is the last octet of the IP address. Then we have bge1, the interface which connects to the DMZ. Here I used an address from the range reserved for documentation and examples, but in real life this is actually a public IP that belongs to our own public IPv4 address space tied to our autonomous system. I like to choose .252 and .253, not the last address but the ones close to it, because to me they say: traffic is leaving our network through these addresses, this is the last point before data leaves. You will see later that .254
will actually be the shared address of the CARP interface. We give each interface a description so we know better what's going on. bge2 is the interface that connects us to ISP1, and we get these IP addresses from the ISP. They can be any public IPs as long as they are a /29, because, as I said, we need three addresses for ourselves in this setup: as you see, .4 for the physical interface of the first BGP router, .5 for the physical interface of the second BGP router, and then .6 for the shared CARP interface. Similarly, bge3 connects us to the second ISP; here we also have .4 and .5, and we will have .6 for the shared CARP interface. You will notice that we also assign these interfaces to the interface group internet. Thanks to this group our pf rule set will be more readable, because we can just block, pass or match something on internet, and we won't need to write two separate rule sets for the two different ISPs.

As for the CARP interfaces, I drew them with dotted lines so they look a bit different from the physical ones, and I highlighted the differences between BGP1 and BGP2 in gray. You see that they are almost identical except for the value of advskew, which dictates which of the two CARP members is going to be the designated master and which the backup: the one with the higher advskew will be the backup. You also see that we have vhid 1; the vhid needs to match within a CARP interface pair, and it should be different between different CARP pairs, especially if they are on the same network. So we make a CARP interface for the DMZ with a shared IP address, and the same for the interfaces connecting us to ISP1 and to ISP2. Additionally, we have a logical pfsync interface which is tied to the physical interface bge0, the one connecting the two boxes with a crossover cable, and over this interface we will synchronize the pf state tables, which will
enable seamless failover and preservation of all states in pf.

As for BGP, it's controlled by bgpd.conf. We see first that we have three macros early in the file: our autonomous system number and the IP addresses of our two ISPs. Then we announce our ASN, set a different router ID on each BGP CARP member, and define our prefix set mynetworks; this is an imaginary address, in reality it would be some public /24. We create a group upstreams, where one neighbor is ISP1, which we defined earlier, with its remote AS, a description, and the local address we talk to them from, which is the physical address of the interface connected to them. The same goes for the second neighbor. At the end of this file comes the most important part, where we instruct our ISPs to use the CARP address as the next hop to our network, and not the physical interface. This way they are not sending traffic to the physical interface of either BGP CARP member; they are sending it to the CARP address, so if the primary goes down, the secondary just takes over. One may wonder why we are not doing BGP directly from the CARP interface, or why bgpd does not have some mechanism to depend on the CARP interface. Well, the reality is it doesn't: BGP as a protocol needs to be running all the time and have knowledge of the complete topology, so this is how it's done. Or maybe it could be done some other way, but this is how I did it and it works for me. Pay attention that most of these config files are not complete; they will of course not work just by copy-pasting them. Sometimes there are values to be changed, and sometimes they also contain pseudocode or simplified stuff, but I'm sure they will be just fine to demonstrate the concept.

Okay, let's go back for a second: believe it or not, that's all for the BGP pair of routers. We configured just a few config files and that's it. You should try testing with some TCP transfers from
the DMZ to the internet and from the internet to the DMZ, and start pulling out cables from some of the boxes: reboot one CARP member, reboot the other CARP member, pull out the cable to ISP1 and put it back, then do the same with ISP2. It should all work, and none of the TCP sessions should break. Make sure to test it before you put it into production, but yes, it's that easy: it's just a few config files.

We will now move to our pair of NAT routers. Those are the inner ones, the hub routers, which act as the center of the star topology that the spokes connect to over GRE tunnels. We again start with the sysctls. This sysctl.conf has a few more lines besides enabling IP forwarding and CARP preemption: we also need to enable multipath, because this router will have multiple paths to each of the spokes, and because the GRE tunnels will terminate on these routers, we also need to allow GRE. The sysctl.conf is identical on both NAT routers.

Now we configure the physical interfaces. We again start with the pfsync interface; its IP address space is local to these two routers, but we don't want any conflicts anyway, so we put it on some other /30. Then we configure the physical interfaces that connect to the DMZ, and we put them on .2 and .3; .1 will be the shared CARP address. We also have the LAN interface for our headquarters. It's in private IP address space, and I put it on a /29 not because my HQ LAN is so small, but because on the other side of this link there is actually a layer 3 switch, so this is just a transit network to reach the 10.60.0.0/21 network. As you see, I highlighted the differences between the two routers: there's only a difference in the last octet of the IP address. It's a good idea to keep these config files as similar as possible, because later on you can easily template them. Now we go to the CARP interfaces. As I described earlier, they are identical except for the higher advskew value on the designated CARP backup, and the vhid
value should be different from any other vhid on this network, so we put them on 4 and 5. Here you will also see that I added some aliases to the CARP DMZ interface. That's because we will be redirecting some traffic to HQ, and it's easier to have a one-to-one mapping of IP address to inner host than to put things on high custom ports. In the end we have a whole public /24, so it's not a problem to allocate 10 or 20 addresses for internal redirections. As for pfsync, we just instruct it to work over the physical interface bge0.

Next is isakmpd.conf. Here I set a listen address, because isakmpd tends to bind to all available IPs on the host, and these NAT routers will have hundreds of IP addresses thanks to so many GRE tunnels, CARP interfaces and so on. So I instruct isakmpd to listen only on the physical DMZ interface on both members of the NAT CARP cluster.

The next thing is ipsec.conf. You will notice that I haven't put rules directly into ipsec.conf; I just include some files from ipsec.conf.d, a directory I created myself, and I put the actual IPsec config files into this directory. The reason is not only better readability. If you want to flush IPsec rules, you usually do that with ipsecctl -F, which flushes everything, but you can also delete just the rules contained in a particular config file with the -d switch, pointing it at that file. This way I can flush or load the rules for one spoke and not all of them together. This is the ipsec.conf template for a spoke: I'm doing transport mode from the NAT1 DMZ address to the spoke's ISP1 address. I put some authentication and ciphers there; maybe they are not the best ones, but they work for me, and if you have any advice I would like to hear it. We see that we have one rule to ISP1 on the spoke and then the same one to ISP
two on the same spoke, and then we also need to do the same on the NAT2 firewall, but there we do it from the NAT2 physical DMZ interface to ISP1 and ISP2. I showed what it looks like for spoke 1, and it's the same for spoke 2; as I said, at this moment I have around 30 spokes, but they are all done with the same logic. And that's it: the NAT routers are almost done at this point. Our spokes are not yet connected to the NAT routers and inter-location traffic is not working yet, but we already have a working highly available NAT pair from the point of view of routing for the local LAN, which means we can test rebooting one of the two CARP members to see whether internet traffic from the HQ LAN survives, and it should.

Now we will configure the spoke routers. I again start with the sysctls: we need IP forwarding, IP multipath and allowing GRE. Someone asked here about WCCP: I'm really not sure I need to enable it, maybe it can be disabled. This works for me; I will give it a bit more of a read, I think I saw it in a man page, so I put it there. As for the LAN interfaces, we of course have some private address space, and you see that this is a /29. As I said, at all the spoke locations I have a layer 3 switch, so this is only a transit /29 for a bigger /24 network with a bunch of VLANs. As you see, this is a template, and it only differs in the third octet. It is very important to plan the IP addressing scheme of such a big network, otherwise your config files can be a real mess. It's good to keep them uniform so that you always know: ah, 64 in the third octet, that's, I don't know, Hungary, or 65 in the third octet, that's a site here in Belgrade, and so on.

Okay, this is the interesting part. Unfortunately my line breaks got garbled here in conversion, but I hope it will be clear. These are the interfaces on our spokes that are connected to our primary ISP, and you will notice that they are put in
rdomain 1. The reason for this is that you can have GRE tunnels which terminate on the same peer, and that would work, but when you mix it with IPsec it doesn't, so we need to put each ISP into a separate routing domain. We got this IP address from our ISP; it's a standard /30 point-to-point link. We gave it a meaningful description, and we also add a default route via the next hop, and you see that we have -T1, which means we are putting it into routing domain 1. We also need to start the isakmpd daemon in routing domain 1 for this to work, and for this reason we created a special file, isakmpd.conf.1; we will also have isakmpd.conf.2, one for each routing domain, and there will be no plain isakmpd.conf on the spoke routers. We also need to load the IPsec rules with ipsecctl in routing domain 1, loading them from ipsec.conf.1. And let's not forget that if we want to be able to SSH to this interface from the internet, we also need sshd started in routing domain 1, for which we created a separate config file. So you see that this is actually a template: the only things that differ across spokes are the IP addresses, netmasks and default gateways we obtained from our ISP; everything else is the same.

I guess I need to speed this up a bit. We come to the ISP2 interfaces, which are almost the same except that everything here is done in routing domain 2 and not routing domain 1: we have isakmpd.conf.2, ipsec.conf.2 and sshd_config.2. We must not forget to create enc interfaces in their respective routing domains, otherwise nothing would be encrypted: we wouldn't be able to encrypt our tunnels or SSH to these interfaces. As for isakmpd.conf.1 and .2, we just instruct isakmpd to listen on their respective IP addresses, and
as for ipsec.conf.1, unfortunately the line breaks are also a little garbled here, but it's self-explanatory for anyone who has ever configured IPsec, and the same goes for ipsec.conf.2. We also create three different sshd config files: in the default sshd_config, which gets started from rc, we put the LAN address; for routing domain 1 we put the ISP1 address; and for routing domain 2 we put the ISP2 address.

And here come the final few slides, which are almost the most important ones: the GRE tunnels themselves. They should also be self-explanatory. As we said, there will be four per spoke: my private address, his private address, my public address, his public address, and we repeat that for each spoke. You will notice here, and this is very, very important, that I haven't named my GRE interfaces gre1, gre2, gre3, gre4, because once you have 30 times four of them, which is 120, that really becomes unmanageable. Instead, all the GRE interfaces which go over the spoke's first ISP start with 1, as you see with all of these interfaces; if they terminate on NAT1 they end with 1, and if they terminate on NAT2 they end with 2. The middle two digits identify the spoke: gre1011 is spoke 01's tunnel from NAT1 over its ISP1, and on NAT2 there is gre1012, spoke 01's tunnel that goes over ISP1 but terminates on the NAT2 CARP member. It's really a lot easier if you number your GRE tunnels like this. These are the ones that go over the spoke's ISP2, and you will notice they all start with 2: gre2011 is spoke 01's first GRE tunnel over the spoke's ISP2. On the spoke routers we of course just reverse things: we use the same GRE interface name and now say my private IP address, his private IP address, and a tunnel from my public IP address to
his public IP address. And this is very interesting: we need to put those tunnels into tunneldomain 1, otherwise they wouldn't work, because, remember, we put this physical interface in rdomain 1. Over ISP2 it's the same, except that you need to put the tunnels into tunneldomain 2.

Okay, I'm finishing; I have just a few more slides. In ospfd.conf we set different metrics. I won't describe it any further because we are running short of time, but anyone who takes a closer look at these slides will notice it. The same goes for ospfd.conf on the other spokes. Finally, rc.conf.local: on the BGP routers we just need to start bgpd; on the hub NAT routers we start ipsec, isakmpd and ospfd; and on the spokes we just start ospfd from rc.conf.local, because, remember, we started all the other daemons we need, like isakmpd and ipsec, from the hostname.if files by means of scripts. Finally, we need to exchange keys and put them where you see on the slide: NAT1 and NAT2 will each need two public keys per spoke, one for each of the spoke's ISP addresses, and the spokes will have two public keys in total, one for NAT1 and one for NAT2. Or perhaps you can use pre-shared keys in ipsec.conf, but perhaps better not.

As for pf: we don't have time to talk about pf, and we don't have time to talk about the setup without pf, but I will give just a few general guidelines. pf.conf should be identical on both CARP members. You should increase state limits, use interface groups and interface macros, and use interface modifiers like :network, :peer and :0. Do not sync unneeded states: traffic terminating on the router itself gets no-sync, and traffic passing through it gets synced. Block everything and pass what's needed. As for the spoke routers, the pf rule sets should be identical for all spokes; I mean the rule sets, of course, not the macros: use the same macro names and change their values per spoke. Block everything and pass what's needed. Permit inbound SSH from anywhere to both ISP interfaces for troubleshooting purposes, in case the tunnels are down for some reason. Apart from that, only UDP 500, which is ISAKMP,
and proto ESP to and from HQ need to be passed on both ISP interfaces. Quite a lot of rules need to be if-bound. Keep the spokes' rule sets minimal and do all the filtering on the hub. And that's all, folks. Thank you for listening. If you have any further questions regarding this setup, or even want me to type it up for you, need advice on network architecture, need help with running services in FreeBSD jails, like DNS, databases, email, web, whatever, or just want to hang around and talk about music, retro gaming or whatever, you can reach me at the following contacts. Thank you very much.

Thank you very much, Marko. You have a couple of questions in the shared notes, actually a lot of them; let's try to answer a couple in the next three or four minutes, until the recording stops. So please go to the shared notes and try to go through the questions.

Okay, and I will take the remaining ones in the hallway talks, or maybe at lunch. Okay, so: what made me pick the double star topology? Actually, when I first came to the company I work for as a young sysadmin and netadmin, there were just three WAN links, over Frame Relay, with some RIP routing or whatever, running on some Cisco 800 routers. So this network has been evolving for the last 13 years, and I'm the one testing and trying out things. This is of course not the first iteration: I was tunneling IPsec in tunnel mode, I tried a lot of different things, and I came to this solution for the moment. Who knows what it will be in a few years, but for now I picked this because it gives me what I need, which is peace of mind. I like to sleep well, with no one calling me to say that things don't work.

Perfect, go ahead with the next questions.

Okay: are the GRE tunnels over the WAN in your OSPF backbone area? Yes, everything is in backbone area 0; you will see that in the ospfd config files.

Go ahead and answer all the questions, I won't interrupt you.

Okay: why did I decide on CARP instead of dynamic
routing on the hubs? I don't really understand that question; perhaps we can discuss it in the hallway. Does failover go back to the primary ISP? Yes, it does, after some time, and it's all seamless; it really works well. Do I track availability via RIPE, and how many nines do I typically get? No, I haven't yet tracked availability via RIPE. I'm actually tracking the availability of all my sites from a central location via Nagios, just by pinging them every 10 minutes, and then I do a monthly report, and it all looks really good, but I would be interested to learn how I can track availability via RIPE. Do I delay forwarding new states until pfsync has synced? Well, you saw in my config files what I do. I don't really know, but CARP and all the components of this setup are really smart: nothing kicks in before it's ready to kick in. I have never had a situation where a CARP member took over the master role and then something was late and not yet ready. It seems that the whole system is really nicely designed, so that the master takes over the role only when everything will work. Do I run ifstated? Before I moved to this setup I was experimenting a lot with ifstated, but in the end I noticed I wouldn't be able to accomplish what I needed with ifstated, so no, in this setup I don't need it.

Marko, we should stop here, sorry. We have a few more questions, but the meeting will end soon and I want to close here; for all the other questions, please find Marko in the hallway or send him a private chat.

Okay, you have my contacts, and I will be in the hallway, so we can continue discussing there.

Okay, thank you very much, Marko. You gave a very interesting presentation, you had a big audience and a lot of questions, which means it was very interesting information. Congrats, thank you very much, and enjoy the conference.

Thank you for listening.
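The speaker measures availability by pinging each site from a central Nagios host every 10 minutes and producing a monthly report. As a rough illustration of how such samples translate into the "nines" asked about in the Q&A, here is a Python sketch; it assumes each check is simply recorded as up or down, and the function names are hypothetical, not part of Nagios.

```python
import math

CHECK_INTERVAL_MIN = 10  # from the talk: each site is pinged every 10 minutes


def checks_per_month(days: int = 30) -> int:
    """Number of ping checks in a month of the given length."""
    return days * 24 * 60 // CHECK_INTERVAL_MIN


def availability(samples: list[bool]) -> float:
    """Fraction of checks that succeeded (True = site answered)."""
    if not samples:
        raise ValueError("no samples")
    return sum(samples) / len(samples)


def nines(avail: float) -> float:
    """Express availability as a number of nines (0.999 is about 3)."""
    if avail >= 1.0:
        return math.inf
    return -math.log10(1.0 - avail)


if __name__ == "__main__":
    n = checks_per_month(30)                  # 4320 checks in 30 days
    samples = [True] * (n - 4) + [False] * 4  # four failed checks
    a = availability(samples)
    print(f"{a:.5%} availability, {nines(a):.2f} nines")
```

Because the checks are sampled, an outage shorter than the 10-minute interval can fall between two pings and go unrecorded, and each failed check effectively counts as about 10 minutes of downtime, so the result is an estimate rather than an exact uptime figure.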