So how do you connect your computer to the internet? Normally, you plug it in or connect it via Wi-Fi, DHCP kicks in, your home router assigns it an IP address from your network, and off you go, watching little kittens from the internet. How do you do the same thing in the data center? In the data center you have all your servers, you connect them to the expensive switch you bought, you assign all the IP addresses one by one manually, you assign a default gateway, and then you're happy and all your machines are running.

Of course, your expensive switch in the data center also has some things that I call risks and others call features. You use port channels and link aggregation, probably for more redundancy for your servers. You use VLANs to separate traffic a little bit. You use spanning tree. Has anyone here had an outage because spanning tree reconverged in the middle of the night? That's the reason I call them risks. And if you take away these risks, it seems like you are turning your expensive switch into just a dumb, simple device in the data center.

Welcome to my talk, "Operating OpenStack on an IP Fabric". My name is Jan Walzer, I'm a cloud architect at Innovo Cloud, and I will tell you why your expensive switches can do much better than that. A short agenda: what is an IP fabric, why are we doing this, how do we deploy it, what does operating such a thing mean, and why we are using Cumulus Networks.

To recap a little, that's the usual setup: you have your servers in your network, every server gets assigned an IP address from the same IP network, and you configure, as I showed before, a default gateway, so the traffic goes through the default gateway and everybody is happy. Can we do better than that? There is a thing called routing, and we decided to leave all these layer 2 features out of the setup, because what we are using is essentially an IP fabric.

What we do is this: we use two routers, connect each server to both routers, and use small subnets on those links. Our terminology for these links is transfer networks, and each server also has a so-called canonical IP. The important part is that the transfer networks are only tiny IP networks; they only have two IP addresses available, because we only need two addresses there: the link only has to push packets from one node to the other. These addresses are never used for any service in the network. That is the reason we have the canonical IP: that is the address we use for addressing a service on that server.

How do packets arrive at the server in the end? We use the protocol called BGP, the protocol the internet runs on, and we decided to have this protocol running on every device in our network. The servers and the switches run BGP daemons, and they exchange all the routing information between the devices. This means the routers learn how they can reach the canonical IP of the server, and the server learns how to get out of the network or to the other nodes. If we add another node, you can see that the routing table grows: the existing machine learns about the canonical IP of the new server, the new server also learns the route to the other server, and this can grow quite large.
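To make that addressing scheme a bit more concrete, here is a minimal sketch of how one could carve /31 transfer networks and /32 canonical IPs out of two address pools. The supernets, names, and allocation logic are made up for illustration; this is not our production tooling.

    # Illustrative only: hypothetical pools, two routers per server, one canonical /32.
    import ipaddress

    TRANSFER_POOL = ipaddress.ip_network("10.1.0.0/24")    # pool for point-to-point links
    CANONICAL_POOL = ipaddress.ip_network("10.2.0.0/24")   # pool for per-server service IPs

    transfer_nets = TRANSFER_POOL.subnets(new_prefix=31)   # every /31 holds exactly two addresses

    def allocate(server_name, index):
        """Assign one /31 towards each of the two routers plus the canonical /32."""
        links = []
        for router in ("router-a", "router-b"):
            net = next(transfer_nets)
            router_ip, server_ip = list(net)                # the only two addresses in this transfer network
            links.append((router, f"{router_ip}/31", f"{server_ip}/31"))
        canonical = f"{CANONICAL_POOL[index]}/32"           # the address services are reached on
        return {"server": server_name, "links": links, "canonical_ip": canonical}

    print(allocate("server-01", 1))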
We can connect all our servers in the racks that way. What we are currently going for is deploying blocks of two racks full of 2U servers, which means around 40 servers in two racks. That matches exactly the switches we are using, because they have 48 ports: we put one switch in each rack, cross-connect all the servers to both switches, and have one complete block, and all the information about how packets have to travel through the network is learned via BGP.

If we want to scale larger, if we need more than two racks, we just put more racks next to each other and put some more switches on top of them, and again, because they are all running BGP, they all learn from each other which IPs are where in the network and how to reach them. What we have built here is a so-called spine-leaf architecture, and it has some really nice properties. It gives us consistent performance between all the racks, consistent oversubscription ratios, and consistent latency. Latency is really important; someone earlier called it the last boundary we still have, because we cannot get latency low enough. We can also scale this performance and latency, because we just add more switches or more links to the setup and it scales. This is an architecture made for current data centers, where we mostly have east-west traffic, where servers talk to each other, and not the old pattern where all the traffic had to go out to the internet and we only had north-south traffic. This kind of design is called a Clos network, after Charles Clos, who studied these multistage networks for telephone switching in the 1950s. There is a really nice Wikipedia article you can read about that.

So that's an IP fabric and that's what we are doing, but why are we doing it? There are three main points that make this important for us: operational flexibility, redundancy and resilience, and scalability.

What do I mean by operational flexibility? With this property of BGP, of learning which IP is where in the network, every IP that we use for a service can be moved to any other point in our whole cluster and is set up instantly. If you want to move an IP or a service over to a different data center, you normally have problems: you pay a penalty because you don't have the same IP subnet over there, or you stretch layer 2 connectivity between data centers, and that gets you into completely different trouble. So we have the flexibility.

We also get redundancy and resilience, because we avoid all the layer 2 issues I mentioned at the beginning. If we do have issues at layer 2, those faults are isolated, because our layer 2 segments only exist between a host and a switch or between two switches. We don't have large layer 2 networks, and that's a really important point.
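To make the two-rack block and the oversubscription point a bit more tangible, here is a small back-of-the-envelope sketch. The port counts and speeds are assumptions for illustration (25 Gbit/s server links and eight 100 Gbit/s uplinks per 48-port leaf), not a statement about the exact hardware we run.

    # Back-of-the-envelope numbers for one two-rack block; all values are assumptions.
    servers_per_block = 40     # ~40 2U servers across two racks
    leaf_switches = 2          # one 48-port leaf per rack, every server wired to both
    downlink_gbit = 25         # assumed server-facing port speed
    uplinks_per_leaf = 8       # assumed number of uplinks towards the spines
    uplink_gbit = 100          # assumed uplink port speed

    for leaf in range(leaf_switches):
        down = servers_per_block * downlink_gbit   # one link from every server to this leaf
        up = uplinks_per_leaf * uplink_gbit
        print(f"leaf {leaf}: {down} Gbit/s down, {up} Gbit/s up, "
              f"oversubscription {down / up:.2f}:1")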
What we also get automatically is multipathing, in this case equal-cost multipathing: the routers can do the routing of packets per packet or per flow, and they automatically use all the available capacity and paths between the nodes.

What we also get as a gift is anycast. Anycast means that I can bring up one IP address at multiple points in the same cluster in parallel, which means that if I place these IPs, these services, strategically across my data centers and racks, I can do load balancing, because the routers will try to deliver the packets to the nearest node that has this IP address. That's what CDNs do at a global scale; we can do it inside the data center. And if, let's say, a DNS server in my data center fails, that does not mean the IP for that service goes down: the packets instantly go to a different DNS server and get answered there. Anycast is a really cool feature; it gives us load balancing in the data center without having a load balancer.

And, as I already mentioned, scalability: the design of the leaf-and-spine architecture stays the same. I can just add more links, more switches, more racks, or more data centers as I need them. The design of the whole fabric stays the same, and that's a really important point.

So how do we deploy this? We mainly use two tools. Very important for this setup is an IP address management solution, because we need one source of truth, and we follow the paradigm of infrastructure as code. For IP address management we now use NetBox, and it is configured with all the important data we need to know: the IP addresses and networks that are allocated, the service IPs, the transfer networks, and the hosts themselves, with a classification, so we can tell from NetBox which service will run on which host and how to configure it later. The configuration itself is done with Ansible playbooks, which allows us to roll out all changes without manual interaction. The playbooks are stored in a GitLab, so we manage them with git, and a CI toolchain checks for errors in the configuration.

The important thing here: server configs and switch configs are the same, because we run Cumulus Linux on our switches. Cumulus Linux, the folks sitting over there at the booth, provide a Linux distribution for your switches, which is Debian-based. So you can log in to your switch and troubleshoot like you normally would on a server, you can roll out the same playbooks from your Ansible, you can do the same monitoring, the same logging configuration. It is simply the same, and that's a real improvement, because we don't have to manage so much special stuff.

How does the operation of this whole thing work? We follow the paradigm that everything is a metric, and we do metric-based monitoring, logging, and alerting. The logging part you probably know: Elasticsearch and Kibana. Everything we get in logs flows into the Elasticsearch database, and we have Kibana as the front end for that.
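Coming back to the source of truth for a moment, here is a minimal sketch of what pulling host data out of NetBox could look like, for example to feed Ansible variables. It assumes the pynetbox client; the URL, token, and role name are placeholders, not our real values.

    # Illustrative sketch: list every "compute" device and its assigned addresses.
    import pynetbox

    nb = pynetbox.api("https://netbox.example.com", token="REPLACE_ME")

    for device in nb.dcim.devices.filter(role="compute"):
        # all addresses on the host: transfer-network /31s plus the canonical /32
        addresses = [ip.address for ip in nb.ipam.ip_addresses.filter(device=device.name)]
        print(device.name, addresses)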
The monitoring side is also nothing special: we use the pair of Prometheus and Grafana. Prometheus is a really great time-series database and query tool, so we can store millions of time series with high granularity, and Grafana is our tool for dashboards that show us what the current issue is and how the services are doing. The alerting then, of course, happens with Prometheus and its Alertmanager component. The metrics generate events, the events can create alerts, and these alerts are handled and routed to different targets: if you need to alert somebody you can send out an SMS, you can send a Slack channel notification if something is happening, or you can trigger a phone call late at night if something is really broken. We try to avoid that, because we only want to be called out at night if something is really broken, not for small notifications. As I showed, our network, for example, is completely resilient: if one switch goes down, if a port goes down, that's no issue for us. We can leave the technicians to their well-deserved sleep and just alert in the morning if there is something to do.

So, coming back to Cumulus: what makes Cumulus Linux so special for us? I think I already mentioned it, but I had to put a really clear slide on it. It is just a normal Linux system based on Debian; it looks like a server with lots of network interfaces and nothing more. Deployment and configuration can be done with the same tools, you have a regular Linux environment for troubleshooting, just like on your servers, and the tools for monitoring and logging, Prometheus and Elasticsearch, are the same as you would use with a server. So the special thing about Cumulus for us is that the routers are nothing special anymore; they are just servers. That is the thing I want you to take home. Thank you.

Are there questions? One question up front: my talk was about OpenStack, and I barely mentioned it, because OpenStack just works. It is just something we deploy on the network, deployed the same way as you would on the network you already have, only we have a faster network for it. Everything runs in the underlay; for storage we use Ceph. All the OpenStack services just get configured with different IPs than usual, but they can connect to each other, find each other, and the service works. So OpenStack is nothing special. Thank you.

Do we have a mic? I guess it's on. So, by "IP fabric" you mean Clos, and the terms are used interchangeably? Not really. A Clos network is a little bit more than that, but the idea of having multiple stages that your packets traverse through point-to-point networks is what Clos studied. So it is not completely interchangeable; I would stay with the term IP fabric here. If I understand your question correctly: spine-leaf is also a kind of topology that says we have different hierarchies. Yes, but you can do spine-leaf with layer 2 networking as well, and we do it completely on layer 3. That's why we call it an IP fabric and not a network fabric or a spine-leaf architecture alone. So it is not exactly the same, but they go in the same direction.

So the switches are stacked like in spine-leaf, and an IP fabric also means such a stack of routers, right?
An IP fabric means that you do everything on layer 3, with IPs and routing, and a leaf-spine architecture between the switches, but on layer 3. Okay, thank you. No problem. Other questions? Yes, wait for the microphone.

The question is whether we have an overlay transport, an overlay network. Yes, of course; that's the point why this works so well. What we lose with this IP fabric is broadcast functionality, to be honest, but nowadays I see no reason why I would need a broadcast domain like we had in layer 2 networking. All the layer 2 stuff of course still happens in the VM networks, and those run on an overlay network. Currently we use plain Open vSwitch, which means the nodes take care that the VMs can reach each other by building VXLAN tunnels between the nodes. From the outside, from the view of our hypervisors, these are just UDP connections between two server nodes, but inside it is a layer 2 network, and all the VXLAN tunnels can be joined together, so that's no problem for us; it just works. The overlay network takes care that all the VMs in one /24 network can reach each other, and it handles the broadcast stuff, but you don't see that in the underlay. That makes it possible that VMs in different data centers can see each other and have the feeling that they are in the same layer 2 network, while from the bottom it just looks like an IP connection between two server nodes.

Last question: a lot of deployment tools use VLANs for separating traffic, for example one VLAN for the tenant network tunnels and one for API traffic; we don't use VLANs at all, so the question is how we secure or segregate between those different types of traffic. Why would I need that? There is one point of truth in that: the management traffic for the hosts themselves, IPMI traffic and so on, goes through a separate network. But every other kind of traffic we have falls into the class of management traffic for the services themselves; we call it the underlay, and we have enough capacity there that we don't have to worry about overloading. We are running this on Mellanox hardware with two times 25 Gbit/s for every node, so not even our Ceph nodes can currently saturate it. Everything else runs inside VMs, in virtualized networks, and these are encapsulated one layer higher, in the overlay, so there is no traffic flowing between the underlay and the overlay. If the overlay needs to go out to the internet, we have dedicated nodes that have, on one side, access to the underlay, where they receive the traffic from the VXLAN tunnels, and dedicated network interfaces towards our outside and border routers, where the traffic to the internet goes out. So there is currently no way for traffic from the overlay to pass into the underlay, and that is our kind of separation.

Okay, and the routing itself is completely flat? You can get anywhere from anywhere? Yes. All right, thank you.

Thank you.