Thank you. So when I last gave this talk, I mostly talked about very specific things we're doing inside the data center, whether it's white box switching, Open19, or different rack designs and a bunch of other stuff. But I was excited about this one because it's more like, hey, how did you make this transition? You don't usually get to tell the story of how you got from one place to the other. So my point here is just to give you an idea of what we had to think about when scaling from traditional enterprise thinking to supporting hundreds of thousands of machines, to thinking about a hyperscale network.

It all starts, obviously, with traffic, right? All of us who run data centers are constantly looking at traffic. Inside LinkedIn's data centers, traffic demand has grown tremendously over time. What's interesting about content companies like us is that one byte of data that comes in north-south from the internet fans out roughly 10x inside the data center. Because of the way the applications are built, there's a call graph, and it spans services inside and across data centers, and that creates an enormous amount of traffic. The other thing is we move tons of metrics inside the data center, in the billions per second. We have offline jobs processing, we have machine learning algorithms going out across the network trying to get a bunch of data, lots of replication. All of these things add to the amount of traffic, and all of this machinery is serving the close to half a billion users we have out there.

One of the things our developers used to do, and I was talking to a couple of my colleagues on the development side about this, is that a few years back they would write a piece of software and say, I need X number of machines. We quickly realized that type of thinking just slows a developer down. What we wanted to create was infrastructure across the board, and not just racks and servers, I'm talking about even a little bit higher level than that, that is completely invisible to the developer. They should not care about it. It should be extremely simple to use, it should be elastic, it should scale automatically. The developer shouldn't have to think, oh, how many calls do I have, do I have to go and change a few parameters? No, all of those things should happen automatically. There's no single point of failure; it's always available. You can get things on demand. These are the things we wanted to give our developers.

So when you start thinking about that, at LinkedIn we created something called LinkedIn Platform as a Service, our internal cloud. Think of a developer who just puts in a little bit of blueprint information and then ships that information out. We have a system called Rain and Race that says, hey, I now know all this information, and I'm going to find the best server or servers for you based on your compute and your storage demands, and then it goes out and provisions that. And once it's out there, once you've deployed these pieces of software, you don't think about it at all. You just let it be, right?
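To make the blueprint idea a bit more concrete, here is a rough sketch of how a scheduler could match an application blueprint against a fleet of servers. Everything in it, the blueprint fields, the server model, the pick_servers function, is a hypothetical illustration, not LinkedIn's actual Rain/LPS interfaces.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    free_cores: int
    free_disk_gb: int

# A hypothetical application "blueprint": the developer declares what the
# service needs, never which machines it lands on.
blueprint = {
    "app": "profile-service",
    "instances": 2,
    "cores_per_instance": 8,
    "disk_gb_per_instance": 200,
}

def pick_servers(blueprint, fleet):
    """Greedy best-fit placement: choose the least-loaded servers that still
    satisfy the blueprint's compute and storage asks."""
    chosen = []
    candidates = sorted(fleet, key=lambda s: (s.free_cores, s.free_disk_gb), reverse=True)
    for server in candidates:
        if len(chosen) == blueprint["instances"]:
            break
        if (server.free_cores >= blueprint["cores_per_instance"]
                and server.free_disk_gb >= blueprint["disk_gb_per_instance"]):
            chosen.append(server.name)
    return chosen

fleet = [Server("host-a", 16, 500), Server("host-b", 4, 100), Server("host-c", 32, 1000)]
print(pick_servers(blueprint, fleet))   # -> ['host-c', 'host-a']; host-b has no room
```

The point of the sketch is only the division of labor: the developer writes the declaration at the top, and the platform owns everything below it.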
And then we have another feature that auto-scales the application based on its demand. At that point we have tons of metrics and learning mechanisms that scale the application on the backend. So that's our view into our internal cloud and how we serve the requests you're asking for; as the previous gentleman said, say you're searching for a job.

So when that problem comes to our side, the infrastructure side of the house, what does it translate to? When I look at something like that, the team has to ask, are we just building to some very specific things? The challenge I always give my team is to think of a much broader vision. So we sat down and said, in order to get to a hyperscale model, you have to think about unlimited bandwidth. That means developers should never have to think about how much bandwidth they have in order to build their application; they should always assume it's unlimited. They should be able to assume there is no latency, that they can get whatever they want and we will make sure it's there. Compute is on demand. We also decided it should be a completely disaggregated model, to give us more and more control of our own destiny. We came up with a concept called the programmable data center, and I'll touch on it in a bit more depth as we go through the slides. And the other one is self-healing: as we scale the infrastructure, if something breaks, you don't run to fix it, it should heal itself. Those are the principles we took.

Out of that, we created our flagship design called Altair. There have been numerous blogs on it, but I'll go through it quickly. We wanted the ability, as we build, to future-proof it from 10-gig to 100-gig capability to the host. We took a different approach because, at our scale, we don't have greenfield data centers; we go in and buy wholesale space, which limits how much data center space we can have. So we do dense compute, where we try to pack more servers into a rack, up to 96 hosts per rack. We decided we would move to single-chip switching, meaning no chassis in the network. We want to be able to support 200 gigs between the cabinets, and of course a non-blocking parallel fabric, and we're trying to get oversubscription down to less than 6:1. And then the other piece is our own venture into what we call the Falco fabric base, which is our own switch design.

When we looked at how to scale this out across our data centers, and we have a number of them across the US, our design approach was that there should be a simple, non-blocking IP fabric. We want to be able to do multiple parallel fabrics in a Clos network architecture, and I'll go into a bit more depth on what I mean by that. We also took the stance that the merchant silicon should have the least amount of features. We don't want unnecessary features; we don't want things sitting in the network that we don't even use but still have to deal with the bugs, the support, and everything else for. And then a distributed control plane with some centralized control.
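As a back-of-the-envelope check on the oversubscription target, the numbers mentioned so far (96 hosts per rack, 200 gigs between cabinets, less than 6:1) line up if you assume the hosts attach at 10 gig. The snippet below is just that arithmetic; the host speed is an assumption, not something stated explicitly in the talk.

```python
# Oversubscription for one cabinet, using the figures from the talk:
# 96 hosts per rack, 200 Gb/s of uplink between cabinets, target < 6:1.
hosts_per_rack = 96
host_speed_gbps = 10      # assumed host attachment speed (hosts can also be 25/50/100G)
uplink_gbps = 200         # 200 gig between cabinets, e.g. 4 x 50G uplinks

downlink_gbps = hosts_per_rack * host_speed_gbps
oversubscription = downlink_gbps / uplink_gbps

print(f"{downlink_gbps} G down / {uplink_gbps} G up = {oversubscription:.1f}:1")
# -> 960 G down / 200 G up = 4.8:1, under the 6:1 target at 10 gig to the host.
#    At 25 gig to the host the same uplinks would be 12:1, which is where the
#    10-to-100-gig future-proofing comes in.
```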
The other one is wide multipath ECMP, right? We want to be able to go across the fabric and not have to traverse cores anymore. That's one of the characteristics. And then uniform chipsets, bandwidth, and buffering: low latency and small buffer requirements were among the things we talked about as we scale out.

So what do you do when you try to get to hyperscale? You kill the damn chassis. No more chassis. We took a very, very hard stance on this: we would do a single SKU. That means one 1U box is the entire fabric, one SKU. My BOM is about this big, for a network that is pretty massive. Because if you want to get to the point where a packet traverses fewer than five chipsets, you can't put cores and firewalls and load balancers and all of that in the path; they all add to your latency.

The next piece is introducing parallel fabrics, which is one of the key things we did. Think of each ToR, and we have a single ToR: each ToR has a four-way ECMP path out, and there are four fabrics. We color our fabrics, and those fabrics are not connected to each other at all; they are totally independent. We have 1:1 oversubscription across the fabric, using the minimum number of chipsets to carry east-west traffic. In the early days I calculated that each packet was hitting something like 15 to 20 chipsets; now we have it down to fewer than five chipsets to the server. The principle of the fabric is that you can support 100,000-plus bare-metal servers without adding an additional fabric layer. It supports up to 64 pods, each pod has 32 cabinets, and each cabinet has 96 bare-metal hosts, as I mentioned before. The fabric is limited to three tiers, a typical five-stage Clos, and the whole data center minimizes its chipset count to reduce latency. At the host layer, going down the stack, I'm supporting 10, 25, 50, and 100 gig. At this point we are largely doing 10 and 25. We're open to 50, but I haven't seen a lot of use cases for it yet, and 100, well, we've built the infrastructure for it; it would be nice if the kernel gets there. I think that'll be fun.

This is a three-dimensional view of the same network. What I really wanted to show here is how we've built our planes and how they spread across. If you were to traverse from, say, pod 1 to pod 64 in this architecture, you're looking at about 2.5 microseconds of switching.

On the future-proofing side, we also did a 100-gig transformation early last year, and we were a little bit on the cutting edge of it; the Tomahawk platform had just come out. As I said, it's a single SKU, so all our spine and all our fabric is that one SKU. You can think of something like the Cisco 3232C as one of them, there are a number of these out there, and we also have our own blended in. Basically we ran 100-gig across the fabrics, as you can see, between the spine and the fabric, with 50-gig links below. For the ToR we decided to do 50-gig per path, so four times 50: four paths of 50. The way we did that was with PSM4 optics, so we could split each 100-gig into two 50-gigs.

The next thing I want to talk about, which I mentioned earlier, is our disaggregation, right?
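For a sense of the scale those pod and cabinet numbers imply, and of what "four-way ECMP onto four colored fabrics" means in practice, here is a small sketch. The server-count arithmetic uses the figures from the talk; the flow-hashing function and plane names are illustrative stand-ins, not the actual ECMP hash the switch ASICs use.

```python
import hashlib

# Scale implied by the numbers in the talk: 64 pods x 32 cabinets x 96 hosts.
pods, cabinets_per_pod, hosts_per_cabinet = 64, 32, 96
total_hosts = pods * cabinets_per_pod * hosts_per_cabinet
print(total_hosts)   # 196608 -- comfortably "100,000 plus" bare-metal servers

# Each ToR has a four-way ECMP path out, one uplink per colored fabric plane.
FABRIC_PLANES = ["plane-1", "plane-2", "plane-3", "plane-4"]   # illustrative names

def pick_plane(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    """Hash a flow's 5-tuple onto one of the four parallel fabrics.
    Real switches do this in hardware; this only shows the idea."""
    key = f"{src_ip}:{dst_ip}:{src_port}:{dst_port}:{proto}".encode()
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return FABRIC_PLANES[digest % len(FABRIC_PLANES)]

# The same flow always lands on the same plane, so packets stay in order,
# while different flows spread across all four independent fabrics.
print(pick_plane("10.0.1.5", "10.3.7.9", 52344, 443))
```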
So earlier last year we really opened up and got into the disaggregation space, where we got an ODM platform, put our own Linux NOS on it, and then worked on the application layer. What's interesting is that at first we were focused on figuring out how to just separate the two; this was early last year and we were very new to that phase. We looked at building our own NOS, getting a third-party one, or going open source, and tried to figure out the ODM space, and it opened up a new world for us. But it also taught us a lot of things. It taught us that spending a lot of time on the low level, between your ASIC, your ONIE, and all your HAL, takes a lot of developer time. What our team was more interested in, because we built a five-stage Clos network, was a different set of problems: we had to deal with elephant-and-mice flow issues, and ECMP paths not actually being equal. For those things you have to think a bit more about the application and the control plane. So we started spending more time there and found that the low-level part was a bit of a pain.

So we transformed our disaggregation model into something we now call open fabric. We don't really care about the NOS; we can just pick one. There are a few people out there doing this, they're great, they take care of the low-level stuff. We'll either use them or interchange them when we want, and as long as they work with the hardware we want, that's fine. What we really want to focus on is the control plane. We want to look at distribution of reachability: a fast, simple, distributed control plane. We don't want a crazy amount of features; we just want the one or two things we need it to do. But what we do want is auto-discovery. Basically, I envision a data center technician taking a switch, plugging it in, turning the power on, and it knows whether it's a spine or a leaf. I don't have to configure any of it. It's topology aware; it knows how to auto-discover its neighbors. And then the policy is a bit more centralized, and where possible we use our own pipeline system, Kafka, and I'll talk a little about what we're doing there. That's the basic idea of our disaggregation, or our next step in disaggregation.

And it makes you really rethink the network stack. It makes you think about your forwarding plane a little differently. How do you do link selection? How do you do topology discovery? How do you rethink your telemetry scenarios? There's a whole world of opportunity to do a lot of different things on the application side. Like I was saying, self-healing, predicting failures, all these kinds of things you can add onto the control plane. So we want to spend a lot of time on that. I'm running a little out of time, so I'm trying to get through it. What I wanted to show here is that we have this philosophy that our network folks don't really care about SNMP anymore; they don't want to deal with syslog. So we said, okay, then let's not poll this stuff, let's not put that stress on the control plane.
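The auto-discovery idea, a switch that figures out on its own whether it's a spine or a leaf, can be sketched very simply: look at who your neighbors say they are. The snippet below is a hypothetical illustration using made-up LLDP neighbor data, not LinkedIn's open fabric implementation.

```python
# Hypothetical zero-touch role detection: a freshly powered-on switch looks at
# its LLDP neighbors and decides whether it is a leaf (ToR) or a spine.
# In a Clos fabric, a leaf sees servers below it, while a spine or fabric
# switch sees only other switches.

def classify_role(lldp_neighbors):
    """lldp_neighbors: list of dicts like {"port": "Ethernet1", "kind": "server" | "switch"}."""
    kinds = [n["kind"] for n in lldp_neighbors]
    if not kinds:
        return "unknown"          # nothing cabled yet
    if all(k == "switch" for k in kinds):
        return "spine"            # only switch neighbors -> upper tier
    return "leaf"                 # any server neighbors -> top-of-rack

neighbors = [
    {"port": "Ethernet1", "kind": "server"},
    {"port": "Ethernet49", "kind": "switch"},
    {"port": "Ethernet50", "kind": "switch"},
]
print(classify_role(neighbors))   # -> "leaf"
```

Once the role is known, the switch can pull the rest of its intent from a central policy service, which is the "no configuration at all" experience described above.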
Let's stream everything using Kafka, send it all through a pipeline, store it in Hadoop somewhere, and then we can do all kinds of amazing magic on it. That is basically what a programmable data center means: we take all of the telemetry, push it through a pipeline, and then we can do all kinds of processing and machine learning on top of it, and those become your network management functions. So you don't think of network management as polling; you think of it as streaming telemetry, and you can use that either to monitor the network or to build feedback mechanisms that say, I'm going to apply this action. That's the future of where we're going.

And on that note, I'm just about at time. In closing, people generally ask, why do you build your own switch, why do you want to do all this? My answer is that it's not about wanting to become experts in this. It's that we see a massive advantage in controlling the destiny of our infrastructure. If we control the destiny of our infrastructure, it directly impacts how you are able to use our application in the fastest and easiest manner. So, any questions?

The question, for people who didn't hear it, is about Microsoft: because of the acquisition, are we going to change our strategy? My answer is that LinkedIn will continue to manage and control its own infrastructure. We're doing this for our member base, and at this point we will continue to do what we're doing; things may change in the future, but that's yet to be decided.

For our distributed control plane... can you be a little more specific? Ah, at that level, okay, I've got it. So yes, we are doing BGP to the ToR, and what we're doing there, the open fabric work, is now a draft at the IETF if you want to take a look at it. Basically we're trying to use label management in there to figure out how to steer the traffic. So yeah, thank you very much, everyone.
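As a closing illustration of the streaming-telemetry pipeline described in the talk, here is a minimal sketch of a switch-side agent publishing interface counters to a Kafka topic. It assumes the kafka-python client; the broker address, topic name, and message schema are made up, and the real collectors and processing jobs are not shown.

```python
import json
import random
import time

from kafka import KafkaProducer   # assumes the kafka-python client is installed

# Minimal sketch: push counters on a timer instead of waiting to be polled
# over SNMP, so the control plane never carries the polling load.
producer = KafkaProducer(
    bootstrap_servers=["kafka.example.net:9092"],           # placeholder broker
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

def sample_counters(port):
    """Stand-in for reading real ASIC interface counters."""
    return {"in_octets": random.randint(0, 10**9), "out_octets": random.randint(0, 10**9)}

while True:
    for port in ("Ethernet1", "Ethernet2"):
        record = {
            "switch": "leaf-101",
            "port": port,
            "ts": time.time(),
            "counters": sample_counters(port),
        }
        producer.send("network-telemetry", record)   # downstream jobs consume, store, and analyze
    producer.flush()
    time.sleep(10)   # stream every 10 seconds
```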