So we're here at Applied Micro. So who are you? My name is Gaurav Singh. I'm the vice president of technical strategy at Applied Micro, and I was also responsible for the architecture development for X-Gene 1 and X-Gene 2. I'm really excited to show you everything we have for X-Gene 1 and X-Gene 2, so if you want to follow me, we're going to the lab, one of the labs you have around here.

This is your headquarters here in Sunnyvale? Yes, that's right. Most of our design team is here, and also the lab and the bring-up.

So what are you testing in here? We've got a bunch of systems here. These are production systems with X-Gene 1. This one right here is a dense compute node with X-Gene 1, from MiTAC. This one is with X-Gene 2. What we're showing over here is a live comparison against the incumbents. We have three generations of Intel servers: Sandy Bridge and Ivy Bridge. So these are high-volume, mass-produced Intel servers? Right. Maybe representing the biggest market for Intel right now? That's right. This is the E5, which is the highest-volume generation for Intel. So really, we're taking aim at the highest end with these really energy-efficient ARM processors.

And I can show you what's inside. This is X-Gene 1, our first-generation octa-core 2.4 GHz ARM processor and SoC. This is the first 64-bit server chip shipped? Yes, that's right. It's shipping right now. Effectively, what we've done is incorporate most of the things you see here, including the storage interface, the networking interface, and the memory interface, all into one chip. Storage, memory, networking, and a bunch of accelerators, all integrated into this one piece of silicon. So this is something Intel is not able to do? Intel is probably able to do it, but they haven't done it so far.
And the reason is that they're trying to address a very wide market, whereas we're taking aim at the Web 2.0 hyperscale market, and we're seeing a lot of traction in dense compute, storage, and HPC.

Can we see this? Sure. This is X-Gene 2, our second generation, and this is the evaluation platform; we're showing some benchmarks there. What we've done on X-Gene 2 is add a new feature called RoCE, RDMA over Converged Ethernet, and with that you can see we get close to two times the performance of an Intel system at much lower latency. Two times the performance!

So how do you make this? Sure. At Applied Micro, we realized about six years ago that we really needed to develop our own CPU. At that point, ARM in the data center was just a concept. What we realized was that in order to give customers what they wanted, which was extremely good performance but at the same time energy efficiency, translating to real dollars in TCO savings, we needed to design our own CPU. So about six years ago we spoke to ARM and licensed the 64-bit ARM architecture, which at that time was still being developed at ARM. So we designed the CPU almost hand in hand while ARM was defining the architecture.

So since the beginning, you've been working very closely with ARM, right? Yes. And you worked to define what ARMv8 has become? Yes, that's right. Our architects are downstairs. The first thing we did was assemble a world-class CPU design team: people who had worked at AMD on the K6 microprocessor, people who had worked on the Pentium microprocessors at Intel, and lots of designers who had worked on MIPS processors.
So we assembled that world-class CPU design team, gave them this new architecture to design, and gave them the requirement that it had to be quad-issue, it had to be out of order, it had to be a really high-performance CPU targeted at the data center. And that's what we see here. This chip has the world's first 64-bit high-performance ARM CPU, but what is also interesting is that we coupled it with an extremely high-performance memory subsystem: four channels of memory, with very good low-latency semantics. We put a lot of effort into benchmarking to make sure we get very good memory performance.

So does that mean high memory bandwidth? Yes, absolutely. There are two things: very high memory bandwidth, but what our customers are seeing today is also very high utilization of that bandwidth. So there's high bandwidth and high utilization, and in fact it's higher than even what Intel achieves. That really is a testament to the amount of effort we put not just into the CPU design, but also into the memory subsystem and the rest of the chip.

This is what's called a custom architecture, your own architecture, right? Yes. So you optimized for performance in specific server use cases? Exactly. There's the inherent performance of the CPU itself, but because it had to be a server chip, we also put a lot of effort into the reliability aspect, because servers have a really high bar for reliability. We put in a lot of hooks for end-to-end reliability, and since we designed our own CPU, our own memory controller, and our own interconnect, we were really able to bake all those features into the entire design.

So this is X-Gene 1, and it's in mass production now? Yes, this is the one that is in production. And then you have X-Gene 2 also.
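As a rough illustration of what four memory channels buy you, the peak-bandwidth arithmetic looks like this. The DDR3-1600 speed grade and the standard 64-bit channel width are assumptions for illustration; the interview does not state the exact memory configuration:

```python
# Peak DRAM bandwidth from channel count and transfer rate.
# Assumption (not from the interview): DDR3-1600, i.e. 1600 MT/s,
# with a standard 64-bit (8-byte) channel width.

def peak_bandwidth_gb_s(channels: int, transfers_per_sec: float,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * transfers_per_sec * bytes_per_transfer / 1e9

one_channel = peak_bandwidth_gb_s(1, 1600e6)    # 12.8 GB/s per channel
four_channels = peak_bandwidth_gb_s(4, 1600e6)  # 51.2 GB/s across four channels

print(f"per channel: {one_channel:.1f} GB/s, four channels: {four_channels:.1f} GB/s")
```

Peak numbers like these are only half the story; the point made in the interview is that the chip also sustains a high fraction of that peak (utilization), which is where memory-subsystem design effort shows up.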
And this is X-Gene 2. This is the 28-nanometer part, so we get a higher frequency, and we've also added some key features, including RDMA over Converged Ethernet. That's the performance benchmark you saw, where we were seeing close to twice the performance at significantly lower latency.

RDMA over Converged Ethernet, what is that? Effectively, today most of the transactions that happen in the data center ride over TCP/IP, and TCP/IP is inherently inefficient. InfiniBand is a much lower-latency and more reliable transport layer, but what I wanted was to ride on the existing switching infrastructure in the data center, and that is primarily Ethernet. So what RoCE does is let you have the InfiniBand transport layer riding over commodity Ethernet.

So Ethernet is going to remain the standard for a long time because it's everywhere, even in the servers, but then you change the protocol and you can get more out of it? Yes. Basically, if you look at Ethernet as the link layer, then on top of it you have TCP/IP versus InfiniBand. InfiniBand gets much lower latency, and the reason is that a lot of the data handling, the queue handling, is done in hardware rather than in software. The second thing is that TCP/IP is inherently chatty, and it lives in the kernel: if a user application wants to use TCP/IP, it has to do buffer copies from user space to kernel space. With RoCE you can bypass all of that.

So since you integrate many more things than Intel into the chip, does that mean a lower BOM for the PCB, for the whole board? Right. This chip, for example, has four 10-gigabit Ethernet ports built into it, so you don't need an external NIC.
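A toy model of why the kernel bypass he describes lowers latency. All the constants below are illustrative assumptions, not measurements from the demo: the TCP path is charged for user-to-kernel buffer copies plus per-message kernel and protocol overhead, while the RDMA path posts work directly to hardware queues:

```python
# Toy per-message latency model for TCP/IP vs RDMA (RoCE).
# Every constant here is an illustrative assumption, not a measured value.

WIRE_US = 2.0             # time on the wire + switch, same for both paths
COPY_US_PER_KB = 0.3      # one user<->kernel buffer copy, per KB (assumed)
KERNEL_OVERHEAD_US = 8.0  # syscalls, protocol processing, interrupts (assumed)
NIC_QUEUE_US = 1.0        # hardware queue-pair handling on the NIC (assumed)

def tcp_latency_us(msg_kb: float) -> float:
    # Two copies: sender user->kernel, receiver kernel->user.
    return WIRE_US + 2 * COPY_US_PER_KB * msg_kb + KERNEL_OVERHEAD_US

def rdma_latency_us(msg_kb: float) -> float:
    # Kernel bypass: the NIC DMAs straight from/to user memory,
    # so latency is roughly independent of message size here.
    return WIRE_US + NIC_QUEUE_US

for kb in (1, 4, 64):
    print(f"{kb:3d} KB: TCP ~{tcp_latency_us(kb):5.1f} us, "
          f"RDMA ~{rdma_latency_us(kb):4.1f} us")
```

The model captures the two mechanisms named in the interview (hardware queue handling and avoiding user/kernel buffer copies); real RoCE latencies depend on the NIC, switch, and message size.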
It's got six SATA ports built into it, so you don't need an additional SATA controller. It's got four memory channels built into it. So it's really a fully integrated solution. It's also got a lightweight BMC built into one corner, so a lot of the power-management support is already there. In addition, we have accelerators for specific functions, so if there are cloud workloads that need certain functions, we can handle those as well. All of that is really integrated into this one device.

And that has two benefits. One, as you mentioned, is the BOM cost: because we've subsumed a lot of the additional components on the board, it results in lower cost, both in power and in dollars. But the second, more interesting, benefit is that putting all of that logic into a single chip allows a much denser server. You don't lose any of the performance or any of the functionality, but you can build very dense servers. If you've seen some of the production servers, for instance the Moonshot server: in a 4U chassis you can have 45 full-fledged servers, which has never been done before. And not just that: you can have 45 servers each with 64 gigabytes of memory. We are the only game in town that can enable those densities.

So they're all sharing memory? No, each node is independent, and each node can address 64 gigabytes. In fact, architecturally we can address half a terabyte with this one device, and we're the only chip in the world that can do that at this low power. So you put a lot of RAM in the chassis, and each CPU gets its own RAM? Right. Up to 64 gigabytes, or up to half a terabyte per CPU? Yes. So that's the future, the general idea of what can happen?
Well, the real reason is that in the data center today, it's all about latency: you want your transactions to see the lowest latency possible. People are replacing spinning platters with solid-state drives to reduce latency, but they're trying to reduce it even more by having the whole database reside in memory, because the access time to memory is around 100 nanoseconds, while it takes on the order of microseconds or more to get to a disk. So people want to put all of the data in memory, and there's huge pressure to have more and more memory per node, and more cheap memory per node. X-Gene is the only device that can do that in this power envelope with this performance today.

SSDs too? Yes, we can connect SSDs on the storage side as well.

So are you saying that cloud apps and cloud services can be faster with your solution than with what Intel is providing? Yes, that's right. The memcached demo we showed you is exactly that: our latencies are an order of magnitude lower than the latency of an Intel system. So we're not just talking about lower cost, we're talking about higher performance. Higher performance, that's right. In terms of requests per second, and I think we'll show you that live demo, we see higher requests per second than the latest Intel processors. But the important thing to me is that you see that at a lower latency. Why does latency matter? Because most cloud workloads today are really initiated by a user sitting on the other side, waiting for photographs to load, waiting for a search result, or waiting for some data to come back to them. So latency is very important.
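The latency gap he's describing can be put in numbers. The ~100 ns DRAM figure is from the interview; the SSD and spinning-disk figures are commonly cited ballpark assumptions, not measurements from these systems:

```python
# Rough access-latency comparison. The 100 ns DRAM number is from the
# interview; the storage numbers are assumed, commonly cited ballparks.
LATENCY_NS = {
    "DRAM":          100,         # ~100 ns, as stated in the interview
    "SSD":           100_000,     # ~100 us (assumed ballpark)
    "spinning disk": 10_000_000,  # ~10 ms seek + rotation (assumed ballpark)
}

for medium, ns in LATENCY_NS.items():
    ratio = ns / LATENCY_NS["DRAM"]
    print(f"{medium:13s}: {ns:>12,} ns  ({ratio:,.0f}x DRAM)")
```

Even against SSDs, DRAM is roughly three orders of magnitude faster, which is the pressure he cites for putting whole databases in memory and for wanting more memory per node.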
And there are published studies showing that if the response time of a transaction takes longer, consumers are more likely to walk away and go to another site. So latency is becoming extremely important, and that's one of our key strengths: we can offer more requests per second at a significantly lower latency. So more requests per second also makes each request cheaper, because you run for less time? That's right. That's exactly right.

And if we go inside again: there's the MiTAC system, there's an HP Moonshot, and all of these are shipping? Yeah, that's right, this is shipping. And this is an extremely dense system. Notice that this is just half the width; we can put another half-width unit over here, so we can have up to eight processors, eight X-Gene processors, in a 1U rack unit. Nobody else can do that. This is the X-Gene 2 system, where we can actually fit up to six per side, so we can have twelve within a 1U rack unit. The interesting thing is that within the same power envelope, we're offering much higher performance without compromising on latency, and this is in a traditional 1U form factor. We don't have a Moonshot here, but you've probably seen the Moonshot chassis. On Moonshot we've got a similar design, but it's actually more innovative: up to 45 cartridges within a 4U chassis.

So this is all in mass production right now? Mass production, yep. Anybody can just go and contact you to order? And this is one of our boards. This is our evaluation board, the X-C1, and we've announced general availability of this board. It has the same powerful X-Gene processor, with two memory channels, with SATA ports, with 10-gigabit Ethernet, and it's completely ready to go.
You can buy this, you can evaluate us, and you might be surprised at what you find. All right.