Okay, let's do this. Hello. For the next 20 minutes we'll talk about automated system partitioning using hypergraphs for 3D stacked integrated circuits. Okay, that's quite a long title; I had quite a hard time remembering it. So let me make it simpler: we'll talk about integrated circuits again, how to make them 3D, and also why we want to make them 3D.
So, back to basics first. I'm sure most of you already know how it works, but just to make sure. The most basic block is the transistor. We pack transistors together to make logic gates, and gates put together make logic functions. All the logic functions together make an IC, like Staf just explained a few minutes ago. After that, you can package it using whatever kind of package you want. What is important to guarantee the performance of your IC is that you have good transistors and a good-quality interconnection system, a good network.
Now, what does an IC look like when everything is done? There are three essential parts. On the bottom you have the substrate, in gray here, on which your gates, your transistors, are printed, and then a few metal layers to interconnect them all together. What is important to understand here is that a regular 2D IC only has one layer of transistors, and that is one of its limitations.
So what do we want to change? How can we go further, and what are the limitations of the current technology? Well, one of those limitations shows up when you look at the standard cell. The standard cell technology you're using to design your IC is defined by a set of dimensions: metal pitch, fin pitch, and all the distances between elements in your design. If you want to pack more features into your IC, you need more transistors. If you want more transistors without making the IC too big, you need to make the transistors smaller. Obviously, you can't do that forever.
So at some point you will hit a physical wall, and also a financial wall. You can't just shrink them forever.
One other problem is that you can't simply make the IC bigger. Here the limiting factor is the wafer, the big circle with a rainbow pattern on which you print your ICs. You need to know that there is a constant number of defects on the wafer. So if the ICs are bigger, like on the left, a lower proportion of the dies will be good dies rather than bad dies. That proportion is what we call the yield. If the yield is lower, the chip will be more expensive. So you need to limit the size of the chip so that the yield is sufficiently high and the chip stays affordable. That is, for example, what Xilinx did for their latest Virtex-7 technology. I'm not a big fan of Xilinx anyway, just so you know. Instead of building one big IC for the FPGA, they split it into smaller ICs that are easier to manufacture, and these are all interconnected inside the package, under the hood. It's still, quite frankly, expensive, but it's more affordable than it would have been with one IC.
Okay, so now, what is a 3D IC? How do we go from a 2D IC to a 3D IC? Well, let's take a regular 2D IC again. The most straightforward way to make it 3D is simply to stack two layers of ICs, and you have a 3D IC. Done. Okay, we're golden now.
We can make it better, though. Instead of having what we call face-to-back, that is the substrate of one layer facing the metal layers of the other, you could have the two sets of metal layers directly facing each other, so that the interconnection between the two layers is shorter. You can go even further by having two layers of transistors, two layers of gates, directly on top of each other, with all the metal layers above them. But those are all manufacturing problems, not really what we want to focus on today. Somebody still needs to decide on which layer to place each of your gates. That is the decision that needs to be made.
Now, what are the benefits of going 3D? We saw that it can limit the size of the IC. But also, for example, on the left we have a regular 2D IC with all its blocks interconnected, and on the right the red blocks have been moved to a second layer on top of the blue and green blocks. You can see that all the connections are shorter, and shorter connections bring a lot of benefits. You can increase the performance by reducing the critical path. You also reduce the power consumption by reducing the power drop. You also improve the area utilization, simply by reducing the routing congestion and limiting the use of buffers, since the nets are shorter. So a lot of benefits just from making all the wires shorter.
Now, for the rest of the presentation, we'll see how we can transform a 2D IC into a 3D IC using existing 2D flows, and what blocks we need to add to make that transformation. So very quickly, since Staf explained it very well half an hour ago: in the 2D flow, you first have the RTL that describes your design. Then you synthesize it using any tool you want, open source or closed source, which generates a netlist. The netlist is then placed and routed using a DEF file that we'll use later. If you want to stop there, that's fine. You have a 2D IC: generate the layout, send it to Staf, he will manufacture it for you, you can use it, and that's it.
But we want to go further. We want to extend this 2D flow into a 3D flow. The first way to do that would be to partition the design manually. That is, you take your placed and routed design and pick which gates, which logic blocks, go on which die. You need a very clever designer to do that, and to interrupt the flow, partition it by hand, and then place and route each die again. So that is a bit stupid, actually. We want to automate this manual partitioning so that we have a whole 3D EDA flow. Okay, let's see how it works. We'll take an example.
So let's say that this is a representation of a design. The objective here is simply to split it in two, what we call a bipartition, so we have two partitions at the end. We want those two partitions to be the same size; that is a balanced bipartitioning. The other objective is not to simply butcher the design and cut anywhere. We want to limit the 3D interconnectivity, that is, to have as few 3D nets as possible. So cut as few nets as possible when doing the partitioning.
To do that, we first need to cluster the design: not partition it directly, but cluster it first. Why? Because, as I told you earlier, we want to reduce the wire length. If we just keep the design as is, on the left, we may cut some really short wires. Those wires, when going 3D, have to go through the whole interconnection between the layers, face-to-face, face-to-back, the vias, etc., so they would actually be made longer, and we would lose performance. So we want to cluster the design so that we hide all the shorter wires inside the clusters and only expose the longer wires.
Now, aside from the clustering method, we have to choose the clustering grain, that is, the number of clusters you want in the design. In this case, for example, just four clusters. We have very few nets, that's great, but the problem is that some longer wires are still hidden inside the clusters, and that's not great. At the other end of the spectrum, you could have a lot of very small clusters. This time all the longer wires are outside the clusters, so you can cut them, you can work on them, and improve the system. But the problem is that there are a lot of nets and you may cut more than needed. So one of the trade-offs, one of the games here, is to choose the clustering grain to balance these two aspects of the clustering. Okay, once that's done, let's say that this is the clustering grain I want to use.
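
The clustering-grain trade-off described above can be illustrated with a small sketch. It assumes the simplest geometric clustering, a regular grid of tiles laid over the die, and a made-up toy design; the function names, the cell positions, and the net names are all hypothetical and are not taken from the speaker's tool.

```python
def grid_cluster(cells, die_w, die_h, nx, ny):
    """Assign each cell to one tile of an nx-by-ny grid over the die."""
    cluster_of = {}
    for name, (x, y) in cells.items():
        col = min(int(x / die_w * nx), nx - 1)  # clamp cells on the right edge
        row = min(int(y / die_h * ny), ny - 1)  # clamp cells on the top edge
        cluster_of[name] = row * nx + col
    return cluster_of


def exposed_nets(nets, cluster_of):
    """Names of the nets that connect cells in more than one cluster."""
    return sorted(name for name, pins in nets.items()
                  if len({cluster_of[p] for p in pins}) > 1)


# Hypothetical toy design on a 4 x 4 die: cell name -> (x, y) position.
cells = {"a": (0.5, 0.5), "b": (1.5, 0.5), "c": (3.5, 0.5),
         "d": (0.5, 3.5), "e": (3.5, 3.5)}
nets = {"n_short": ["a", "b"], "n_mid": ["b", "c"],
        "n_long": ["a", "e"], "n3": ["c", "d", "e"]}

coarse = exposed_nets(nets, grid_cluster(cells, 4, 4, 2, 2))
fine = exposed_nets(nets, grid_cluster(cells, 4, 4, 4, 4))
print("2x2 grain:", coarse)
print("4x4 grain:", fine)
```

With the coarse 2x2 grain, the short net between `a` and `b` stays hidden inside one cluster, so the partitioner can never cut it. With the fine 4x4 grain every net is exposed, including that short one, which is the situation where more nets than needed may end up cut.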
Okay, the next step is to extract the graph that represents this design. It's really easy, actually. Each cluster, each blue cluster, becomes one node, what we call a vertex of the graph, and each net connecting two clusters becomes one edge of the graph. Do that for the whole design, extraction complete, that's it.
Well, not exactly, because there are some nets, like the red one over there, that get extracted as two edges. But it's one net; it shouldn't be two edges, it should be one object in the graph as well. So the graph is actually too limited a definition to really represent the architecture of a design. We need to extend the notion using what we call hypergraphs. We just add one thing that is really easy to understand: instead of having only edges connecting two nodes, two vertices, you have hyperedges, like the blue one in the background, that connect several nodes all together using one object, one hyperedge. So if we come back to the net that was problematic here, once we transform it using a hypergraph, it is just one hyperedge. And when you cut one part of the net, you actually cut the whole hyperedge and not two separate edges. You can do that for the whole design and get something a bit ugly, sorry for the colorblind people in the room. Once that's done, you can finally move on to the next step.
Okay, so now we can really partition the design using some partitioning algorithm. I won't dive into the details here, it's not necessary. The idea is to respect the objectives we set at the beginning. The first was to have a balanced partitioning; in this case we have the same number of clusters in each partition, so that's great. And we only cut two hyperedges, so that's great as well. Okay, let's say it's done. The next step is to generate some netlists.
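
The difference between counting cut edges and counting cut hyperedges, and the balanced-bipartition objective, can be sketched as follows. This is a minimal illustration on a made-up four-cluster design. The exhaustive search is only there to make the objective concrete; it is not how real hypergraph partitioners proceed (they rely on heuristics that scale to large designs), and every name in the block is hypothetical.

```python
from itertools import combinations


def hyperedge_cut(nets, part):
    """Number of nets (hyperedges) with clusters in both partitions."""
    return sum(1 for pins in nets.values()
               if len({part[v] for v in pins}) > 1)


def star_edge_cut(nets, part):
    """Cut size if each net is instead split into edges from its first pin."""
    return sum(1 for pins in nets.values()
               for v in pins[1:] if part[pins[0]] != part[v])


def best_balanced_bipartition(vertices, nets):
    """Exhaustive search over equal-size splits; only viable for toy inputs."""
    best = None
    for side0 in combinations(vertices, len(vertices) // 2):
        part = {v: 0 if v in side0 else 1 for v in vertices}
        cut = hyperedge_cut(nets, part)
        if best is None or cut < best[0]:
            best = (cut, part)
    return best


# Hypothetical clustered design: net name -> clusters it connects.
vertices = ["c0", "c1", "c2", "c3"]
nets = {"n1": ["c0", "c1"], "n2": ["c0", "c2", "c3"],
        "n3": ["c2", "c3"], "n4": ["c1", "c3"]}

cut, part = best_balanced_bipartition(vertices, nets)
print("hyperedges cut:", cut)
print("edges cut with the plain-graph model:", star_edge_cut(nets, part))
```

On this toy input the best balanced split cuts two hyperedges, while the plain-graph model reports three cut edges for the same split, because the three-pin net `n2` is counted twice. That double counting is the reason a net should be one object in the model.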
In a 2D ASIC flow, as Staf said earlier, you generate a netlist after the synthesis, and then you use this netlist to place and route the design. Well, for the 3D flow, it's exactly the same. You need to generate a netlist for each of the partitions, for each die, for each layer, that you can then place and route and interconnect together. So those four steps aim at replacing that stupid and silly manual partitioning, so that we can have an automated 3D flow, not a manual one.
Some of those blocks are homemade, like the clustering, for example. We take the DEF file that is used during the place and route step of the EDA flow to extract all the design geometry. We also use the DEF file for the gates and their properties. Once the design has been clustered, it is sent to the graph extraction algorithm, also homemade in this case. It works on the graph, extracts some information, sets the weights and the objectives for the graph partitioning, and formats the design so that it can be read by the graph partitioning tools. Those are not homemade, because there are really great partitioning tools that already exist out there, so there is no reason to make our own. For the sake of credit, the two we used are hMETIS and PaToH, respectively from the Karypis Lab at the University of Minnesota and developed by Professor Çatalyürek during his PhD at Bilkent University. Those two are really great hypergraph partitioning tools, if you want to use some.
The last step is not on GitHub yet. It's really at an early stage of development. Like I said, we're trying to take the netlist from the 2D synthesis and the information from the graph partitioning, and then generate a split netlist for the design.
Okay, this flow has been tested on several designs, among which we had the LDPC, which is a very small core for error-correcting codes, one version of RISC-V from, what was it, Berkeley University, yeah, Berkeley.
And then some modules of the OpenSPARC T2 SoC. We did not tackle the whole OpenSPARC at once, but split it into the SPC, which is the core, the CCX, the memory crossbar, and the RTX, the Ethernet module. They all have different properties, different sizes, and different numbers of gates and nets, so it is interesting to see whether our flow is robust to different kinds of designs.
And talking about robustness, well, yes and no, actually. What you can hope for using this flow on these particular designs is to gain up to 77% in total wire length reduction when going 3D. But you can see that it heavily depends on the design that is tested. The crossbar benefits highly from it, but the Ethernet module, in green on the right, can gain up to, well, down to 0% of gain, or just lose some performance. It also depends widely, even for the same design, on the clustering grain that is used, but also on the weights and the objectives we set for the graph partitioning. Do we want to reduce the number of nets that cross in 3D? Do we want to reduce the total wire length, or the maximum total wire length, that is cut? All those kinds of objectives will impact the gain in the end.
One other gain we have, on top of the shorter total wire length, is a shorter critical path. That is important for the performance gain. Again you can see that the crossbar gains more than the others, up to 61%, but more than, let's say, 35% on average.
Okay, but there are still a lot of open questions, so this work is far from being over. It's all part of my PhD thesis, and I still have three years left, so I have some time. At the moment, some of the questions we're asking ourselves are: what is the best clustering? The one I showed you here is, I prefer to say naive rather than stupid, but it's actually stupid. It's just a geometric one. You take the design and split it into squares or rectangles of the same size.
But you will cut a lot of short wires, and that's what we want to avoid. So there must be a better clustering method, and we're looking for that best clustering method. And once we find it, does it actually have an impact on the design partitionability? Is it really important to have a better clustering method, or does it not really change anything? Is the grain that important? We saw in the early results that it seems to be, but is it always the case, for all designs? On top of that, can we predict the partitionability of a design? If you give us a 2D IC you designed, can we guarantee you, before running the whole EDA flow, that you can gain up to, I don't know, 30 or 40% in performance by going 3D? Is it worth the hassle? So those are just the rambling questions we're asking ourselves. If some of you have any now, I'll be more than happy to answer them. Thank you all for your attention.
Please.
It's all very new for me, but the design, can you also directly design it to have a partition? I think you could have a partitioned design from the start or something. Or does it really operate at a much later level in your design process?
Yes, at the moment it's really later in the process. Oh yeah, sorry. So the question here is: can you directly design your IC in 3D, for example, instead of having to partition it later? That's it, yeah? Okay. You cannot do that yet, because you would lack some 3D awareness in the flow. Designing your IC directly in 3D is the aim for the future, but in the near future we can't do that yet because we lack some tools, for example to place and route in three dimensions. All the place and route tools work in just two dimensions, X and Y. If you want a 3D-aware flow, you would need to add one more coordinate to the place and route, and that is not done yet. So no, you still need to design it in 2D and then split it.
What impacts? Performance. Performance. Interconnection, yeah.
So he's asking if there are performance impacts when you're going 3D versus a regular 2D IC. Yes, actually. If I just go back to the slides, it will be easier for everybody to understand. When you go 3D, you obviously have interconnections between the two layers, like here. Those interconnections are longer than the regular metal layers. So the one you have here [inaudible] is longer than the metal layers you have here. We try to reduce the performance loss due to these longer vias, the interconnection between the two layers, by cutting the longest nets, which would be shorter even when going 3D. So yes, you would actually have a performance loss if you cut really short wires, which would be elongated by going 3D, and we're trying to avoid that.
So from one layer to the other, do you place and route them together, or at least inform the other one, or would they be done independently?
I do not really know how it works. So yeah, sorry. My guess is either way is fine. You can have a place and route for each layer without constraining anything, because they would just take the same design. They would simply use the same constraints they had in 2D, but with more space to work in. You could think, for example, that with the red blocks going on top, everything would be scrambled and moved away. In fact, the cells would just be placed closer to each other, and the top-left cell would not end up on the bottom right. It would stay on the top left, because the 2D place and route, when it had less space and more constraints, placed it there. So when it has fewer constraints, it would probably still place it somewhere around there. So I'm not sure, but I think it would still work without the constraints between the two layers. [inaudible] Great.