Let's have a big hand for David Kaplan. Thank you very much, I'm happy to be here. This presentation was put together in conjunction with one of my colleagues, Farhan. I'm going to talk this evening about real-world hardware design. I do security hardware now as my day job, but for this talk I want to focus on what goes into making a real-world x86 CPU, and I'm going to go through some of the practices, tools, and techniques that are used in this kind of development, and some of the unique challenges associated with it. So starting off: I'm not sure how many of you have worked with hardware design before, but hardware design is very different from software. For starters, it takes a long time, especially with an x86 CPU. Designing a brand-new CPU from scratch can easily take four or five years of constant development, with teams of hundreds if not thousands of people. It's simply a very complex beast; there's a lot that goes into it. It's also very expensive. Besides just the cost of hundreds or thousands of people, fabricating silicon is very expensive in the kinds of process technologies that modern x86 chips use. A mask set, as it's called, which is what the fabrication facility uses to actually build the chip you've created, can cost upwards of three million dollars, and that's before you pay for all the special test equipment and everything else that goes into it. Another big challenge, as we'll see as we go through this talk, is that it's very difficult to test everything in the design before you send it to the fabrication plant, and it can become very difficult to figure out what happened when it goes wrong afterwards. We'll talk about some of the different techniques that are used to help mitigate that. x86 CPUs are especially challenging in part because of the complexity: a modern high-performance chip can easily be around 60 million NAND gates, and the RTL code, as it's called, which is
typically in a language like Verilog or VHDL, can easily reach a million lines. x86 cores also run just ridiculously fast. It's almost hard to fathom what it means that, say, a three-gigahertz CPU runs at three billion cycles a second; it's very difficult, and we'll see how that plays in in a minute. Related to that is that these chips have to work basically the whole time. I don't think most of you lose sleep at night thinking about your x86 CPU having a bug or a malfunction in your program, but if it did, that could be catastrophic. The CPU has to be functionally correct for basically everything in your system to work. What this means from a hardware development standpoint is that you've got to have basically no bugs. Even if you had a really rare bug that occurs one time in a billion cycles, that's three times a second; that's not really going to work. So CPUs must be perfect, then? Well, obviously not. I'm sure many of you are familiar with some of the infamous CPU bugs: there was the Pentium divide bug in the 90s, there was an AMD TLB bug in 2007. But there's a lot more than that. In fact, if you open up the errata guide for a modern CPU, you'll see something like this; if you're in the back, that says "No fix planned" on the right side there. That being said, these issues are very minor. You don't have to take my word for it.
In fact, I would encourage you not to: go and download one of these and read what the types of errata are that exist in these production systems. In most cases these simply don't really matter, or there are software workarounds available. But there are a lot that still make it into silicon. So I want to talk a bit about the hardware design process and how the testing of these chips is done. CPU development starts with design and verification, which is where the teams write the Verilog or VHDL code and do a bunch of testing on it in a simulation environment; that can take anywhere from one to about four years, depending on how much is changing in the design. Once that's completed, it's sent to a fabrication facility. It usually takes at least two to three months to get any silicon back from a fabrication facility, and after you do, you still need to test it. That validation process, as I'm calling it here, can easily take a year or more. So when you put this all together, it can be four or five years to get something from concept all the way into mass production. I first want to talk about the verification, or pre-silicon, phase of design. What is verification? Verification is a discipline within silicon design that ensures a design matches the specification, and it's worth pointing out that when hardware, and CPUs in particular, are developed, you don't just have a functional specification. The functional specification may say what instruction sets the CPU supports, things like that, but you also have a performance specification and a power specification, and those all need to be tested in the same way. If you build a processor and it's slower than you expected it to be, it's just as worthless as if it had a bug somewhere. The goal of verification is to find defects, of course, or bugs, and like with many things, the earlier you find a bug in the design process, the cheaper it is to fix. So how do we find these bugs?
Well, the standard way is to use a simulation test bench, as shown here. You have your hardware design, your Verilog code, which is the block in red, and you first need some way of generating stimulus into that design. The typical ways that happens are what's called directed testing or random testing. Directed testing is where you've written a specific sequence of instructions that are going to execute; in random testing, you open it up to some or maybe all of the instruction space and just throw stuff at it. The vast majority of testing that's done is random testing, because it's great at finding all these weird corner cases and is much less time-consuming for humans. In addition to applying stimulus to the design, it's also applied to some kind of checker. If you're working at the x86 core level, this means you'll probably have some kind of, say, C++ model of the x86 architecture that runs in parallel alongside the design. You send the same instruction to both of them, and when the design is finished, you compare your register output, or your memory output, whatever, and look for any variance. Checkers can also exist at much lower levels in the design, and it's a very common practice to have checkers around specific blocks, in fact often probing directly into those blocks. So for instance, you might have a cache checker that runs and makes sure you don't insert duplicate lines into your cache. These sorts of things are really useful because verification time is so limited: you want to check for any discrepancies as early as you can and get the most out of your testing cycles, so it's very common to have checkers all the way down into the design. Another important characteristic of test benches is that there's typically some way to measure coverage. Coverage is a very important metric in hardware designs because it helps determine how far along your
testing is, and whether you are actually hitting all of the code you expect and all the branches you expect to execute. You absolutely do not want to go and fabricate a design that has untested code in it, and coverage is a great tool that helps prevent that. Now, test benches do not run very fast, and this becomes a major issue if you want to simulate the entire, we'll call it SoC-level, chip. This might have multiple cores, a northbridge, a southbridge, this sort of thing. You're looking at a speed of about 1 hertz, meaning you're simulating one cycle for every wall-clock second that you run this thing. You cannot get a lot done at 1 hertz, and this is with top-of-the-line tools, hardware, all of that. So the natural thing is to break things down into smaller and smaller levels. If you're testing just an x86 core, that's about an order of magnitude faster to simulate, which is still really slow, but it helps. If you break it down even further, you get what I call multi-unit testing. This is a typical practice of combining a few related blocks, like an instruction fetch and a decode unit, together; or you can go down to single-unit testing, like the decoder or the load-store unit, something like that, and there you're looking at somewhere in the ballpark of 100, maybe 200 hertz, so 100 to 200 cycles per second of simulation. Now compare that for a minute to real silicon, which runs at 3 billion cycles per second, and you'll see how far off this is. In fact, the very first second that you power on a CPU is the equivalent of about ninety-five years of testing at the system level. So basically, as soon as you turn the thing on, it has already gone through more verification than it ever did before. Now, there is a way that you can throw more hardware and money at this problem: something called emulation. The two major design tool companies, Cadence and Synopsys, make emulation machines, and these are special hardware.
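Those simulation-speed numbers are worth sanity-checking; here is the back-of-the-envelope arithmetic, using the rough rates quoted in the talk (not exact tool specifications):

```python
# Rough simulation throughput figures from the talk, in cycles per second.
RATES = {
    "system-level sim": 1,           # full SoC simulation: ~1 Hz
    "core-level sim": 10,            # ~an order of magnitude faster
    "unit-level sim": 100,           # single unit: ~100-200 Hz
    "emulator": 1_000_000,           # ~1-1.5 MHz
    "real silicon": 3_000_000_000,   # 3 GHz
}

SECONDS_PER_YEAR = 365 * 24 * 3600

def wall_clock_years(cycles, rate_hz):
    """Wall-clock years needed to run `cycles` at `rate_hz`."""
    return cycles / rate_hz / SECONDS_PER_YEAR

# One second of real silicon at 3 GHz is 3 billion cycles.
one_silicon_second = 3_000_000_000

for name, rate in RATES.items():
    years = wall_clock_years(one_silicon_second, rate)
    print(f"{name:>18}: {years:14.8f} years")
```

The gap between the first row and the last spans more than nine orders of magnitude, which is why a freshly powered-on chip outruns its entire pre-silicon test history almost immediately.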
I think they're FPGA-based things that allow you to load a design onto them and run testing at a faster rate. And I should point out, this Cadence system is way bigger than the picture makes it look; it's a serious box. One of these boxes will set you back probably close to a million dollars, but they can run at around one to one and a half megahertz. So it's still two to three thousand times slower than the real silicon, but it's a million times faster than simulation. They're very useful, but they are costly as well. Now, one question that I often get asked, or a myth that I sometimes see, is: well, what about formal verification and formal methods? For anyone who's not familiar, formal methods are basically a mathematical proof of the behavior of a certain design, and formal verification is great for some things. It's really cool; it's great to get a proof that says, one hundred percent, this is how it works. The problem is that formal tools, first off, really struggle when you give them a big design. They're basically SAT solvers.
They can't deal with that. The second thing is they have to have something to compare against, and when you're working with something like a multiplier or a divider, it's pretty easy to give the tool a reference multiplier and say, make sure these two are the same. If you're dealing with an entire CPU, it's a very different story. You might have to re-implement the entire design just so you have something to verify it against, and that's difficult to do. So the experience I've seen is that formal verification is great for a few of these selected execution units; it's a very small piece of the overall puzzle. Now, the track that we're in right now is about failures, so one thing I want to talk about is: what does verification fail at? We'll start with what it's good at. Verification is good at finding bugs in your basic functional behavior: does this particular mode of operation work, do these exceptions happen when they're supposed to, that sort of thing. Anything you can do formal proofs for is also good, and it's also useful for coverage analysis: are all your instructions executed, do they execute in all the different modes, do you get all the different exceptions, that sort of thing?
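To make the multiplier example concrete, here is a hedged sketch of what an equivalence check does: a naive shift-and-add multiplier standing in for the RTL, compared against a trusted reference. A formal tool proves this symbolically over all inputs; for small bit widths we can get the same guarantee by brute-force enumeration. The function names and widths here are illustrative, not any real tool's interface:

```python
def shift_add_multiply(a: int, b: int, width: int = 8) -> int:
    """Naive shift-and-add multiplier, standing in for the RTL implementation."""
    mask = (1 << (2 * width)) - 1  # result register is 2*width bits wide
    result = 0
    for bit in range(width):
        if (b >> bit) & 1:
            result = (result + (a << bit)) & mask
    return result

def check_equivalence(width: int = 6) -> bool:
    """Exhaustively compare the implementation against the reference (`*`).

    A formal tool does this symbolically; for a small width we can simply
    enumerate every input pair, which is the same guarantee by brute force.
    """
    for a in range(1 << width):
        for b in range(1 << width):
            if shift_add_multiply(a, b, width) != a * b:
                return False
    return True

print(check_equivalence(6))  # → True
```

This also illustrates why the technique does not scale to a whole CPU: for a multiplier the reference (`*`) already exists, but for an entire core you would first have to build an equally complete reference model.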
But verification doesn't find other types of bugs. One big category is system-level bugs, and as you remember from a few slides ago, the system-level model runs so ridiculously slowly that you simply don't get a lot of test time on it, so those are the bugs most likely to slip through the cracks. Those would be bugs like two different components in the design having some sort of protocol disagreement, where they end up in an unknown state because they don't speak the same language. The most common thing I've seen is that multiple seemingly random events are required to hit these bugs. We're talking about things like: you're doing a compare-exchange instruction when you get a cache probe while an interrupt is pending, and in the same cycle the thermal sensor says the processor is too hot. Cases like that are just difficult to hit during testing, but once you start running at billions of cycles a second, they have a way of coming up a lot more often. Another thing that's difficult to find is any long-runtime event. Whenever you have a large data structure, say an L3 cache, something like that, some of those cases are difficult to test in verification, so you need to be aware of that. A final thing I'll mention is what I'll call statistically unlikely matches. Imagine, for instance, that you have a design with some special behavior when two different memory operations have addresses where the lower 20 bits match but the upper bits do not. Well, if you're generating all of your addresses randomly in your random stimulus, the chance of that happening is really, really small, and you're not going to get a lot of test time on it, and those bugs are going to slip through. Now, it's worth noting that some of this you can fix: if you knew that the design was really sensitive to this case where the lower 20 bits match and the upper 28 bits don't, you could specifically
have stimulus for that; you could constrain your random address generator to generate cases like that. Some of these cases, however, you can't do much about. For multiple random events you do the best you can, but something is always going to slip through. Great, so the next thing I want to talk about is what's done after the silicon comes back. As I'm sure you can imagine, you get your silicon back from the fab, it's not going to work perfectly, and you have to debug it. So how do you debug this stuff? Everyone here, I'm sure, has debugged things before; it usually looks something like this. With software, you investigate the problem, you run gdb, do some printfs, figure out what you did wrong, fix it, change the code, recompile, and that's all great. With hardware, it doesn't quite work like that. So the first question is: what actually happened in the design? One common way of figuring this out is to use something called a JTAG interface. I'm sure some of you are familiar with it: JTAG stands for the Joint Test Action Group. It's an IEEE standard that you'll see on a lot of hardware, and modern x86 processors have JTAG pins. You can see an example of some of those here: you have your test data in, test data out, test clock, and so on, and the IEEE spec dictates how you communicate with these pins. Now, I should mention that while these pins are physically there, they're generally not brought out on motherboards.
So you won't find an easy way to connect to them, but they're still physically there on the package if you look hard enough. Processors have to implement certain JTAG commands that are part of the standard, things like BYPASS and IDCODE, which are typically used for verifying that, say, soldered connections on a board are valid. But the spec is also left open for a vendor to add whatever other proprietary commands it wants, so you can imagine that if you're debugging a CPU, you might want your kind of debugger commands: read and write register state, memory, single-step, those sorts of things. That can be very useful for figuring out what happened. Now, of course, the processor doesn't magically do this; you have to design this debugger into the system. But you're going to need something like that, because you're going to have bugs. This sort of thing can be very useful; it's kind of like your gdb, whatever, but it doesn't always work. In particular, one very common thing to happen with silicon designs is that the thing will just hang; it'll be completely frozen, and there's nothing you can do to it. So how do you debug that? The answer is something called a scan dump, and this is a feature that's kind of like a crash dump. Basically, you take all of the register state, all the flip-flops in the design, and you dump them all out through the JTAG port so you can go and analyze it. The way this works is that when flip-flops are built into the silicon, they look something like this.
You may remember your standard D flip-flop from class; there's now an extra mux in front of it that selects between the normal data that the flop would store and something called the scan-in data. These are then connected together in a chain, like so: the output of one goes to the scan-in of the next. When you want to dump the flop state, you assert the scan-enable signal that goes to all the flops, and then as you clock the design, every cycle all the flops shift into each other. The final flop in the chain is connected to your JTAG pin, and over time, and of course it's not a lot of time, you can read out all of the register state in the design. So this is pretty nice; you can then analyze it offline, whatever. It's not perfect, though. One big limitation is that you only get the data stored in flops; you don't get access to any of the intermediate signals in the design. Also, it's a single point in time; it's really like a crash dump, so it's very likely that the information you're looking for is no longer there. One problem with running at the clock rates that CPUs do is that if you don't catch the hang right then, there's no way you can manually stop the thing in time. Sometimes you have to take a whole bunch of these dumps and see if you can get lucky; sometimes you have to look at all the kind of invalid state and the things that are clearly left over from earlier iterations. But these are two examples of practices that are often used.
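The scan-chain mechanics just described are easy to model in software. This is a toy sketch for illustration only: the chain wiring and flop count are invented, and real scan implementations are vendor-specific:

```python
class ScanChain:
    """Toy model of a scan chain: each flop either captures functional
    data or shifts in its neighbor's value, depending on scan_enable."""

    def __init__(self, n_flops):
        self.flops = [0] * n_flops

    def clock(self, scan_enable, scan_in=0, data=None):
        if scan_enable:
            # Shift mode: every flop takes the previous flop's value,
            # flop 0 takes scan_in, and the last flop's value goes out
            # toward the JTAG TDO pin.
            shifted_out = self.flops[-1]
            self.flops = [scan_in] + self.flops[:-1]
            return shifted_out
        else:
            # Normal operation: flops capture functional data.
            self.flops = list(data)
            return None

def scan_dump(chain):
    """Assert scan-enable and clock until all state has been shifted out."""
    return [chain.clock(scan_enable=True) for _ in range(len(chain.flops))]

chain = ScanChain(4)
chain.clock(scan_enable=False, data=[1, 0, 1, 1])  # design runs, then hangs
print(scan_dump(chain))  # the last flop in the chain comes out first
```

Note how the dump is destructive in this model: shifting replaces the captured state, which matches the single-point-in-time, crash-dump-like nature described above.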
There are many others which I can't talk about. So let's assume we've used some of these methods and figured out what happened. Fixing it is not exactly a piece of cake either. The simplest thing would be to go fix your Verilog code and redo the entire design, which takes two months and three million dollars. The good news is that you don't have to do that every time. The way modern chips are built is that there's something called a base layer, and then metal layers; a modern process technology has up to about nine different metal layers. To oversimplify, because I'm not a process engineer: the base layer has your logic gates, your AND gates, your NOR gates, etc., and the metal layers are the wires that connect those gates together. What that means is, if you need to add new gates to the design, you have to change the base layer, which is the most expensive thing, and you have to wait the full time, and that sucks. If you don't need to do that, you can change maybe even just one metal layer in that stack, which tends to be significantly cheaper. It also means less of a delay through the fab, because you can intercept their pipeline: chips are built first at the base and then the metal layers as they go up, so changing a metal layer might only cost a few weeks of delay before you get the results back. One thing that's common for physical designers to do is, whenever they're building a block, if they have any white space in that area, they will actually put in extra gates that are not connected to anything. They're just there on the off chance that there's a bug and you need a new gate and want to wire it up. If you're building the silicon anyway, why not put some useful things in it? Now, those are the costly solutions.
That's the only way you can really fix an issue, but there are a lot of things you can do to work around an issue, and they might not cost three million dollars. So, you know, one thing we do is, if there's a problem, you go to the lab and try to find one of these and see if it can rewire your chip for you. We tend to have the more sophisticated version, which looks like this. This is a focused ion beam machine. It's a cool beast: it can take a chip that's already been fabricated, it has an electron microscope in there, it actually shoots ions into the chip, and it can rewire small parts of the design. It can't do major things, but it can do small things. This is done on a per-chip basis, and the only issue is that you have to do it chip by chip, and also the chips have a very strong tendency to die within about one to two weeks afterwards, just because the process is so destructive to the chip. So if you need to prototype something, it's great: you can do it in a couple of hours, you get your results, you can try it out. But this is not going to be a production solution, and neither is the cat. So what else can we do? Well, one very common practice is that hardware designers put disable bits into the hardware. I've always heard these called chicken bits, because the designer is chicken that maybe the thing won't work. These are very useful. They are typically used for performance or power enhancements in a design, so that you can disable a certain feature and the processor still works just fine; it's maybe a little bit slower. And it's worth noting that when processors are built, there are some things that give you a ton of performance.
I mean, branch prediction: everyone's got branch prediction nowadays. But the way x86 chips get the new performance you see generation over generation is generally through a sum of very, very small parts. There will be features that get you 0.5 percent over here, 0.25 percent over there; maybe there's a big feature that gets you 1 percent. These all stack up, and then you get your 10 or 15 percent improvement, whatever you're expecting. If you need to disable one of those to fix a critical bug, that's not always the end of the world, and there have been cases where that was the workaround that had to go out. You can find some of these bits in a document AMD publishes called the BIOS and Kernel Developer's Guide; I'm sure there's an Intel equivalent as well. In fact, I think we've seen screenshots of it in other presentations. On x86, these sorts of bits live in what are called the model-specific registers, or MSRs. This is an example of one, the data cache configuration register, and there are a few bits defined in here that can be used to disable certain aspects of the hardware prefetcher, which is of course a performance enhancement. There's also a bit to disable speculative table walks. These bits are useful not only in production, if you have a bug with, say, the hardware prefetcher; they're also very useful in debugging, because you can start disabling things until you get down to the root cause. This does require that designers think ahead about what failures they might have and what they'll want to be able to disable down the road, and it's one of those things where you might as well throw in the kitchen sink, because you'd much rather never set a bit than have a bug that requires a three-million-dollar re-spin. Another option that is available on modern CPUs is something called a microcode patch. Microcode is like an on-chip firmware.
It's used on processors typically for implementing things like complex x86 instructions, so IRET, RSM, and a whole bunch of others, hardware task switches, interrupt delivery, and a lot of power management features. Microcode basically breaks up these complex flows into a sequence of smaller operations that the hardware can natively understand. The way microcode is built is that it lives in an on-chip ROM that's physically present in the silicon, but a very common practice is to put a small SRAM next to that ROM, called a patch RAM, and that patch RAM can then be used to replace some or maybe even all of the microcode if needed, either to fix bugs or to work around things in some way. This is useful for modifying instruction behavior. So for instance, if you had to add a serialization after a CLFLUSH because of some rare corner case you've discovered, or there's a bug in a microcode flow, you can patch it through that. There's not a lot of public documentation on microcode patches. The best resource you can probably find is in the Linux kernel: there's a path where the microcode patch loaders are, one for Intel and one for AMD. One thing I'll say about this is that microcode patches are typically signed by the vendor.
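As a very rough software analogy of the patch RAM described above, you can think of it as an override table consulted before the ROM. Everything concrete here, the addresses, the micro-op names, the match mechanism, is invented for illustration; real patch hardware is vendor-specific and undocumented:

```python
# Hypothetical microcode store: the ROM is fixed at fabrication;
# patch RAM entries override matching ROM addresses after a patch load.
MICROCODE_ROM = {
    0x100: ["load_op", "flush_op", "retire"],        # e.g. a CLFLUSH-like flow
    0x200: ["save_state", "switch_task", "retire"],  # e.g. a task-switch flow
}

patch_ram = {}  # empty until a (signed, vendor-supplied) patch is loaded

def fetch_flow(addr):
    """The patch RAM wins over the ROM when an entry matches the address."""
    return patch_ram.get(addr, MICROCODE_ROM[addr])

def load_patch(addr, new_flow):
    """Model of applying a patch: redirect one ROM flow to patched micro-ops."""
    patch_ram[addr] = new_flow

# Before the patch, the buggy ROM flow is used as-is.
print(fetch_flow(0x100))

# Work around a bug by adding a serialization step to the flow.
load_patch(0x100, ["load_op", "flush_op", "serialize", "retire"])
print(fetch_flow(0x100))
```

The key property the sketch captures is that the ROM itself never changes: only the small RAM next to it does, which is what makes field updates possible at all.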
So I'm sorry, but it's not something you can necessarily go off and write yourself. But these are a very useful tool that, I'm sure you can imagine, would be applicable to things besides just x86 cores, for patching things in the field if necessary. So, putting this all together: when dealing with hardware, we've talked about JTAG debugging and scan as two ways to help identify the problem. When it comes to fixing things, I put "work around" in there as well, because sometimes you've got to get stuff out the door and you do whatever it takes, and the line between fix and workaround becomes very blurred. But you obviously have silicon spins, you have microcode patches, you have chicken bits, and if you need a quick fix, you can always go and get a FIB done. Since a lot of us are security people, I felt I should mention something about security. All of the debug interfaces I've mentioned here might be considered more than debug interfaces to some, and that's something that can't be ignored. Debug interface security needs to be part of the design, and it needs to be tested. Some examples of security measures I've seen in the past are, for instance, disabling some or all of the JTAG commands on production parts. A typical way this is done is with fuses that exist on the silicon, basically a one-time programmable memory: once a part is configured for production, the fuse is blown and those instructions are disabled. Another possibility is to ensure that debug access to sensitive information, which could be platform secrets like root keys or firmware or something like that, is blocked in production. You can also require some sort of authentication to use a debug interface like JTAG. Obviously you have to test that, and having to debug the debug authentication handshake is a real pain, but that's certainly a way you can add some security to this. And signed CPU microcode updates are the common practice nowadays. So, there are some takeaways
I want to leave you with. I know that x86 CPU design may not be the project you're about to go and start after midnight tonight, but I think a lot of the techniques here are hopefully still interesting and can be applied to other projects as well. First off, breaking down large designs into small chunks seems somewhat obvious, but it's absolutely critical, especially when you're dealing with something that runs as slowly as a CPU does in simulation. Using tools to get the most out of your test time is certainly important: using coverage tools, using formal tools, anything you can get your hands on is very useful for maximizing the usefulness of each compute cycle you're running. But the biggest thing I would say is to think about what the weaknesses in your testing flow are, and have some way of addressing them. One thing you see with CPU designers is that over the years, as they work on multiple projects, they know where the bugs are going to be. Even though they wrote the code, they know there are still going to be bugs in there, and they build the design accordingly. So try to design for failure by building as much as possible into the original design: building in debug features if you need them, and securing those debug features; anticipating what the risk areas are going to be. With hardware, you never really have the option, except with microcode, of just telling your users to go and download a patch. If you actually had to fix a real hardware bug in the field, you'd be shipping silicon to millions and millions of people around the world, and that's not a position anyone wants to be in. So you have to have these other features built into the parts, so that you can address issues as they come up with software. Sometimes that can be the case in software too; I'm not a software person, so take this with a grain of salt, but I'm sure there are situations where just doing a software upgrade and asking users to download something
may not be a practical solution for their environment, and in those cases it's helpful to have some way to deploy updates, especially critical ones, if necessary. So that's what I have. I do have some additional links if people are interested in reading more about this. The BIOS and Kernel Developer's Guide is a great resource; it might be a thousand pages or more, but it's a fascinating read, and it goes through virtually every register that exists in these x86 processors and what those bits do. The CPU revision guides are the errata documents; both Intel and AMD publish them, and you need to make sure to find the one for your specific CPU version, but they're very interesting, not only to see what the bugs are, but also to compare revision to revision and see which bugs are so unimportant that they're never getting fixed. And if you're interested in CPU verification, there are a lot of great resources; there are some YouTube links here, but honestly, you can just Google it and you'll find a lot of material. It's an interesting field. There's plenty of work being done in optimizing verification, in doing formal proofs in more circumstances, and in something called power-aware verification, which is very important now that we're dealing with chips where parts will be powered down at different times, and you can't use those parts without powering them up. So there's a lot of interesting work being done in that space, and I would encourage anyone interested to go take a look at these. And with that, I'll take some questions.

Okay, thank you very much, David. Would you please queue up at microphones one and three? Okay, first question from the room, the gentleman at number three.

Thanks first of all for the awesome talk. I don't know if you know anything about this, or if this is outside your domain, but I'm wondering, how does the layout aspect come into this?
If I'm programming an FPGA, I just blast my design in there and it's laid out automatically and everything just happens. But is the CPU actually laid out manually, or how does that look on the analog side?

Yeah, a bit of everything, I would say. It used to be that everything was laid out manually, and that was because the tools at the time were simply not good at designs of that size. Recently, and when I say recently I mean the last five to ten years, there has been much more of a push towards automated layout and automated synthesis, and frankly the modern tools are really, really good. What tends to happen is that you actually do best if you take, say, an entire CPU and just throw it at the tool and say, figure out where stuff goes. In some cases you don't even say, here's where I want instruction fetch, here's where I want the multiplier, things like that. You just give it the design, and it has a good way of figuring out, just based on the connections, where things need to be in relation to each other. So I would say it's mostly automated, but with some human input as well. It's different from FPGAs because of the way it's built, obviously, with ASICs, but the tools certainly have some similarities.

Okay, the gentleman at number one.

A modern AMD CPU, at least the Phenom II, consists of, well, the Phenom II doesn't include the third one, but it includes a LatticeMico CPU for power management and of course an x86 core for general processing, and the third core included in modern CPUs is the secure processor or whatever. Do you test the whole package, or just the individual parts?

Yes. Typically the way it works is that the chip is built as a collection of what are called IPs. The x86 core is one IP; the power management controller, which is called the SMU, is one IP; the security processor, called the PSP, the Platform Security Processor, is one IP; and you have a whole bunch of others in there.
You have memory controllers, you have the southbridge, and so on. Those IPs do most of their verification on their own, but there is system-level verification, the thing that runs at one hertz, that is done with all of them together. That is more limited because of the speed involved, but there certainly is verification done on the whole piece.

We shouldn't leave out the people outside: do we have a question from the internet?

Yes: where would one actually set those chicken bits, and can they be set from the operating system?

Yeah, so the chicken bits live in model-specific registers, so they're set using the WRMSR instruction. You need to be at ring 0 to do that, but that's it.

Okay, gentleman number three.

I am very impressed with your talk, because I'm a security guy and definitely not a hardware guy. But as a security guy, I see this process that's struggling with keeping out unintentional bugs. How susceptible is this process to a person within your organization trying to introduce a really hard-to-detect intentional bug? Or how would you detect such a thing otherwise?

Right. You know, I think that's probably an area I can't talk too much about; I'm trying to think if there's anything I can say about that. I would say that there are a lot of different phases to the design process, and there are a lot of eyes that see things. Speaking for myself, I think it would be very difficult for someone to get something through all the different checks and balances that there are, but beyond that there's not a whole lot I can tell you. One thing a little bit related to what you talked about is that there's also the whole piece of the fab. Let's say that your company produces a design that is perfect, whatever; it may not be the design that the fab sends you back. That's a whole issue called supply chain security, which is certainly something on our radar as well.
Unfortunately, it can be very difficult to... well, let me say this: when you get silicon back, you tend to test for the features that you expect to be there. It's very difficult to test for features that are there that you're not expecting.

Okay, thank you. The gentleman at number one, please.

Yeah, so we all know the pixel failure classes of the monitors we all bought. Are there comparable failure classes for CPUs? And am I able to give you a million dollars to get a better-tested CPU than I can buy on Amazon?

No, there are no failure classes or anything like that; the CPUs are all functionally the same. There can be differences depending on your BIOS version as to what fixes are applied, like what microcode version you're loading, because unfortunately, kind of like Trammell was talking about, even if there is a bug and we release, say, a microcode update for something, we can't force an OEM to put it in their BIOS, and we can't force you to download it, so there is an issue there. The only thing I'll mention about what we call binning, which is where you test parts and figure out what bin they go into, is that when we make parts there are not different speed grades designed in or anything like that. The way you get a 2.0 GHz versus a 2.2 versus a 2.4 or anything else is simply how the part came out of the fab: some parts just run faster than others, and some parts burn more or less power than others. So for that you have to get lucky; whether your part can run faster or not is often just luck.

Sorry, we've got something from the internet in between.

All right. So in the verification stage, how do you distinguish design flaws from fabrication issues?

In the verification stage, the design has not gone through fabrication yet.
So you're just testing the Verilog code. There are mechanisms for testing, when a part comes back from the fab, whether it was built correctly or not. Sometimes that can be as simple as reproducing the same bug on multiple different parts, because chances are they weren't all made incorrectly. There are also a number of features we build in, under the giant category of design-for-test features, that functionally validate whether all the flip-flops in the design are working, things like that. That could be another talk.

Okay, gentleman number three.

I would like to know how many of these design cycles a typical x86 processor goes through before it's finished.

So it varies significantly by how many new features were added in a particular generation. I would say that sometimes it can be as little as one or two, and sometimes it can be between five and ten. One way of tracking this is if you hear about things like A0 or B0 or B2 parts: the first letter is basically the base layer version, and the number is the metal layer version. So a B2 part is the second version of the base layers and the third version of the metal layers.

Gentleman at number one, please.

Forgive me for asking this, but I'm a security researcher. If you dramatically simplified the processor by removing all the legacy and other crap in it, how...

Are you calling the designs crap?

No, no, I wouldn't say that. But how would the testing change?
So there are a lot of legacy features in x86, and one interesting thing is that removing them would not necessarily make things simpler. The reason is that if you take something out, you have to take it out of all of your existing tests, out of all of your random test generators, out of all of your models that check things, and so on. Believe it or not, for some things that can end up being more work than just putting the darn thing in and testing it again, which I know is a little counterintuitive, but that's the reality of it, unfortunately. It's also really hard to take things out of x86 because there's so much software out there. We finally got rid of 3DNow!, and I think we're getting rid of A20, which has been around since like the 286. But yeah, it's a tough battle.

Gentleman at number three, please.

I actually have two questions. The first one is: you mentioned that there's a piece of RAM that you can use to modify the microcode after the part has been manufactured. Why burn the microcode into the silicon at all when there's a piece of RAM that could be big enough to hold the entire set of instructions?

So the RAM is not necessarily big enough to hold all the instructions. I would say there are a few primary reasons for building in ROM. The first is that ROM is much, much smaller in silicon than RAM is, so if you are not going to build a RAM that's as big as everything could be, you save area by putting only a portion of it in RAM. There's also a major security advantage: you don't have to trust your loading process as much. And of course that loading process, if you didn't have any ROM, would need to be built completely in hardware, which means you have to get it right the first time, which can be difficult. So those tend to be the reasons for the split between ROM and RAM, plus legacy.

The second question was: at what point in the design cycle do you decide which clock rate
the processor gets marketed under, or runs at in typical operation?

Right, I mean it's typically part of the initial design specification. When you're going to create a design, you want to have a target performance envelope for it, and based on the process technology that tells you: okay, you need to have this many gates per cycle, or something like that. Now, of course, when you actually fabricate the design and test it, it could come out differently; it depends on how good the early data was. But it's typically part of the day-one specification.

There's a question from the internet in between, I think.

Yes: if the microcode updates are signed, what does the CPU check the signatures against?

So a typical implementation might have a public key burned into a ROM that is used to check the signature, but I really can't go into too much detail on that.

Number one, please.

From the talk it seems the...

Walk up to the mic, please; otherwise we don't have you on tape.

Yeah, that's good. From your talk, it shows that testing is a huge part of this process. With CPUs becoming more and more complex, how big will the impact of testing be? How much will it delay new CPU features?

It's a major factor. Testing is the biggest issue with CPUs, in terms of the amount of time it takes, the number of people, and the cost associated with it. Things get slower as you add more stuff in; on the other hand, sometimes new tools and emulation technologies pop up to help mitigate some of that. But when people look around and ask why it takes so long for CPU features to get into products,
you know, even when it comes to security features, this is kind of the reason: it's a long process, and every new feature you add extends the time before you can start selling the device, and you don't make money until you start doing that.

Okay. Yeah, I know you guys don't like follow-up questions, but maybe it's a short answer: do you see any new testing methods on the horizon, something more revolutionary than simulating?

I can't say I do right now, and I'm not much of an expert in what's coming up in that field, but there could certainly be new stuff that would be helpful. Okay, thank you.

Okay, thank you. Number three, please.

Okay, I was wondering if you could quantify the amount of space needed on the silicon to implement the JTAG logic, the chicken bits, all that stuff. Is it a percent, or much, much less?

So the chicken bits tend to be very, very minor, because each one tends to be kind of one register bit and a disable wire to some gate. The JTAG stuff I couldn't quantify specifically; it's not that big, I would say, compared to the other stuff in a CPU. If you look at a die photo, everything is small compared to the caches. But a lot of the JTAG logic is not considered optional: in some cases that's because an IEEE spec requires it, and in some cases it's because if you can't debug the part, then what good is it at the end of the day?
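The chicken-bit mechanism described above (one MSR bit wired to disable some feature, written at ring 0 with WRMSR) can be modeled in a few lines. The MSR address, bit position, and the ToyCpu class below are all invented for illustration; real chicken-bit locations, where documented at all, live in vendor register manuals.

```python
# Toy model of a "chicken bit": one model-specific-register bit wired as a
# disable line for a feature. CHICKEN_MSR and DISABLE_PREFETCH_BIT are
# hypothetical values, purely for illustration.

CHICKEN_MSR = 0xC0011000     # made-up MSR address
DISABLE_PREFETCH_BIT = 13    # made-up bit position

class ToyCpu:
    def __init__(self):
        self.msrs = {CHICKEN_MSR: 0}

    def wrmsr(self, addr, value):
        # On real hardware this is the ring-0 WRMSR instruction.
        self.msrs[addr] = value & (2**64 - 1)

    def rdmsr(self, addr):
        return self.msrs.get(addr, 0)

    def prefetcher_enabled(self):
        # The "disable wire": a single bit gating the feature.
        return not (self.rdmsr(CHICKEN_MSR) >> DISABLE_PREFETCH_BIT) & 1

cpu = ToyCpu()
print(cpu.prefetcher_enabled())                    # True: on by default
cpu.wrmsr(CHICKEN_MSR, 1 << DISABLE_PREFETCH_BIT)
print(cpu.prefetcher_enabled())                    # False: chicken bit set
```

On Linux, the equivalent real-world operation would go through the msr driver (e.g. /dev/cpu/0/msr) from a root process, which is why this stays out of reach of ordinary user code.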
Okay, thank you. Gentleman number one.

On the scale between full simulation at one end, the hardware emulator in the middle, and silicon at the end, do you also use FPGAs, where you basically partition the full system and stick it into FPGAs?

So there are some cases where that's useful. The biggest challenge is FPGA capacity: most FPGAs are simply not big enough to handle designs like this, and also some of the logic is designed for more of an ASIC flow and won't map into FPGAs as well. Some of the emulator systems are based on very large FPGAs put together in something like that arrangement. What I'd say with x86 CPUs, and I looked at this one time, trying to see whether you could synthesize this sort of thing into a Xilinx chip, is that it was so big compared to even the biggest Xilinx chip that there was just no way you could do it easily.

Not into a single chip, but if you have a quad-core or an octa-core or whatever, could you split it and partition it over multiple chips?

I've seen that work in some smaller designs, but not in anything as big as one of these cores.

Okay, people: if you're leaving before the talk ends, which is okay, can you please do it silently? There are still people asking questions; please show them some respect. Gentleman at number three, please.

Yeah, you told us about those in-CPU debugging features. Are they all left in the final design, or do you decide at some point: oh, this scan dump thing is really expensive,
we should remove it, and now is the time to do it?

Well, you saw how expensive it is to modify hardware, so maybe that answers your question.

Okay, gentleman number one.

Are there any debug features for the timing, so you can verify, or debug, where your clock does not hit your target?

You mean to validate whether the part runs at the speed it's supposed to?

Yeah; when it does not run at the speed you expected it to run at, which part is at fault? You've got to figure that out.

I see. So it's not an area that I've worked with personally. I know that one thing that's sometimes used is lasers: if you shine a laser on a certain part, it can heat that part up and make it run, I think, faster or something like that, and you can use that to help figure out where the slow path is in the design. The simplest case is: it's supposed to run at three gigahertz; you run it at three gigahertz and it doesn't work, so you run it at 2.9 and it does work, and then you have to figure out what circuit is causing the problem. That happens sometimes. Typically it's not a huge deal, because the libraries that you work with during the development process are very, very good at characterizing the timing of the different gates and making sure that you don't have any issues.

Okay, three more questions; please keep them short and simple, because we're out of time. Number three, please.

Yes, I wanted to ask: you're a security specialist, as I understood it. What's your job when you're designing a new CPU?
Well, so my current job is actually working on security features for the AMD roadmap, including CPU features as well as the Platform Security Processor. In that capacity we work with the different teams that are involved in those security features, to help them develop the specifications and make sure that they're testing all the cases that are necessary. But I don't get to write code anymore.

Okay. Did we leave out the internet, or are there no more questions from the internet? Which is okay. Okay, gentleman at number three, please.

Regarding the fact that the clock rate is a day-one specification: how likely is it that you reduce the clock rate for a fix, in order not to have to pay the three-million-dollar mask cost again; that is, reduce it, with the marketing consequences, in order to fix a design that otherwise would not run at the target clock rate?

All I'd say is: that's a business decision.

There you go. Yeah. Okay, last question, number one.

You were talking about formal verification, and you said one of the issues is that you need to have a model to check against the specification. So how do you do that in normal testing? How do you make sure that you're actually testing against the specification?
Because I remember the issue in the early SMP days, where there was actually no memory model and the CPUs were not doing anything useful.

Right. I mean, the functional checking is typically done with a golden model, where, say, you put an instruction in and you know the register values that need to come out, and everything else. The issue with formal verification is that if you're going to apply it to a design size that the tools can work with, say a scheduling unit or a load-store unit, those blocks have hundreds of different IOs that talk to other blocks. You basically need to have a formal model, not of the architecture, but of those blocks and exactly how they behave, which can sometimes just turn into re-implementing those blocks. So it can be a lot of work.

Okay, thank you, and let's have a final hand for David Kaplan, please.
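As an addendum, the golden-model checking described in that last answer can be sketched in a few lines: run the same instruction stream through the design model and a reference model and compare architectural state after every step. The two-instruction toy ISA, the register names, and both models below are invented for illustration; a real golden model checks the full architectural state of x86 against the design's RTL simulation.

```python
# Toy lockstep check: feed identical instructions to a "DUT" model and a
# golden reference model, comparing register state after each instruction.
# The ISA here (mov/add on named registers) is made up for illustration.

def golden_step(regs, instr):
    op, dst, src = instr
    regs = dict(regs)  # functional models stay side-effect free
    if op == "add":
        regs[dst] = (regs[dst] + regs[src]) & 0xFFFFFFFF
    elif op == "mov":
        regs[dst] = regs[src]
    return regs

def dut_step(regs, instr):
    # Stand-in for the design under test; here it happens to agree with
    # the golden model, so the checker stays quiet.
    return golden_step(regs, instr)

def lockstep(program, init):
    g, d = dict(init), dict(init)
    for i, instr in enumerate(program):
        g = golden_step(g, instr)
        d = dut_step(d, instr)
        if g != d:
            return f"mismatch after instruction {i}: {instr}"
    return "match"

prog = [("mov", "rbx", "rax"), ("add", "rax", "rbx")]
print(lockstep(prog, {"rax": 2, "rbx": 0}))  # match
```

The point the answer makes is that this style of checking scales at the architectural level, where the reference behavior is specified, but applying formal tools at the block level requires specifying every one of those hundreds of inter-block signals, which is where the re-implementation cost comes from.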