Good morning — good morning. I just wanted to make sure you were there. So today I'm really honoured and excited to be introducing our first keynote, and this is a very special person for people in our business. Back in — well, let's not get into details, but a while back — in the course of a week this person designed the bulk of both the hardware and the operating system for the famed BBC Micro computer, to which a generation, I would say in the UK, owe their beginnings. I certainly know of one person here with us who started on the BBC Micro. And after that she went on to create the ARM instruction set, basically, and that of course is what's powering 95% of the devices that geeks like us carry around. So this is someone whose contributions are really hard to quantify in our business, but she is here to share some of her insights on the future of microprocessors, and she's got a lot of stuff to tell us. So I am not going to go through the long list of honours and the many achievements; instead we're going to directly welcome Sophie Wilson to the EuroPython stage. Please, a big welcome.

Thank you. So we'll be cantering through. If you really, really, really have to ask a question, then wave at me; but otherwise, questions in the break. So we'll be talking about microprocessors. It's about 40 years since the microprocessor was introduced, and in that time it's become 10,000 times faster and billions of times more common, and the world now relies on them — especially in a pandemic. We'll look at how we got here and what the future might be. There'll be a couple of laws and quite a few graphs. But first, what is a microprocessor?
A microprocessor is composed of digital logic carrying out multiple steps to execute each instruction. We've got to fetch an instruction from memory; decode the instruction — find out what on earth to do; read the values required for the instruction, so for example fetch some values from registers inside the processor; then carry out the desired computation — if it was an add instruction, we've got to add two values together; and then write the result back into the register file of the processor. Everything's broken into steps like that, or smaller, and each piece of digital logic in those steps is made out of many, many transistors. So better transistors, or more transistors, means better microprocessors. Which takes us to the first law. Gordon Moore at Intel made an observation that if he kept watching, every six months somebody would make transistors better on smaller silicon, and he said that the number of transistors he could fit on a piece of silicon doubled every — well, it was adjusted later — every two years. And for a long time this has been a prophecy that has always been fulfilled. The observation was taken as the driving force for the development of new silicon manufacturing, first by the ITRS and now by the IRDS, whose mission is to harness the world's semiconductor engineers to make Gordon Moore's law as true as possible. You often see abstract pictures showing things with lots more transistors on them as you go through time, but what does it actually mean? In this picture on the wall behind me is a plot of ARM1, which was designed in a three-micron process, and then, to the same physical scale, a plot of the Arm Cortex-M0+ in a modern 20-nanometer process. It's that tiny black dot. That's a scale change of seventy thousand times in area.
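The fetch/decode/read/execute/write-back steps described above can be sketched as a toy interpreter. This is purely illustrative — the three-operand encoding and the tiny register file here are invented for the example, not any real instruction set:

```python
# Toy illustration of the steps a microprocessor performs per instruction:
# fetch -> decode -> read registers -> execute -> write back.
# The instruction encoding is invented purely for illustration.

regs = [0] * 4                      # tiny register file
regs[0], regs[1] = 5, 3
program = [("ADD", 2, 0, 1),        # r2 = r0 + r1
           ("SUB", 3, 2, 1)]        # r3 = r2 - r1

pc = 0
while pc < len(program):
    instr = program[pc]                        # fetch
    op, rd, rs1, rs2 = instr                   # decode
    a, b = regs[rs1], regs[rs2]                # read registers
    result = a + b if op == "ADD" else a - b   # execute
    regs[rd] = result                          # write back
    pc += 1

print(regs)  # -> [5, 3, 8, 5]
```

A real pipeline overlaps these stages across consecutive instructions rather than running them one after another, which is exactly the trick the ARM1 uses later in this talk.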
That's what's happened. We could either have 70,000 of the same thing, or we could spend 70,000 times as many transistors in the same area as an ARM1. So how many transistors do you need for a working microprocessor? This is the circuit diagram of a 6502 on the right, topologically distorted to match the final chip layout on the left. The colours in this — if you've ever seen a silicon chip, it's just this dull slug of colourless nothing, but if we shine very bright lights on the chip, we can get different effects. This green colour is produced by diffraction: this is the main instruction decoder of the 6502, and it's a grid of metal, so we get diffraction of the light to produce green. Here there's no metal sitting on top of the silicon — the light went into the silicon, came back out again, and came out as yellow. And then we get different forms of diffraction for the reds. There's a whole subculture of shining bright lights on chips to take pretty pictures. You can see this is quite messy: the 6502 was designed and laid out entirely by hand. Somebody sat next to giant drawing boards sticking bits of Rubylith tape onto transparencies to make the masks for this — in fact several electronic engineers and a bunch of high-school graduates, sitting in bungalows in Phoenix, Arizona, sticking Rubylith tape to make that from the circuit diagram. So, 4,000 transistors: you can do it by hand. It's painful, but you can do it. And what do you get? You get something that ran at about a megahertz — those transistors were fast-ish — and you get 8-bit operations. With only 4,000 transistors it's not enough to have really complicated operations, but it was enough for a very limited form of pipelining: it took two clocks to read in an 8-bit instruction and an 8-bit immediate, and two cycles to execute it, which is quite good for the time.
It was a giant 21 square millimeters in size, and we'd characterize it nowadays as six-micron smallest feature size. Smallest feature size does not mean a transistor is six microns across — they were hundreds of microns across. I mean, you could see them. Transistors are big in this. The block diagram of a 6502 is on the right. Single lines are single digital connections; fat lines are, in this case, 8-bit buses. So mostly 8-bit buses are going around: if you execute an instruction, you're reading something into the ALU, possibly from the accumulator or over the buses, and then writing the result back. We don't have very many registers, and those that we do have are only 8 bits: an 8-bit accumulator, 8-bit index registers — even the stack pointer is only 8 bits. So Gordon Moore's handle is wound, we get some more transistors, and we get to have much more fun. This is ARM1. On the left is Steve Furber's original pencil sketch of how ARM1 would be laid out — you have to choose where things go. On the right is something that looks really very different from the shot of the 6502. This is ARM1, and it was designed by computers driven by humans. (We'll see the next chip was designed in a different way, by computers.) This was a Tcl script taking a register cell that had been designed by hand and replicating it all over the register file: there are 32 bits of register down, 20-odd registers across, and the little bit there is the program counter. A big thing has changed: you could make out really coherent features on the 6502 shot, like that big green area of the instruction decoder. That shrunk — this is the instruction decoder of an ARM. So something happened in the development of microprocessors to make that possible, and obviously that thing was RISC. Anyway — what did we get with these transistors?
We got something that ran faster: eight megahertz. We've got better transistors that run faster, but we'd also been able to use more transistors to make it run faster. This thing is fully pipelined — each cycle it can fetch an instruction and carry one out, continuously. It can do a fetch, a decode-and-read-registers, and an execute-and-write-registers: a three-cycle pipeline, with this number of transistors. It's a little bit larger than a 6502 and was designed in a process that's twice as good — but that's twice as good in both directions, so four times as many transistors per unit area. And we get a very similar picture of a microprocessor, but now the buses are all 32 bits, and we've got a much bigger register file and a bigger ALU. Fundamentally, though, it's pretty much the same. So Gordon Moore's handle is wound again, and now we get a smaller process. What can we do with six million transistors to build a processor? And I'm talking just about the transistors in the processor — this is excluding any caches or anything like that. The first thing is the picture has got a bit fuzzy. Transistors have got really, really small. You can still see some features — these yellow areas are the register file of FirePath, there's a thing down the middle, and the processor looks pretty symmetrical with a top and bottom part — but transistors are so small that you get mush where the computation elements are, where you could see them clearly on the 6502. So we're getting better, faster transistors, and we can burn a lot of them to go fast. Suddenly we're running at 330 megahertz, and with six million transistors we can have complicated instructions.
This machine executes four 64-bit operations per cycle — or, in this particular case, you can do 32 8-bit operations per cycle, or, what is that, eight 32-bit operations per cycle — and it can fully sustain doing that compound instruction. This one says: add this register pair to this register pair and put the result there, while loading two more lots of register pairs in here, and keep going. So everything in this machine executes packed SIMD. It's designed to do multiply-accumulates; we burnt a lot of transistors doing multiply-accumulates. It's tiny — seven square millimeters — and a 130-nanometer process means lots of transistors fitted into a small area. It's actually a pretty complicated design, with instructions much more complicated than this. There seems to be some natural limit to how much you can get the compiler team to deal with. So, as you know, Gordon Moore's handle is going to give us more transistors. What do we do with them? Well, we put down multiple processors. Instead of getting computers with much more complicated instruction sets, what we get is multiple microprocessors. Here's the FirePath that we started with, and here's its reflection the other way round; and then we're burning lots of space with giant caches and on-chip memory, and we've got a little logic for I/O buffers. That thing is 16 channels of DSL to your home. And we can keep doing this — I've made a career out of doing this. If we've got smaller transistors, that means more microprocessors. This is more FirePaths: there's a FirePath, there's a FirePath, there's a FirePath, there's a FirePath; blocks of memory all over, blocks of memory down the middle. The I/O systems are shrinking compared with the FirePaths. So this thing is — what's that?
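The packed-SIMD idea described above — several narrow lanes carried in one wide word, operated on together — can be sketched in plain Python. This is a generic illustration of lane masking, not FirePath's actual semantics:

```python
# Sketch of packed-SIMD addition: eight independent 8-bit lanes stored
# in one 64-bit integer, added lane by lane with masking so carries
# never cross lane boundaries.

LANES, WIDTH = 8, 8
MASK = (1 << WIDTH) - 1  # 0xFF for each 8-bit lane

def pack(values):
    """Pack eight 8-bit values into a single 64-bit word."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & MASK) << (i * WIDTH)
    return word

def unpack(word):
    return [(word >> (i * WIDTH)) & MASK for i in range(LANES)]

def simd_add(a, b):
    """Add two packed words lane-wise, wrapping within each 8-bit lane."""
    return pack([(x + y) & MASK for x, y in zip(unpack(a), unpack(b))])

a = pack([10, 20, 30, 40, 50, 60, 70, 250])
b = pack([1, 2, 3, 4, 5, 6, 7, 10])
print(unpack(simd_add(a, b)))  # -> [11, 22, 33, 44, 55, 66, 77, 4]
```

Note the last lane: 250 + 10 wraps to 4 within its own 8 bits instead of carrying into the neighbouring lane — that lane isolation is what SIMD hardware provides for free in a single ALU pass.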
That's 12 channels of VDSL2 to your home. I can keep doing this all day: the current chips in those green cabinets bringing DSL to your home have 12 FirePaths. But I'm working in a data-parallel world, where my systems can be replicated. If you don't have that property, and can't do computation in pure parallel, then you run into a real law, with a real equation and a real graph. Gene Amdahl observed that the speed-up from multiple processors is limited by the sequential part of the program. He's got an equation. It's a real law — you can't break this law. If you've got a single program to execute, with a sequential part and a parallel part, and 95% of your program is parallel, then you're on the green curve, and the sequential portion means that even if I supply an infinite number of microprocessors, the speed-up is limited to 20x of executing on one microprocessor. That's pretty serious: 95% parallel is pretty much your ray-tracing sort of program, where there's a lot of parallelism. Most of the programs you work with are not that parallel. If I merely have 90% parallel, I get 10x speed-up. If I have 75% parallel — which is about where web browsers are — then even with an infinite number of processors I only get 4x speed-up, which is pretty sad. And compilers? They're usually about 50% parallel. Oh dear: they only get a 2x speed-up on my 65,536 processors. That's a shame. Because the industry can, and will, continue to make you hardware that has ever-increasing amounts of parallel computation — and not merely multiprocessors, but all sorts of SIMD data types, vector processing engines, matrix processing engines, all sorts of additions, which you can't make use of. Traditional scalar computation will not increase very much going forwards. Indeed, it hasn't increased very much going backwards: since about 2006 we've hit sort of peak performance-per-clock and peak clock rates. It has become much more power efficient, though.
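Amdahl's equation is speedup(n) = 1 / ((1 − p) + p/n), where p is the parallel fraction of the program and n the number of processors; as n goes to infinity the speed-up approaches 1 / (1 − p). A few lines of Python reproduce the figures quoted above:

```python
# Amdahl's law: speed-up of a program whose parallel fraction is p,
# run on n processors. The sequential fraction (1 - p) is the limit.

def speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.95, 0.90, 0.75, 0.50):
    limit = 1.0 / (1.0 - p)   # speed-up with infinitely many processors
    print(f"p={p:.2f}: limit {limit:.0f}x, "
          f"on 65536 cores {speedup(p, 65536):.2f}x")
```

Even with 65,536 cores, the 50%-parallel compiler barely reaches 2x — which is why the hardware industry handing out ever more cores does not, by itself, make ordinary programs faster.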
We can give you many more processors without burning the house down. But scalar programming languages such as Python are a very poor fit to the parallel hardware, and there's no automatic compilation of the scalar programs that you write to parallel hardware for all problem types. We can do it, if we try very hard, for certain problem types, but mostly it doesn't work. So, you guys out there: we need a revolution in software. We need a better way to program multi-core processors — off you go. I've been presenting this slide set for nearly 15 years. No change. So although we keep getting more transistors, more transistors aren't actually as useful as they were. We had a time when processors were young and we were increasing performance about 25% a year. Then we had the RISC revolution and a thing called Dennard scaling — a much better law than Moore's law is Dennard scaling. Dennard scaling says that as I make the transistors smaller, I reduce the operating voltage and consume exponentially less power. That was really good, so we had a nice Dennard-scaling era of performance going up rapidly. Then we hit the end of Dennard scaling, which was a bit of a problem, and we went to out-of-order architectures to push up performance by 23% a year, which is only like that. Then we started running into Amdahl's law, where adding more processors really doesn't help, and now it's getting really, really hard, and we're doing stupid things to increase scalar processing performance. Stupid, stupid things — there isn't time. So why is this so exquisitely painful? Well, the first thing is that transistors do get more power efficient, even without Dennard scaling: building smaller transistors, which have less capacitance, means that the power for a given transition on a transistor is less. But we use so many more of them in a small space that things get hot. This slide set is not meant to poke a lot of fun at Intel, but this is derived by looking at Intel chips. So here we go.
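The physics behind Dennard scaling is the dynamic-power relation P ≈ C·V²·f: voltage enters squared, so shrinking transistors and lowering the supply voltage together cut switching power dramatically. A back-of-envelope sketch in Python — the capacitance, voltage, and frequency values below are purely illustrative, not measurements of any real chip:

```python
# Dynamic switching power of CMOS logic: P ~ C * V^2 * f.
# Classic Dennard scaling shrank capacitance C and supply voltage V
# along with feature size, so power per transistor fell fast enough
# to keep whole-chip power roughly flat. Illustrative numbers only.

def dynamic_power(c_farads, v_volts, f_hertz):
    return c_farads * v_volts**2 * f_hertz

p_old = dynamic_power(1e-15, 5.0, 8e6)   # older node: 5 V supply, 8 MHz
p_new = dynamic_power(1e-16, 1.0, 8e6)   # scaled node: 10x less C, 1 V

print(p_old / p_new)  # -> 250.0 : per-transition power falls 250x
```

The end of Dennard scaling is exactly the point where V could no longer be lowered (leakage takes over), so the V² term stopped helping and power density started climbing toward the hot plate.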
They started with the 386 and the 486, then built the Pentium, the Pentium Pro — the world's first superscalar, out-of-order x86 processor — the Pentium II, the Pentium III. Oh look: hot plate. The amount of energy per square centimeter of the processor means you can fry eggs on it. I mean, on the Pentium Pro or Pentium II you can fry an egg on the processor — it would have to be a very small egg — but you can fry an egg on it. It's got hot. They kept going with the same microarchitecture: Pentium II, Pentium III, Pentium 4. They didn't quite make it to nuclear-reactor levels of power density, but it got very difficult to cool. So with the Pentium 4 — the infamous Prescott — they had to essentially throw this microarchitecture away, go back to first principles, and start again to build the modern Core microarchitecture, and stop being so stupidly hot. Which is a shame, because we never got to the rocket nozzle. So Amdahl's law really hurts, but power is also constraining the future. The power used by transistors isn't decreasing as fast as the size reductions of the transistors, even assuming multi-gate FinFETs (the fin is a real fin — it sticks up). As we go forward, increasing amounts of the silicon that you pay your hard-earned money for is going to sit there dark: unpowered, not used. So you've got this lovely smartphone. It's got a lot of computation engines in it — both visible ones, in terms of multiple cores that you could theoretically write a parallel program across, but also other engines: neural engines, GPUs and so on. To make it not burn a hole in your pocket, we turn all of that off as much as we possibly can. So stuff is unpowered. You're used to stuff being unpowered in your pocket — otherwise your battery would run down — but now we're having to do unpowered on desktop computers as well.
If you've got a desktop computer — the sort of thing that keeps your office warm — it's chucking out hundreds of watts of energy as a by-product of doing all this computation. But if we assume a 125-watt power limit, for convenient silent air cooling, then for seven-nanometer processors we have to turn half of the processor off all the time; otherwise it will be too hot. And going forward, if we 3D-stack our transistors, then the cooling is going to be a problem. Power is really, really bad. Now, this sounds quite depressing — but it's worse. There's no immediate physical limit to further scaling from today's processors. There's no demon, no engineer coming in saying "the engines cannae take it, Captain!" The engines can take it. We're building lots of chips in 28 — that's the volume process for all the cheap chips. We've got seven-nanometer chips from Intel, Huawei and Qualcomm; AMD and Apple have five in production; Nvidia, Huawei and Samsung also have some fives; and we've got four and three in development. But for the first time in history, as we make transistors smaller, they cost more. We used to make transistors smaller and use less silicon — sure, the process got a bit more complicated, but overall the cost per transistor kept going down. Now the cost per transistor is going up. You'll have noticed your shiny devices cost more: this is why. They're more complicated to design, which hurts, but they're also more complicated to manufacture, so the transistor cost goes up. This is going to hurt. This is sort of the end of Moore's law: only some things will be worth the greater expense of small process geometries.
And I think you can already see that releases of new chips in new process geometries are now confined just to the majors. As for the minor people: I work for Broadcom, the largest fabless semiconductor manufacturer in the world, and we can't afford some of these exotic processes, because our run of chips would be too small to make it pay. We can afford to do it for some things, but for most of our chips, no. Now it's time to talk about all those little asterisks that I've been putting on the slides. So: planar transistors. A planar transistor is a flat one. We put a source and a drain next to each other, and we put a gate between them; applying a charge to the gate affects how electrons flow between the source and the drain. That's a field-effect transistor. There were loads of other transistor designs, but everything now is made out of field-effect transistors. As we scaled down a planar transistor and everything got small, things started not to work very well: this thing called the gate's contact area — where the field effect of the gate affects the flow of electrons — got too small to make good transistors. At around about 20 nanometers we get very leaky transistors that don't turn on or off properly, because the gate's too small. So what we did as a semiconductor industry was to make our transistors vertical. We took the source and the drain and stood them up, and then we wrapped the gate around them. That's the fin in FinFET — it's a vertical fin — and it increases the gate contact area massively and makes a better transistor. Now, as we did that conversion from planar to FinFET, that gave us more transistors per square millimeter in the same process: we just went vertical instead. We didn't change our lithography.
We didn't change anything — the metal tracks are still exactly the same as they were; the transistors got smaller. So the industry made up a new name. If we took a 28-nanometer process and made FinFETs on it — which was the first ever production FinFET, by Intel — they called it 22. It was a 28-nanometer process, and they called it 22. If we took a 20-nanometer process with a FinFET, people called it 14 or 16, and we've had 12, 10, 8, 7, 6, 5, 4 and 3 in FinFET. Intel used to have a different naming scheme from the foundry companies making chips, but when Pat Gelsinger became boss of Intel they decided to join in with everybody else and use the same naming convention, because it was getting very confusing. So they all make up names the same way — but they're all made up. There is nothing five-nanometer about M2 Mac silicon. It's made on a five-nanometer process, they say, but nothing in there is as small as five nanometers; whereas before, if you talked about 130 nanometer, there were things in there that were identifiably 130 nanometers in size. So that's a little diversion, because I needed to explain some things. Now, back to economic problems. It's getting really expensive to maintain Moore's law: it takes about 18 times as many scientists per Moore's-law step as it did in the 1970s. It's getting quite hard. If you look at this another way, it means that each researcher's output is 18 times less effective, in terms of generating economic value, than it was several decades ago. That means a shrinking pool of scientists, because you can't afford to pay that many going forward. You can draw graphs about this: on an annual basis, research productivity in the semiconductor industry is declining at a rate of about 6.8% per year. You could say we're running out of ideas. So we're now going to look forward. My health warning on looking forward: predictions are very hard to make, especially about the future. In April 2002 the head of Intel predicted that we'd have 30-gigahertz,
10-billion-transistor chips by 2010. You will have noticed that he was wrong, and we've seen some of the reasons why that didn't happen — there was a sort of cap; it's pretty hard to make stuff that goes at five gigahertz and stick to any power limits. Now, I don't like to beat up on Intel, but they do make very optimistic statements and then contradict them. So back in 2010 — the date at which they'd predicted they'd have those very fast chips — they gave a presentation about their foundry skills. Intel like to claim they have the best foundry in the world. So they said: we've got a lot of innovation coming; we've got a technology pipeline, and it's full. Sure — we've got our first FinFET devices going forwards, we'll make that about 2011, and then we know how to make 15, 11 and 8, and they'll come out in about 2013, 2015, 2017. And going forward we've got all sorts of things that we'll do: we'll make carbon-nanotube FETs, we'll make ultra-dense SRAM with new processes, and so on. So they knew how to do all of that. Except — this is what actually happened. The one-year-ahead look, to 22 nanometers: they missed by a year. Oops.
It came out in 2012, not 2011. 14 came out in 2015, not 2013. 10 they didn't make till 2019 — four years out. Seven — well, I've put TBD on the slide. Seven is just about coming out, with Alder Lake. Alder Lake's the first thing on it, and they had to redefine what seven meant, because Alder Lake was built in the Enhanced SuperFin plus-plus-plus process — and SuperFin, you'll have noticed, was the name attached to 10. So they did actually have to redefine what seven was in order to even get Alder Lake out on seven. So yeah, this prediction stuff is tricky. Oops. Now, it's also getting ruthlessly expensive. When I designed that FirePath, built on 130, I could go to 22 different companies to make it — in theory at least; at the time Intel wouldn't have offered fab space to me, but 22 companies. Then the number of companies for each leading-edge process kept shrinking, and now we're down to three companies left who can make leading-edge processes. And we've gone from a fab costing a few hundred million dollars to fabs that cost staggering amounts — the fab itself, developing the fab process, and the fab investment needed to keep going forwards. Even at this sort of level, three and a half billion; now you're looking at 16-to-20-billion-dollar fabs. You can run a small country for that. So what do these three leading companies make, and who are the other people? The leading companies are Intel, Samsung and TSMC. They're the only people on the leading edge — and you'll notice that two of them aren't Western, American-type companies: they're Korean and Taiwanese. Intel make seven nanometer, which was formerly called 10-nanometer plus-plus-plus. Samsung are in volume on four.
They've introduced a three-nanometer gate-all-around thing, but it's only suitable for prototyping at the moment. TSMC: the volume's on five and four, and they're in the throes of introducing three nanometer — Apple, among others, are widely expected to have run multiple trials on three-nanometer Apple silicon chips. And then GlobalFoundries: they kept pace all the way along till about 12, when they threw in the towel and sort of almost went backwards. SMIC — "Semiconductor Manufacturing In China"; no, it's not actually called that, it's just an easy way to remember who they are — they're sort of stuck on the 10-nanometer generation. UMC are stuck on the 22-nanometer planar generation and 14-nanometer FinFET. So it really is one of these three if you want an advanced chip. Translating it into numbers — numbers are the great leveler; everybody's using the same lithography, and we'll talk more about that in a moment. If you look at the names of things and at how many transistors per square millimeter, we can work out how honest people are being. Back in 2017, TSMC's N10 and Samsung's 10 were pretty close together, at about 52 million transistors per square millimeter. Intel were on 14-plus-plus at the time, and that was much worse: 37 million transistors per square millimeter. It's not necessarily the only metric — you can't use this peak transistors-per-square-millimeter number to make a processor, and Intel's transistors are biased to make them fast. So: in 2018 TSMC had gone to seven, Samsung was still on 10, and Intel was still stuck in the same place. 2019: everybody's at about 100 million, but with different names — you can see why Intel had to change their names. 2020: about 170, 120 and 100, Intel behind. Mid-2023, where we are now: TSMC can put about 300 million transistors per square millimeter.
That's a big step from back there. Intel are about that on Intel 7, so Intel are still quite a lot behind. Samsung haven't published enough information for us to compute that number for them, which is a bit of a shame, but they're probably about 250, if I had to guess. So how do we make a transistor — a processor? We use a thing called lithography: basically, printing. You can think of it as a photocopier. We have a master copy that's stuck at the top of the photocopier, and then we can make copies of that — except instead of paper, we're photocopying the mask onto a piece of silicon. So we have machines that are very much like a photocopier, and they were about the size of an office photocopier back here. Back there, I think, they were using optical light; then they started using more exotic forms of light, heading towards machines that were getting quite big, using deep ultraviolet light — that's a wavelength of about 193 nanometers. You'll have noticed 193 nanometers is a lot bigger than the casual talk about 20s and 28s. The wavelength of light is a fundamental limiter of physics for printing stuff, so we had to be clever and get round that. We use multiple masks to print each layer, where the masks are specially arranged in order to print features smaller than the wavelength of light we're printing with. That was quite expensive to develop, too. Up here we're using multiple exposures — usually quad exposures, so for each layer that we actually want to print we need four different masks, aligned precisely, to print a layer that's smaller than the wavelength. So the pressure up here to get beyond deep ultraviolet, to extreme ultraviolet, was major. Deep ultraviolet was fairly simple technology, but extreme ultraviolet — 13-nanometer-wavelength light — is arcane technology. So here's a schematic of an extreme ultraviolet machine. How do we make extreme ultraviolet light?
Well, the answer is: we go to the cinema and watch Star Wars movies. Under the fab floor we build a megawatt CO2 laser. We have a droplet generator: it's fed some molten tin, and it drops little droplets of molten tin into a vacuum. We fire the megawatt laser and vaporize the molten tin in the vacuum — big flash of actinic light, you've all seen the movies — and some of that light is 13 nanometers in wavelength. So we collimate it in the vacuum, filter out all the wavelengths that aren't correct, and let a beam of 13-nanometer light come out at the end. All of this has to be in a vacuum — the whole thing. And then we can expose with a scanner mask. Our mask has to be a reflective thing, because glass is opaque to 13-nanometer light; we moved from transmissive optics to reflective optics along the way. So this machine is now beginning to be used in fabs to print small things. It's 30 nanometer — you'll notice we're claiming that we're doing seven nanometer — but we can reduce the number of multiple patternings for things dramatically. Now, this machine is expensive: 160 million dollars apiece for the first-generation ones, and the new generations cost more. It's flown around the world in specially constructed jumbo jets, and it's bigger than a photocopier. Here's that machine on the left, with a couple of people standing next to it. So that's one fancy photocopier — and the next generation is larger again. So there are some conclusions from all of this. We can keep going forward: we can have heterogeneous processors in your system, graphics processors — tons of work for processor designers, yay, and system-on-chip designers.
Yay. And even more work for software people. Even so, with performance gains restricted to parallel or special-purpose programs only, users of computers will have to adjust their expectations. I lived through an era where every couple of years you could buy a new computer that was many times faster than the old one — it was worth throwing them away. Now? Well, I threw away a lot of computers to transition to Apple silicon, but the old ones from 2006 are working just fine. They still run operating systems properly, and programs. They're only two gigahertz, but between two gigahertz and the sort of three gigahertz that you get on average out of a processor — when it isn't bursting up to five gigahertz and making the room very hot — there isn't much difference nowadays. And they are costing more, these shiny computers. We can see this if we look at a trend chart of performance. The transistor line: we can use lots of transistors. Single-threaded performance — this is SPECint — just levels off between 2010 and 2020; nothing much happens. Frequencies are basically nailed to the spot, and have been since about 2006: we can't afford the power. There's the power curve — flat. Number of logical cores?
Well, it was one for a very long time, and then it went bonkers — you can have as many cores as you like. So what happens going forwards? We can use more transistors per processor than we do today. Leading designs are superscalar out-of-order processors with six to eight operations peak per cycle, and that's true across a number of providers. Intel got to this point first with the Haswell processor — about six operations per cycle — then Broadwell, Skylake, loads of Lakes, through Coffee Lake to Alder Lake, where we are now, with a core called Golden Cove underneath. AMD: Zen 2, Zen 3, Zen 4. Apple: Twister, Hurricane, Monsoon, Vortex, Firestorm and now Avalanche. They're all in this ballpark. Superscalar out-of-order isn't particularly energy efficient — we waste a lot of energy in the framework of computation — but boy, can we make it run fast. So everything's doing that, at high clock rates, apart from Apple, who stay below four gigahertz. But Apple — Firestorm and Avalanche — really are doing eight operations peak per cycle. It's quite an impressive microarchitecture in there. We can also burn transistors by having big and little cores. Arm were the first people to invent this; they've since renamed their scheme — it's now called DynamIQ. If we've got big superscalar out-of-order cores that are inefficient, we can have little cores that present exactly the same architecture, and have the operating system swap the work between them for power efficiency — everything about this is for power efficiency. Apple — well, Lightning, Firestorm and Avalanche are very big cores, and Thunder, Icestorm and Blizzard, the power-efficient cores, are about middle-sized. Arm's classic A55, or the A520 core, is quite small compared with that. And Intel, alongside Golden Cove, now have a Gracemont core, which is their efficiency core, and that's also middle-sized.
Gracemont is actually comparable to a Haswell, but built for a lower performance point. So we get lots of extra transistors, and adding little cores is very cheap, so you get more and more little cores in your systems. And you can have any old mixture: you can even have a prime core, where you've spent the power budget to go really fast, plus some big cores and some little cores. So that's quite fun.

What we compute is also changing. We've had general-purpose computing, then we had signal processors and graphics processors; now we have deep-learning engines, variously IPUs, digital neural networks. And these are different: eight- or sixteen-bit or smaller processing that can use integer rather than floating point, massively parallel. This is a special field: we can write massively parallel software for neural networks, particularly things that do convolutional neural networks. A convolution you can express as a matrix multiply, and we can do matrix multiplies very efficiently in hardware. The Google TPU does 65,536 eight-bit integer multiplies per cycle. So we can really use parallelism in this field.

We can also make it highly power efficient. Where, say, an Intel big core is costing us about five watts for, let's be generous, about 30 or 40 giga-ops, with machine learning we can do three to five tera-ops per watt: more power efficient by a long way. We can be even more power efficient with a thing called spiking neural networks, which is the hardware realisation of the process in your brain.

So we have a new performance race. You'll remember the microprocessor curve went like that, and it never really did go exponentially up. Here we are in the performance race.
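That convolution-as-matrix-multiply point can be made concrete. A minimal sketch (the function names are mine; this is the standard "im2col" trick, computing the cross-correlation that CNNs actually use):

```python
import numpy as np

def conv2d_direct(image, kernel):
    """Valid 2-D cross-correlation, computed patch by patch."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def conv2d_as_matmul(image, kernel):
    """Same result via im2col: flatten every patch into a row,
    then one matrix-vector multiply does all the work."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    patches = np.array([image[i:i + kh, j:j + kw].ravel()
                        for i in range(oh) for j in range(ow)])
    return (patches @ kernel.ravel()).reshape(oh, ow)
```

Once the convolution is a single matrix multiply, it maps straight onto a systolic array of multipliers like the TPU's, which is exactly why that hardware shape was chosen.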
This slide defines why we have ChatGPT now. We can have gains from the number representation getting smaller, from much more complex instructions with matrix accelerators, and from a smaller process, though as you've seen for the processors, a smaller process doesn't buy you very much. Look back at 2012, 2013: we had about 3.9 TOPS, and now we're up at around 4,000 TOPS on a single chip. So we've gone a thousand times up in ten years of hardware development. There's reason to think this exponential isn't going to continue (exponentials never continue forever), but we can push it up a bit more.

So what do you get when you buy your shiny thing? This is an M1 Max, being projected, as it happens, by an ordinary M1. Heterogeneous processors: this thing is crammed with processors. Thirty-two GPU cores, eight big cores, two little cores; Apple will be adding more little cores into the future, I predict. There's a neural network accelerator over here, which is 11 TOPS. A 50-watt desktop chip with all that in it, all running. So we really do have power efficiency for processors, and we're close to the 240-watt desktop chip in performance with this sort of architecture. Apple have invested a lot in memory; there's a computer science law that you can never have too much memory bandwidth, and they've spent a lot of money on it.

Intel aim to beat them all over again. There's their historic stuff, and they're saying, well, we'll improve: Intel 7 is going to improve, so whichever lake there is after Alder Lake, I think it's Raptor Lake, comes out on 7-plus-plus. So yes, it will get better, honest.

Now, I mentioned that finFETs were needed to mend planar transistors. Well, finFETs are now running out of steam; we've been doing finFETs for several generations. And now we're going to move to these things. These are called gate-all-around.
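Back on the number-representation point above: most of that thousand-fold gain came from doing arithmetic in eight-bit integers instead of floats. A minimal sketch of the idea (the function names are mine; this is the usual per-tensor symmetric quantisation, with a 32-bit accumulator as TPU-style hardware uses):

```python
import numpy as np

def quantize_int8(x):
    """Map a float array onto signed 8-bit ints with one shared scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a, b):
    """Approximate a @ b with 8-bit operands and int32 accumulation,
    dequantising only the final result."""
    qa, sa = quantize_int8(a)
    qb, sb = quantize_int8(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc * (sa * sb)
```

An 8-bit multiplier is a small fraction of the silicon and energy of a 32-bit floating-point one, and neural networks tolerate the small rounding error, which is where those tera-ops per watt come from.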
So: we had a fin with a gate wrapped around it, but now we can wrap the gate all the way around the channel in the middle, to increase the gate contact area again, and we can also build multi-gate transistors; this is three sets of gates around a replicated three-way channel. They'll probably look more like this, because it's hard to make them like that, but it's essentially the same sort of thing. And just to ram home the idea of the nanosheet FETs that are coming (Samsung are running a bit early on making nanosheet FETs): nothing in here is two nanometres. This is an IBM "two nanometre" nanosheet transistor, and we're looking at a cross-section through it, so you can see the little sheets of channel with the gates wrapped around them. Each nanosheet is 40 nanometres wide, for a two-nanometre process; it's five nanometres tall; and it goes backwards into the picture (this is actually a scanning electron microscope image) by about 12 nanometres for the smallest thing they can make. Of course, 12 nanometres is only about 24 atoms of silicon. So that gets us to 330 million transistors per square millimetre, and it's coming soon, but probably no earlier than 2025 or 2026. I'll skip over that slide.
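The gate-contact-area gain is easy to put numbers on. A back-of-envelope sketch using the nanosheet dimensions quoted above; the finFET fin dimensions are my assumed typical values, not from the talk:

```python
# Gate contact perimeter per unit of channel length.
sheet_w, sheet_h = 40, 5   # nm: one nanosheet (the IBM "2 nm" example)
sheets = 3                 # three stacked channel sheets

# FinFET: the gate covers three sides of the fin (assumed dimensions).
fin_h, fin_w = 50, 7       # nm
fin_perimeter = 2 * fin_h + fin_w             # 107 nm of gate contact

# Gate-all-around: the gate wraps all four sides of every sheet.
gaa_perimeter = sheets * 2 * (sheet_w + sheet_h)  # 270 nm of gate contact
```

Roughly two and a half times the gate contact in a similar footprint, which is the electrostatic control that lets the channel keep shrinking.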
I haven't got time. So scientists and engineers aren't doing very well on the conventional routes. Where do we spend our research budget if spending it on those routes forward isn't working well? We're spending it on packaging, and advanced packaging. Look at the patents: patents on semiconductor devices; patents on packaging, going up rapidly; patents on advanced packaging, going up even more rapidly. Particularly if you look at TSMC, their patents are going up quite rapidly; Samsung; Intel.

So, packaging things together: the whole making of chips in 2D or 3D. This is the M1 Ultra, which joined two M1 Maxes together with a silicon interposer, with on-carrier memory. This is a Zen core and multiple chiplets: AMD made this part in 14 nanometres and this one in 7 originally (they've moved them all forwards now). Concentrate your money on the best transistors where they matter. And we can go on top of each other: here's the AMD X3D, with memory chips on top of processor chips, and here's Graphcore's Colossus. You've got the Colossus die, and they can't get the power across it; Colossus is extremely greedy for power. So they've built a chip on top of it, upside down, to transmit the power better all the way across it.

We're also looking at ways to cool chips directly. At present we cool the outside of the package. If we can inject a working fluid (not water; a working fluid, usually a mineral oil), then we can push it directly onto the die under pressure and take it out again. And you can have inlets and outlets mixed, so that tiny little circulations are happening. I don't understand how we can have more than two layers of 3D and cool it like this.

And, well, machine learning: we're spending a lot of money on making machine learning better, and that brings us to the biggest chip ever manufactured. 46,225 square millimetres of silicon; 1.2 trillion transistors; 400,000 optimised cores, just on this wafer; 18 gigabytes of on-chip memory; 9 petabytes per second of memory bandwidth.
You can never have too much memory bandwidth. 100 petabits per second of fabric bandwidth, and that was built in TSMC 16. You'll have noticed that we're past TSMC 16; it's not leading edge any more. So can we do better with a modern process? We can. They've built generation two: more than twice as many AI cores in the same area, more than twice as many transistors, more than twice as much onboard SRAM, more than twice as much memory bandwidth and fabric bandwidth. Oh, and the power: they managed to keep the power the same. So 23 kilowatts of energy goes into this thing to make it compute, and you can't just turn it on; 23 kilowatts of heat will come out of those 46,000 square millimetres. That shows how real the hotplate stuff is. You have to attach this wafer to a cold plate and put it in a special machine to even turn it on. But boy, do you get some computation out of it!

So that's what things look like going forward. Thank you.