Hey folks, we are back live. It's T2 Tuesday update time. I'm going to keep the livestream going for an hour today, and I think in general it's going to be the T2 Tuesday update hour. Hopefully the presentation will only be 20 or 25 minutes, whatever it happens to be, but I will hang around if people want to chat or if they show up late, so we can make time. We'll make it like an actual regular talk, and if nobody's around I'll just work on getting the video ready to upload. I've got a ton of stuff to talk about today; it's going to take a while. It all started when Ben Krullus came into the T2 chat room a few days after the previous update and linked us to this video of the Tesla AI Day livestream, which had very interesting stuff in it, and it led me down a rabbit hole. Number one, it turned out that, as far as computer architecture goes, Intel had their annual Architecture Day also on Friday, August 19th, the same damn day as the Tesla AI Day, and it got me thinking about scalability and computer architecture and different approaches and philosophies to it, and I want to unpack that a little. So this is going to be a different kind of update, because rather than looking into the T2 tile, it's the T2 Tile Project sort of looking out at the world. It's going to take a little while; hopefully it won't be too boring. All right, let's just go. So here it is: the idea is that I've got this indefinite scalability idea, which I'll talk about in a second, and indefinite scalability, strictly, is really hard to achieve; even the T2 tile has some marginal issues when you really dig into it. But even if you can't achieve strict indefinite scalability, we can ask, well, how close do you come? And that's the indefinite scalability audit that I'm going to do today for the first time, and I'm going to apply it to the Intel stuff that was reported at their Architecture Day and to Tesla's AI Day, and
let's just go. All right, a reminder about indefinite scalability, for folks who are new. The idea is: suppose you're supposed to build a computer, and you are given power, cooling, and space, real estate, and that's it, and you're supposed to design a computer architecture, a blueprint for building a computer that can get bigger and bigger and bigger, that you can add more and more and more to and be able to compute with all of that stuff, without ever running into some internal limit like 640K, or running out of pins, or something like that. That's indefinite scalability. How do you do it? You define a tile: some fixed size, fixed mass. You cannot change the tile once you've started building the machine; those are the rules of strict indefinite scalability. And you put the tiles down next to each other and you see how far you can go. Strict indefinite scalability is really hard, because typically once you start putting down a ton of little things, you want to put them in a box, or put something else on them, but then that box is a new layer of structure. So in order to do strict indefinite scalability, the first thing you have to do is find out: where is the part that actually repeats, absolutely, without change, without adding any extra stuff? And the repeating part, that is the tile. Even if you have littler things inside it that you want to call tiles, if you have additional stuff that breaks the symmetry, so you can't just plop them down, the tile goes up to the level of wherever that is. So even if we can't be completely strictly indefinitely scalable, we can look at any actual system, or any planned system, and say: how is it doing? Where is it strong on scalability? Where is it falling down? And we can use that to compare systems. So we're going to compare an Intel system with a Tesla system right now. Okay, let's start with Intel. In my experience there are kind of two kinds of engineers: there are smooth engineers and nerd engineers, and this is Pat Gelsinger; he's
the recently appointed new CEO of Intel, and he's a smooth engineer, and so he did the intro. I have a mixed history with Intel. For the longest time they were the 800-pound gorilla in computing and they pretty much ruled the roost, but they have fallen on some hard times, sufficiently that I'm kind of rooting for them now; I want them to make a comeback, you're just in a slump, not least because they have a significant investment in New Mexico, which is where I'm broadcasting from, where I live. But basically, Intel is one of the few places that can actually make these chips, not just ask other people to make them; they actually build them. The way you make money in hardware is by making your transistors, your individual little pieces of silicon, individual pieces of electronic circuitry, smaller and smaller and smaller, which allows you to put more of them in the same place and use less power and just do everything good. That's what drove famous Moore's Law; that's why you have smartphones that are smarter than all the computers put together in the 1950s, and so on. But Intel fell behind. They were stuck at the 10 nanometer node; each size scale they call a node. And 10 nanometers is unbelievably tiny, but TSMC, the Taiwan Semiconductor Manufacturing Company, was already going down to seven nanometers, and Samsung was getting ready to go to seven nanometers, and so forth, and Intel was kind of stuck; they had some internal process problems, I don't know anything about the details. But they did what Intel kind of used to do when they were the 800-pound gorilla, which is just try to change the names of everything. These names are just numbers; this is a quote from the TSMC guys, saying these 10 nanometers, 7 nanometers, they aren't real numbers, they're just like model numbers on cars. So Intel says, well, let's
introduce new node naming. So 10 nanometers, whatever you knew they were at, now we've got Intel 7 and so forth: Intel 7, previously referred to as Enhanced SuperFin, which we previously referred to as 10 nanometers; Intel 4, previously referred to as, the artist formerly known as, 7 nanometers; and so forth. So okay, they're kind of misleading names, Intel 7, but they're just model numbers; they're not meant to be actual sizes of technology. Well, except: what's that 20A at the end? The 20A is supposed to stand for 20 angstroms, and an angstrom is a measure of length, a tenth of a nanometer, so 20 angstroms is two nanometers, which is indeed what makes sense to come after Intel 3, if Intel 3 were a length. They're trying to have it both ways. Yeah, okay, whatever. So, on Friday at Intel Architecture Day they presented a ton of new stuff, and I'm only going to dip into a little teeny bit of it, what's appropriate for my purposes. They have a new performance core, an x86 thing, that's supposed to be wider, deeper, smarter; we can rebuild him, the six-million-dollar man, who knows how much, probably a lot more than that. Every single piece of it, the front end where it reads the instructions and figures out what it's supposed to do: larger, smarter, wider. A fifth integer execution port added, okay, so that's a 20% increase in the number of integer execution ports, and floating point has been improved, memory has been improved, all of this stuff. And they try so hard in the Architecture Day hoopla to make it seem revolutionary, new, incredible, and what do you get? You get a 19% improvement. Is that great? Well, I guess it's great, especially if it's time to buy your next computer and you want to know: would you rather be 20% slower, or would you rather have the thing 20% better? But it doesn't
really feel like, oh wow, that's blowing the lid off the joint, this is how we're going to get dominance back: a 19% improvement. They took the performance core and they put it in this bigger chip called Alder Lake, and here it is. Those blue things there are the performance cores that we just saw. It says up to 16 cores; how do you get 16 cores out of that? Well, it's because you've got those eight dark blue ones, and then you've got those two groups of four little light blue ones. Those are the efficient cores, which are also pretty powerful little computers, but they're so efficient that you can put four of them in the space of one performance core. So eight performance, eight efficient, and there you go, and then a bunch of stuff packed around the outside. And that is the key to Intel's scaling philosophy, and it repeats over and over during their Intel Architecture Day presentations: they take a handful, so here they're taking eight of these little slices, and then adding some more stuff that's shared amongst them; in this case it's more cache memory that helps speed things up. The problem is that you might have wanted to consider one of those little brown and gold things there to be the tile for our indefinite scalability audit, but we can't, because now there's some unique stuff that's been put on it, so the tile's going to have to be something bigger that includes it. But then at the next level it happens again: now we have a bunch of slices being put together, this is for their graphics card, and then they add on some stuff that's shared: ray tracing units and a bunch of other stuff, what is that, pixel back ends, rasterizers, and so forth, shared amongst them all. So once again, instead of having a sort of flat tiling, we have this hierarchical structure: there's a department, and then there's a department head, and then there's an organization over that, that collects a bunch of departments,
and we have a vice president, and so on. That is the Intel approach to scaling, and it's fine for what it is, but for indefinite scalability it's a non-starter, right? Because of those unique things. Next level up, there are those things again, we saw them before; now they've added a whole bunch more cache that's shared amongst them, and so on. Now, Intel is great at, well, they do tons of really revolutionary hardware at the level of the hardware. What they've got now is, when you're putting together multiple chips, they're using other chips, those little brown things, as interconnects on the bottoms of chips, so that they can pack these things really close together, and that's really good, because getting communication from one physical piece of silicon to another physical piece of silicon is really hard; that's one of the things where the T2 tile does really terribly, that inter-tile communication. They're working on it. At the level of chips, they put those things together into Sapphire Rapids: multiple tiles, single CPU. And that's the whole Intel philosophy of scaling in a nutshell: you put a bunch of them together and you put a boss on top. At every stage there's always a boss on top, and it just keeps going on up: there's a bunch with some extra stuff on it; once again the slices add something unique. This looks the same as we saw before, but this is their high-performance computing stack of stuff, and now we have two of them, but once again we stick in two media engines that weren't present in the individual sets of slices, and so on. So finally we get to Ponte Vecchio. This thing is a monster: a hundred billion transistors, a whole bunch of different chips put together, including some chips that Intel didn't even make, and that is revolutionary from the old Intel point of view. Compute tile, RAMBO tile, base tile, HBM, EMIB tile, that one at the bottom, those little things
that connect the other tiles together. You put these together, make a bunch of them, and the first demo that they're doing is this Aurora blade. Aurora is the name of the supercomputer that the Department of Energy has been trying to get built for, I don't know, at least a decade, and I'm not sure, but it now is supposedly going to come into existence because of Ponte Vecchio being built into these boards, with these cooling pipes all over the place, so they can be crammed in by the thousands to get this thing to an exascale computer. Exa, whatever, is a billion billion, so you're supposed to get a billion billion floating point operations every second out of this thing. It's huge. But they have to hurry, because other folks are now on the brink of bringing up their own exascale machines, designed in completely different ways, so the world keeps marching on. But that's where we end the story about Intel: with the Aurora blade, Ponte Vecchio, and the Aurora system itself. Well, okay, so what does our indefinite scalability audit say for Intel, using Ponte Vecchio for the Aurora system? Once again, step one is to find the repeating tile, the thing that hasn't got any weird stuff on it, that you can just stamp down. But because of the Intel philosophy of adding something extra at each level, the tile for Aurora is the entire supercomputer: you'd have to plop down a whole other supercomputer, and so forth, and then they wouldn't even be able to talk to each other unless you did additional engineering, and that's the whole point of indefinite scalability. So, Intel score: I'm giving it a seven. That is not a seven meaning a C; that is a seven out of 300, like that, and I'll explain a little more about how the indefinite scalability audit score works in a minute. But let's look at Tesla first. So, okay, Elon Musk. Well, okay, Pat Gelsinger was the smooth engineer; Elon Musk is the
nerd engineer, and I'm a little bit of both, as far as it goes. Elon Musk is controversial, and to some degree with good reason; he's got sort of a serious case of cheese disease. Now, cheese disease we discovered when we were in grad school and we would go to talks; they were bringing in researchers to give us talks, and we discovered that when it was a big name, when it was a big cheese coming to give us a talk, the talk often kind of stunk, and the people that were more unknown, that were earlier in their careers, would give fantastic talks, interesting detail and so forth. The big cheeses would just show up and talk off the tops of their heads and give sloppy, all-over-the-place talks, because they just sort of assumed that pearls of wisdom would fall out of their mouths, that preparation was for lesser beings. Elon Musk has got a little bit of that, and perhaps it's understandable; there aren't that many people who have been the richest human alive, ever. But it certainly can be aggravating, and his butting into that whole cave situation with the stranded kids, that was just embarrassing. But nonetheless, from a nerd engineer point of view, I came across this other video, which I wanted to recommend to you. It's from a YouTube channel called Everyday Astronaut, and Tim Dodd, he is the Everyday Astronaut, got a two-and-a-half-hour interview with Elon Musk, wandering around the SpaceX facility, nerding out hardcore about rockets. Now, I don't really know much about rockets; I learned a lot from listening to them go on. You sort of get bits and pieces from the outside and sort of marvel at the jargon, like, I suspect, some folks who come to the T2 Tile Project: they sort of don't get all the details either, but they can get the overall flow and the feel of it, and
that's what I got from Everyday Astronaut. That's the logo, and here he is, going on and on; that's Tim Dodd, the actual Everyday Astronaut, who does this stuff. And he could totally keep up with Elon Musk. They were going back and forth about, you know, Russian full-flow whatever; I don't even know what the jargon words are. And he even made some suggestions: after Elon Musk had explained why they were doing something some way, Tim Dodd says, oh, I guess you can only do it this way because of blah blah blah, and that got Elon Musk thinking, maybe we actually could do it the other way, and he took the idea. It was really fun to watch. So they were wandering around, looking at all this stuff, I mean, this huge stuff, these three-ton grid fins that were about to get installed on the side of the rocket, and so forth, and Tim Dodd was keeping up with everything, like I said, from Russian rockets to the Kerbal Space Program. And this is the reason that I wanted to stop and take another minute here: Elon Musk went through his rules of engineering, which I liked quite a bit, and other people like them too; you can find them all around the internet now. Number one: make the requirements less dumb. Number two: delete a part or process step, and his gloss on that is, if you're not adding stuff back in at least 10% of the time, then you're clearly not deleting enough stuff. And then only step three, which is where a lot of people begin, is to simplify and optimize. Step four is to accelerate the cycle time, whatever it takes to do it, and step five is to actually try to automate the process. Now, of course, from a computer nerd point of view, typically we'll think we start with step five: oh, I could write a script to do that. And that's far, far, far away from asking, do you really even need to do that task at all? This stuff strikes me as largely
just right, and there's a bit of structure to it: steps one and two are about problem setting, and steps three through five are about problem solving. It's very common that people rush past the problem setting stage, the problem formulation; they just take the obvious thing as given and want to run to the problem solving stage. But the mature thing to do, the smart thing to do, is to spend some time on problem setting: can I not do this at all? Can I do it a different way? Can I skip a bunch of steps? And so on. I do this right every once in a while; I do it wrong an awful lot. It's worth remembering. And Elon Musk's stated goal is that everybody should be the chief systems engineer. Think how different that is from the Intel hierarchical philosophy, at least as reflected in their architecture; I'm not saying anything about their human resources or how their org chart actually works, just as expressed through the architecture. Everybody-is-chief-systems-engineer is much more flat and empowering, rather than, you know, do your job and report to your third-level manager, and so on. So I went and got me an Everyday Astronaut mug; it says they're recommended for use on Earth, so that's where I am. Tim Dodd, thanks for what you do; I really enjoyed all those videos. Anyway, so, the Tesla AI Day. A lot of it was about self-driving cars, the vision stuff, the processing stuff, and I'm not going to talk about any of that. They also announced the Tesla Bot, with a really embarrassing person dressed in a white jumpsuit with a black head thing on, dancing around to sort of suggest it. So there's the Tesla Bot: speed, five miles an hour. It seems like the key idea with this Tesla Bot is to make it a wuss, to make it friendly and not dangerous to humans. But I'm kind of dubious. If it
works, it'll be really amazing, but it sort of raises the question: what happened to Boston Dynamics? How did they get into the wrong part of the search space, so that they ended up producing these things? Because they made their things too strong? The little wussy bot is going to be better? I don't know; we shall see. It's going to be electrical rather than hydraulic. Well, I'm not sure, but I'm probably going to side with my friend Rod Brooks, who responded to someone who was saying the Tesla Bot is going to revolutionize the world, that in ten years we're going to think this day was important because of the Tesla Bot; he says: not. I think that's probably true too; we'll see. You know, I would like Elon Musk to succeed. He says he wants to make humans multi-planetary and take the light of consciousness to the stars; those sound like great things to do. But at the same time, I agree with Bernie Sanders that we should have progressive taxes for this, because one of the things about making money in a capitalist kind of system is that it does get easier to make money once you have money, so that's not linear, and to have a linear tax scheme on a non-linear process doesn't exactly make sense. We could make it all work out; I want to come down on both sides of this. The point is: it's Dojo. Dojo is a machine that they are building specifically for training neural nets, so it's not a general purpose machine; it has an easier job than Intel does. They don't have to sell it to people who are asking, does it run Windows? Because all it's going to be doing is pumping data sets through as fast as possible, to do stochastic gradient descent on machine learning networks, all different ways. So they got to have a blank slate, which Intel didn't really have, and that accounts for part of the difference, but I think it goes deeper than that. So here it is: it's a distributed compute architecture, compute elements plus
network fabric. Look at that picture; look at this picture; we'll put them side by side. The one at the top, that's the thing I made in 2008 as part of the media stuff to go with the T1 tile, we didn't call it that; it's the same idea, yay. So, okay: this is the individual training node, the individual CPU core, in effect, of the Dojo architecture. It's got pretty good specs; it certainly puts the T2 tile's CPUs to shame, both in terms of CPU power and in terms of connectivity. And then they start stacking them together; they're designed so that they butt edge to edge, and then you have 354 of them, and you make a ring of I/O around it, and you get the D1 chip, and that's their unit of fabrication. It's got 11 miles of wires, each one, and this is big, and they're just beginning. And there it is, so they've actually got some of those working. And then the plan, right, so we're not actually there, we're moving into vaporware now, is to stack more and more and more of them until you get 500 thousand of these things, each of which is a major deal, with 1500 of these D1 chips, each of which has 354 of the individual cores on it. And you see it stretching out to the left and right; why is that? It's because they put conventional processors, communications, and host systems around the north and south, and let it scale east and west, and that's critical for indefinite scalability, right? That means they have just given up on scaling in the vertical direction; it looks like we could still scale in the horizontal direction. And this looks to me to be a system choice; I would imagine they could have kept going if they wanted to, but they needed to communicate, to get data in and out from the outside world, so this is the choice they made. And here's the stack of the tiles: power goes in the bottom, heat goes out the top, and the compute
plane, the computing and communications, moves back and forth along the middle. The power supply takes 18,000 amps for one of these things, and those actually exist too. And the idea is to put it together into a ten-cabinet group that's going to hold those 500,000 nodes, in the 1500 or whatever it is D1 chips, put together into the training tiles, and 1.1 exaflops: that's going to be another exascale machine. Now, it's custom for doing their machine learning stuff, but at root those are general purpose processors; they've got a megabyte of RAM; you could run the MFM on them, with a little customization, a lot of customization, but you absolutely could, and then you'd get that incredible communication speed. So, the indefinite scalability score for the Tesla Dojo ExaPod system: I'm giving it a 91 out of 300. How do I get to 91? Well, if it were indefinitely scalable in one dimension, the score would be a hundred; if it were indefinitely scalable in two dimensions, which they didn't even try, the score would be 200; and so on, though there's not much more "so on" to do there. I'm taking a couple of points off because we don't actually know how they're doing the addressing along that axis; we don't know if we can really extend out arbitrarily in that direction, and we don't know if they're actually giving each core a unique address for routing purposes, which I kind of suspect they may be doing. Anyway, it's very close to indefinitely scalable in one dimension; it could probably be made indefinitely scalable in one dimension, and it might even be possible to make it indefinitely scalable in two dimensions. That's pretty exciting. So there's the bottom line: in the very first indefinite scalability audit, Intel versus Tesla, head to head, Friday, August 19th, 2021, Tesla wins in a romp, 91 to 7. Exactly where does the T2 tile come in on the indefinite scalability audit? Well, it's going to be over a hundred; I don't know how close it is to 200; we'd have to figure it out sometime. So that
was what I wanted to cover. On the insiders report, it's mostly bad news: I was supposed to have ULAM code see a tile-failed event, didn't try it; supposed to have the ULAM 5 building on the canonical form, didn't try it. So those are the goals; I'm just re-posting them for next time. But I did do some stuff other than just look at videos and make slides. There's been a fundamental problem about how you build up to higher scale in ULAM programming; in particular, an atom in ULAM is only 71 bits in size, so how are you going to get an object bigger than that? Well, what we know about that is: you make molecules, and we built a plate that has all this kind of stuff, and so on. The problem is, all of that is invisible to the programming layer. When you're programming atoms, if you want to deal with something that's bigger than 71 bits, you have to go over and look at the event window: next spot there, next spot there, and so forth. Couldn't we have some semi-generalized scheme to take pieces of multiple atoms and put them together into some virtual object? In a way, when you think of actual chemistry, real chemistry, where molecular reactions occur, they're described in terms of this activated complex, where all the reactants have come together but the actual reaction hasn't happened, and then the transition state, where the reaction happens, and then the two hydrogens and the oxygen are gone and now we have just one molecule of water. We've got the same kind of thing, which I've now implemented here. I'm not going to do a demo; I just made some screenshots instead, because I ran out of time, but we'll talk about it next time. We now have persistent transients; that sounds like a contradiction in terms. The whole point of a transient, a structured kind of object in ULAM, is that it's not limited to 71 bits; in fact it can be, a kilobit? No, actually it can be eight kilobits long. It can be really big, and the
reason it can be really big is that it just lives on the execution stack. It's transient: it gets refreshed, it starts new every event, and then at the end of the event it gets tossed away. But now we have a persistent transient, a transient called Persistent. How do we do it? We do it using this BitStorage, a new idea, well, actually, here, yeah. So the key for Persistent is it's got two virtual methods, gather and scatter. Gather is meant to look out in the event window and take whatever bits from here, and a little bit from here, and bits from here, and put them all together into the transient. You then go ahead and do your event on the transient, where all the bits are now sitting in data members, very handy, and afterwards you scatter them back to wherever they came from, or they could even go someplace different, if we had a more advanced gather/scatter system, which is totally possible with this thing; it's built on top of BitStorage. I'm not going to go through the details of it now; it'll be better next time, when we'll have a demo, but I'm actually pretty excited about this. And I did do a demo of sorts: I've been working on the level-two plate, and this was all actually a sub-goal, some yak shaving off of the L2 plates. The problem with the L2 plates was that to get the L2 plate sequencer in there to help control it, I had to dedicate a whole extra row and a whole extra column in addition to the L1 plate that I was trying to surround, and it caused tremendous trouble and lots of asymmetries. And I was like, well, why couldn't I take several adjacent border sites, strap them together, and put the sequencer in there? And that's what led to the transient persistent L2 plate sequencer: it's a transient, and yet it persists. So, all right, here's an example where we've got a bunch of little elements that have now been surrounded by L2 plate, nice and tight, just one level thick, and, there's no way we're going to be able to see this, are we? No, we're not. One of the tricky
businesses about these persistent transients is that you can't click on them with the atom viewer, because they're not atoms: here, these bits are actually changing in three different atoms every time an event happens on any one of them. So there's a certain amount of parallelism here, is that right? An event on any of these three guys is enough to do the gather, the event, and the scatter for all three of them, so you get better event efficiency out of the scheme as well. All right, oh, I'm over half an hour. I will stop. I hope the first indefinite scalability audit was interesting; I really do feel like the world is coming around to spatialized computing and tile-based computing. Now again, tiles in general go back decades, the Transputer; there have been a million of them, well, thousands of them maybe, but taking it seriously and doing the engineering work for it, that's very encouraging. Thanks, folks, for dropping in, whether you're here live or looking at it later; I really appreciate it. Hope to see you next time.
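Since the screenshots don't come through here, a little sketch might help pin down the gather/scatter idea before next time's demo. This is Python, not real ULAM; all the names here (Atom, PersistentTransient, the bit slices, the reverse-the-bits behavior) are made up for illustration and are not the actual ULAM or MFM API. It's just the shape of the mechanism: borrow bit slices from several neighboring atoms, run one event on the combined state, write the bits back.

```python
# Illustrative sketch only (not real ULAM): a "persistent transient"
# gathers bits from several atoms, runs an event on the combined
# state, then scatters the bits back where they came from.

class Atom:
    """A toy atom; in the real system an atom has only 71 bits."""
    def __init__(self, bits):
        self.bits = bits  # list of 0/1 values

class PersistentTransient:
    """Logical state bigger than one atom, rebuilt on every event."""
    def __init__(self, sources):
        # Each source is (atom, start, length): a bit slice to borrow.
        self.sources = sources
        self.bits = []

    def gather(self):
        # Pull the borrowed slices into one contiguous "data member".
        self.bits = []
        for atom, start, length in self.sources:
            self.bits.extend(atom.bits[start:start + length])

    def scatter(self):
        # Write the (possibly updated) bits back to their home atoms.
        pos = 0
        for atom, start, length in self.sources:
            atom.bits[start:start + length] = self.bits[pos:pos + length]
            pos += length

    def event(self):
        # One event on any participating atom updates all of them:
        self.gather()
        self.bits.reverse()  # stand-in for real behavior code
        self.scatter()

# Three atoms each lend 4 bits to one logical 12-bit object:
a, b, c = Atom([0, 0, 0, 0]), Atom([1, 1, 1, 1]), Atom([0, 1, 0, 1])
pt = PersistentTransient([(a, 0, 4), (b, 0, 4), (c, 0, 4)])
pt.event()
```

After the event, all three atoms have changed, even though only one of them needed to be the event center; that's the event-efficiency win mentioned above.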