The T2 tile project is building an indefinitely scalable computational stack. Follow our progress here on T Tuesday updates. All the way back at the beginning of digital computing, John von Neumann, who became the father of digital computing with the von Neumann machine, talked about how different manufactured, designed machines were compared to living systems as far as how they handled error. Living systems, when they got injured or damaged or something went wrong, would try to ride it out, keep going, hide it as much as possible, heal up later if possible and so forth. Whereas we design machines so that the first thing that goes wrong, we make it fail as obviously as possible. Never hide mistakes, never hide failures, but instead stop immediately. Why? Because it's hard enough figuring out what's gone wrong with a machine when one thing has gone wrong. And if we keep on trying to go further and now two or three more things go wrong, the chance of us figuring out what was actually the problem becomes very small. And as a result, the whole idea of serial determinism is motivated: everything has to be exactly right, and as soon as anything goes wrong, you want to quit. Well, if we're talking about best effort computing, if we're talking about giving up on serial determinism, that means we're going to have to face a whole new level of bugs where things may be, you know, going somewhat wrong while in the process of trying to recover from previous problems and continuing to make progress and so on. And that's what the last couple of weeks have been about for me. The problem I specifically ran into was, you know, I have all these log files, I've described some, I've shown some of them here, that give a sequence of events that go on. Well, let's just take a look at it. I'll come back to talking about the sync in a minute. All right, so here's what my setup tends to start looking like, right?
I've got two machines, two tiles that are connected together here. I've got another one that's ready to be connected. It's all powered up and so forth. And, you know, I have ethernet cables sticking out of all of them. That ethernet cable has to come out of the West. So there are only so many kinds of grid arrangements you can use if you want to get an ethernet cable into every tile. Eventually, that's not going to be possible. But right now I have three, and I end up with, you know, three terminal windows where I'm SSH'd into three tiles simultaneously trying to figure out what's going on. But, you know, these log files, they're really quite simple. They just have sequence numbers, like save it at 225, 226, 227. We know what those mean. They say what happened, but that's a single log file on a single machine. And what are we going to do when in fact we've got something that necessarily involves multiple tiles at once, like intertile events? And so I started building up a new kind of trace file, just like what happened with the Linux kernel module: build tracing systems so that we can watch what's going on with relatively little impact on the timing. Not just what steps happen, but how long each takes, by focusing on making the logging reasonably cheap. And that's what I have here. And now we start capturing the actual time that this happened. Six minutes, 57 seconds and 444 milliseconds from the start of this file, this particular thing happened. And so forth. But now that we've got separate trace files on multiple tiles, the multiple tiles don't necessarily have a common clock. That was the sync issue that I was talking about up here. Sync what you want, but use what you sync. The whole history of, you know, digital computing and much of the history of technology has been about building larger zones of control. And one of the examples of it is mapping the earth and dividing up latitude and longitude, getting time nailed down.
All of these things, which were giant scientific and engineering challenges and breakthroughs in centuries past, all they were doing was building larger and larger zones of synchronization. And that's going to happen in the movable feast machine as well. But the important point is that the architecture is not supposed to pretend that it can solve all synchronization problems for you. The idea is if you have some local computation that needs a certain amount of synchronization to get its job done, then you build synchronization at that level. But you don't gratuitously synchronize everything upfront just to sort of clear the decks, because that won't be indefinitely scalable. Sync what you want, but use what you sync. It's from an old Simpsons Treehouse of Horror episode. This is what's going on here. So now we have multiple tiles that have multiple trace files, and we can't assume we have network time protocol running on these things, which we don't, because in the middle of this glob of tiles, there's no network. That's a feature, that there's no network there. So what I did instead is I said, okay, well, we don't need to know absolute time. We just need to know the relative time of events. If this guy sent a packet and this guy received it, we have to line them up so that those points in time match, and then for the ones in between we can count on the quartz crystals that each tile has remaining relatively close. So I developed an alignment mechanism and the idea was, excuse me, can I have my thing back here? What's going on? There we go. What we'll do is we'll take the packets that they send back and forth. We'll take certain ones that are reasonably rare and we'll just put a big random number at the end, you know, 31 bits of randomness at the end of the packet, and we'll see that those bits of randomness get recorded in the trace file of the tile that sent it and the trace file of the tile that received it.
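As a rough illustration of the tagging idea just described, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the 4-byte big-endian encoding of the tag, and the (time, kind, tag) trace record are hypothetical and are not the project's actual packet or trace format.

```python
import secrets


def make_sync_tag() -> int:
    """Draw 31 bits of randomness to use as a sync tag."""
    return secrets.randbits(31)


def append_sync_tag(payload: bytes, tag: int) -> bytes:
    """Append the tag to the end of a packet.

    Assumption: the tag is stored as 4 big-endian bytes.
    """
    return payload + tag.to_bytes(4, "big")


def read_sync_tag(packet: bytes) -> int:
    """Recover the tag from the last 4 bytes of a received packet."""
    return int.from_bytes(packet[-4:], "big")


def log_sync(trace: list, time_ms: float, kind: str, tag: int) -> None:
    """Record a sync point in a tile's trace; kind is "send" or "recv"."""
    trace.append((time_ms, kind, tag))
```

The sender would log the tag with kind "send" at its own local time, and the receiver would log the same tag with kind "recv" at its own local time, so the two trace files end up sharing a value that can be matched later.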
And then we can use those points of random numbers. Oh, here's FEA07, you know, whatever it is that was sent by this tile. And here's that same number, FEA07, that was received by this other tile, so I can now say these things must have been about aligned in time, modulo a little bit of delay for the packet to get there. And furthermore, that whatever connector I sent this one out on must be connected to whatever connector this one arrived on. So by collecting a bunch of these synchronization points, which are just random numbers that got sent between tiles, we can then do statistics on them and try to figure out the offset between what this thing thought the clock was set to and what this thing thought the clock was set to, and line them back up. And so there we have this ITC sync process. This, again, was in the early days before it actually existed, but you get the idea. And so here's an example from the early days when it was setting up. At 40 seconds and 22 milliseconds, a packet was sent out that was an open packet and it had 0FF1504; it looks like "off", but it's just a hex number. It was sent by East, and two milliseconds later it was received by West. There it is, that same sync number. What is this? This is a loopback cable. East is connected to West on the same tile, packet out, packet in, sync matches. You get the idea. So we go through, we find all these things, we build them into a map, we do statistics on the time delays that each sync point implies for the beginning of its particular file. And then we average them out and try to get rid of outliers, because it is always possible that we might pick the same random 31 bits more than once. I'm not sure I've ever seen that yet, but certainly it's possible in principle. And so here's another example. And this one, look at this. This is number zero, number one. Those are two different tiles.
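The matching and averaging step described above might look something like the following Python sketch. It is a sketch under stated assumptions, not the actual ITC sync code: the trace format (a list of (time_ms, kind, tag) tuples), the function names, the median-based outlier filter, and the 50 ms tolerance are all hypothetical choices. It also assumes two distinct tiles, each tag appearing at most once per trace, and roughly symmetric packet delay in the two directions.

```python
from statistics import median


def robust_mean(deltas, tolerance_ms):
    """Average the deltas that lie within tolerance_ms of the median.

    This drops outliers such as an accidental reuse of the same tag.
    """
    mid = median(deltas)
    kept = [d for d in deltas if abs(d - mid) <= tolerance_ms]
    return sum(kept) / len(kept) if kept else mid


def estimate_offset(trace_a, trace_b, tolerance_ms=50.0):
    """Estimate (clock of A) minus (clock of B) from shared sync tags.

    Adding the result to B's timestamps puts them on A's time base.
    """
    seen_b = {tag: (t, kind) for t, kind, tag in trace_b}
    a_to_b, b_to_a = [], []
    for t_a, kind_a, tag in trace_a:
        if tag not in seen_b:
            continue
        t_b, kind_b = seen_b[tag]
        if kind_a == "send" and kind_b == "recv":
            a_to_b.append(t_a - t_b)  # equals offset minus transit delay
        elif kind_a == "recv" and kind_b == "send":
            b_to_a.append(t_a - t_b)  # equals offset plus transit delay
    means = [robust_mean(d, tolerance_ms) for d in (a_to_b, b_to_a) if d]
    if not means:
        raise ValueError("no shared sync tags between the two traces")
    # Averaging both directions cancels the delay if it is symmetric.
    return sum(means) / len(means)
```

If packets only ever flow one way between the two traces, or the delay is not symmetric, the estimate is biased by the transit delay, which is one way an alignment could come out slightly wrong.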
This indentation from tile one is a completely separate trace file that had its own time base, that we found sync points in and tried to match up. B32, A20, B32, 28A, whatever it is. And that allows us to then see, okay, it went out one place, came in another place and so on. Here's another one. Here's an example of an early version of the trace for an entire event. This is on the loopback cable, but it's an intertile event. The open stage is when the two intertile connectors have recognized that they're compatible and they've exchanged caches and everything's all good. And so here's the sync point D29. The other guy has a sync point to F327 and so forth. And one guy says, okay, I'm calling you up. I'm ringing your number one on your circuit because I would like to do an event at my coordinate 021 relative to East for an event window going out distance four and it's even for the yonk bit. That went out East. Two milliseconds later, it was received by West. West processed it and answered the call. Answered the ring saying, yes, you can have it. The answer is received back at East. Talk, talk, talk, talk. That's sending the cache updates corresponding to a Western bean in this case actually moving one site. So the site where it used to be became empty. The site that used to be empty became it. That ends up taking 20. Well, those packets eventually got merged together. These two talk talks got turned into one longer one. 29 bytes total to have a single thing moving to a different space. Hang up, hang up. That's the end of an event. And it took something like 200 milliseconds. That's a lot. Some of them go quite a bit faster than that. Now, there was a problem. So I ended up with three files, three tiles that were all connected together. But one of them was actually a reboot of the first tile. So tile 0 and tile 1 overlapped in a trace file. And tile 1 and tile 2 overlapped in a trace file.
But tile 2 and tile 0 never actually overlapped, because they were the same tile that got rebooted, or the engine restarted. And when I actually merged those files, it figured it out. It figured out an alignment. But it actually got it slightly wrong. So there's this scary thing: northeast with a less-than sign means it received. Northeast received a shut packet with a 741-64 sync tag on it. And according to the alignment, that sync tag wasn't sent by Southwest until a tenth of a second later, or 30 milliseconds later, something like that. So it screwed up the alignment. And I'm not 100% sure, like I said, exactly what's going on. But it is still going to be the case that it doesn't necessarily account for the various intertile lags that go between the things. It tries to average them out going both ways. But it might not come out quite right. So in the end, this is another one where the ring goes out at 307 milliseconds and it was received 30 milliseconds before. Nice work if you can get it, and so forth, several of them. So finally, I found a bug, but I also put in a tweak map so that if you discover these things, these acausal time travel packets going on, you can tweak it at the command line as well, just to line it up. Not the most comforting rock solid reliability, but in fact, it seems like it's been no problem in the cases I've looked at. Sync what you need, but use what you sync. So that's the story on that. But the longer story is bug city. Bug, bug. Here it is now that it's been fleshed out. The trace file now has a lot more information in it. This is one intertile event. Again, one on the loopback. But in fact, this one goes from 629 milliseconds to 663, so it's only about 30 milliseconds long, getting a little better and so forth. Here is an event on event window 12 using circuit phone number 11 on the ring thing that actually gets interrupted. So it sends out the talk packet.
There's the 29 byte talk packet saying, here's my cache update. And while it's waiting for that to be received and to be handled and processed, it actually does two more events. The 1617 guy goes to 1527 and so forth. And then the packet arrives at the other side and it keeps on going. This is the multiple event windows in flight in action. So that's pretty cool. All right, but having so many failures, I just said, you know, I need a kill screen. I needed something to show the failure without having to look over SSH. And boy, it's been blowing up at every single possible chance. So let me see. I'm taking up too much time. But let's see if I can show it to you here. All right, that's going to take a while. So we're just booting them up brand new. While we're waiting for that, you know, look at this. The home page for the T2 tile project. This is a frame grab from another video by the Contraction Collection YouTube channel that's run by a friend in the area, who actually included a frame grab because he was looking for inspiration from out on YouTube channels, and he had a quick montage flashing just a few frames and he stuck in the T2 tile project. That was very nice. I'll put a link down there. He's building balisong scissors. You know, I don't know. There are these flippy knives, you know, that you can whip around and it does a thing and it's got handles that turn into the blade cover and so forth. He's building one out of scissors that will do it. It's very cool. And his channel just blew up. He's got 3,000 subscribers now. Very cool. All right. So we've got, here we go. So these guys are connected. We've got a loopback cable between southeast on this tile and northwest on this tile. Let's stick in a beam. This is a new kind of beam that goes until it can't make any progress. And then, oh, it did that. You see, it went out through the loopback and came back in. Oh, here it went over to the other side.
So this one keeps going straight until it can't make progress forward. Then it picks a random direction. Look at that thing go. Oh, yeah, there it is. There's the teleporter. That's exactly what I was going to show you. There's all kinds of bugs. There's coordinate problems, transform coordinate problems, resources in the event windows that are not getting cleaned up properly and so forth. There's a ton still to do. I was, you know, always hoping to have more progress to show you, but I decided, hey, I'm going to show you the bugs. So that's where we're at. That's our bug demo. Next episode is going to be close to the end of June, and then in July is the A-Life 2020 conference, where I'm still hoping to have not just intertile events working, but the grid, you know, like over 100 tiles all going at once. There's a tremendous amount to do. I'm very scared about that. We'll see how it works out. I hope you're doing all right. I'm not exactly sure what I want to do to help out with all the stuff that's going on. It feels like a new kind of sync forming bottom up, and it can crystallize very rapidly. We'll see what happens. Thanks for being here. Hope to see you next time.