What did you have for lunch? The burrito. Yeah, the burrito as well. What did you have in the burrito? Oh, the beef and some guacamole. It was really nice. Stop. Enough of this cutesy chit-chat. It's tech time. You're not the same person that was here last episode. No. No. You are Bramus. Yes. Is that the correct pronunciation? Whenever people ask me how to pronounce my name, it's mostly over chat, and I tell them it rhymes with Nostradamus. But in the UK, it's Nostradamus. In the US, it's Nostradamus. Oh. And I say Bramus. So it's Bramus, Bramus, or Bramus. Oh, so you don't have the same number of syllables as Nostradamus. No, no, no. It just rhymes with it. I was trying to find... Brannoster... Bramus? What? Oh, okay. It rhymes with Nostradamus. Okay. Cool. And you work on the same team as me? You mentioned you weren't part of our team, but technically you are. Subteams and... We're all Chrome DevRel, after all. The last few episodes have had guests from outside Google, so if anyone was wondering how long I was going to make that last: until I ran out of friends, and now I've had to fall back to just a colleague. Yeah. Just a colleague. But maybe by the end of this, we might be friends. Yeah. Well done. At some point you'll find out what it's like to be friends with me, and you can revisit that. Yeah, maybe I will reconsider. Yeah, exactly. But I want to talk about some magic tricks with the HTML parser. Sounds special already. Magic tricks. Oh, it's special. It's special. So the browser receives bytes of HTML down the wire, and from this it constructs a DOM. I've said this on the show before: it can do this in a streaming manner. It doesn't need the whole document before it can show something to the user, which is super handy when it comes to long documents and stuff. So, question time. Yeah. I'm going to change the HTML a bit. What changes about the DOM?
So we've got the HTML on the left and the resulting DOM on the right, and we're showing where the whitespace and newlines are as well. Doesn't it do this trick where it auto-closes some tags, but not all? So a paragraph might be a good one to auto-close, I think, because you can't nest them. So what are you saying in terms of the DOM change? It stays the same. That would be my guess. That is a very good guess, and to all intents and purposes it's correct. It'll do this, because that whitespace is now inside the paragraph. But you're right: the HTML spec defines certain tags that don't have to be closed, and they'll auto-close. Paragraphs are one of them. But what happens now? Now it gets messed up, right? Does the em continue or not? I would guess it auto-closes it because of the succeeding p there. It would say, I'm going to close this em. That would make sense. Now what happens? We get an em in both of our paragraphs. The spec defines certain formatting tags which it will carry over, which is great, because if you had your HTML like this, you end up with a double em. And if you just keep repeating lines like that, you end up with increasingly nested ems as you go along. This is great stuff. So here's a partial HTML document. This is partially parsed. You get a DOM like this, as you would expect. What about now? I think I know this. I think it switches them, because it sees that this isn't correct, so it will switch the p and the em tags to make it correct. Well, nope. It's got the paragraph inside the em. But what about now? Well, I'm guessing that now it will switch. Yeah, pretty much. Here we go. So it pops the paragraph outside of the em, but using a very similar model to the one we saw before, it recreates a new em inside the paragraph, which then gets closed. Oh, yeah. So when it sees the p, it doesn't close the em; the paragraph goes inside it. It's when it sees the closing em tag that it does all of this.
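The two behaviours in this exchange, a `<p>` start tag auto-closing an open paragraph and an unclosed `<em>` being carried into the next paragraph, can be illustrated with a deliberately tiny model. This is a hypothetical toy, not the spec's tree-construction algorithm: `toyParse` and `serialize` are made-up names, it only understands `<p>`, `<em>` and text, it ignores element scopes, and it replaces the adoption agency algorithm with the simplest possible `</em>` handling.

```javascript
// Toy model, not the real HTML tree-construction algorithm: it handles only
// <p>, <em> and text tokens, to illustrate two of the spec's rules.
function toyParse(tokens) {
  const body = { name: "body", children: [] };
  const stack = [body]; // stack of open elements
  const activeFormatting = []; // simplified list of active formatting elements

  const currentNode = () => stack[stack.length - 1];
  const insert = (name) => {
    const element = { name, children: [] };
    currentNode().children.push(element);
    stack.push(element);
    return element;
  };
  // Pop elements off the stack until one with this name has been popped.
  const popThrough = (name) => {
    while (stack.length > 1) {
      if (stack.pop().name === name) break;
    }
  };
  // Recreate any remembered formatting element that is no longer open.
  const reconstruct = () => {
    for (const entry of activeFormatting) {
      if (!stack.includes(entry.element)) entry.element = insert(entry.name);
    }
  };

  for (const token of tokens) {
    if (token.type === "text") {
      reconstruct();
      currentNode().children.push(token.data);
    } else if (token.type === "start" && token.name === "p") {
      // Rule 1: a <p> start tag implicitly closes an open <p>.
      if (stack.some((el) => el.name === "p")) popThrough("p");
      insert("p");
    } else if (token.type === "start" && token.name === "em") {
      reconstruct();
      // Rule 2: remember the <em> so it can be recreated after a block closes.
      activeFormatting.push({ name: "em", element: insert("em") });
    } else if (token.type === "end" && token.name === "em") {
      // Simplest case only; the real spec runs the adoption agency algorithm.
      const entry = activeFormatting.pop();
      if (entry && stack.includes(entry.element)) popThrough("em");
    } else if (token.type === "end" && token.name === "p") {
      // A stray </p> with no open <p> produces an empty paragraph.
      if (!stack.some((el) => el.name === "p")) insert("p");
      popThrough("p");
    }
  }
  return body;
}

// Serialize the toy tree back to markup (no escaping, for brevity).
function serialize(node) {
  if (typeof node === "string") return node;
  const inner = node.children.map(serialize).join("");
  return node.name === "body" ? inner : `<${node.name}>${inner}</${node.name}>`;
}

// Tokens for: <p><em>Hello <p>world
const tokens = [
  { type: "start", name: "p" },
  { type: "start", name: "em" },
  { type: "text", data: "Hello " },
  { type: "start", name: "p" },
  { type: "text", data: "world" },
];
console.log(serialize(toyParse(tokens)));
// → <p><em>Hello </em></p><p><em>world</em></p>
```

Repeating `<p><em>` on each line makes the model recreate the old `<em>` and then open a new one inside it, which is the "increasingly nested ems" effect mentioned above.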
It's called the adoption agency algorithm in the spec. Yeah. Anyway, usually when I show these kinds of examples, someone will say (and I think we've had this in the comments on previous episodes) that HTML should just have a strict mode where it fails hard on errors, like it does in CSS and JavaScript. And I want to provide the other side of the argument, because there's a lot to it. We've tried this, right? XHTML 2? Brilliant. Exactly. So let's talk a bit about HTML. Hypertext Markup Language is what it stands for. Originally it was based on SGML, Standard Generalised Markup Language. It looks very similar; it had very slightly different rules, but whatever. This whole thing actually came from IBM. Well, from a separate spec called Generalised Markup Language, which looks quite different, but it's where SGML came from. And that was invented in the 60s at IBM. Wow. By these fellows: Charles Goldfarb, Edward Mosher (I'm not sure of the pronunciation), and Raymond Lorie. Now, oh no, this is where the name came from: G, M, L, for Goldfarb, Mosher and Lorie. So the bedrock of our industry, the markup in HTML, is a backronym of the surnames of a group of lads at IBM in the 60s. Cool. So how do you feel about that now? But yeah, what you said before is exactly right. We have tried this strict thing before. This was from the mid-to-late 90s into the 2000s. There was an idea of, let's break away from SGML completely and do this all in XML. And this was XHTML. And you mentioned XHTML 2, so this is the early 2000s. This is when they said, right, you need to be parsing this as XML now. That's the rule. XML doesn't have any error correction: when you get to a mistake, poof. Yeah. The spec does say, we're not going to tell browsers what to display in those cases, but there was no real contingency for displaying anything meaningful.
And this spec went along for a bit, and browsers ended up saying, no, actually, we don't want to do that. And I think that was the right thing. Because take two browsers: which one is better, the browser that will display the opening times for your local doctor's surgery, or the browser that displays this? And this is a real example, by the way. I took the HTML from my local doctor's surgery, the page that displays the opening times, and I passed it through a strict parser, and this is what you get. But a normal HTML parser will deliver me the opening times for the doctor's surgery. Yeah. And I think this is worse. It definitely is. It's worse for the user, right? You could say, well, this is the developer's fault, but if a user can choose a different browser where they get the opening times for their doctor's surgery, then that's the browser they're going to choose. Yep. And it seems better that they would do that. And the other part of this statement, which I've heard multiple times, is "I want it to fail strictly, like CSS and JavaScript." That's not right. CSS fails wonderfully well. Exactly. Here's an example. If you have some CSS like this, which isn't valid CSS at all, and then you carry on, the div will have a background of yellow. The html element will not have a background of green, because of the way it's parsed: that invalid bit kind of gets rolled into the next bit. But as you say, the spec says what to do in this situation and how to eventually recover from those error cases. At first I thought, okay, but failing strictly is true of JavaScript. But the more I thought about it: in this case, that first block is not going to run because it's got a syntax error, and it does fail immediately. But that second block will run. Yeah. But this is because they're separate blocks, right? So separate contexts and... Exactly. Exactly. But it's still not as strict as it could be if we were taking a super strict model.
And even in JavaScript, if you have an event listener that throws an error, the event will still fire again next time, and other listeners will still run, even if one listener throws an error. In the browser, JavaScript is quite error-resilient. It's not as resilient as CSS, because it's an imperative language, and it would be difficult to do anything even close to useful. Yeah. I think the difference is that if CSS sees an error, it will continue down. If JavaScript sees an error, it will stop right there. It stops that bit of execution. Yeah. Like the callback for the event handler: that function will bail out, but the rest will continue. Yeah. It will bail until the stack is empty. But if you run JavaScript on the command line and it encounters an uncaught error, it just bails on the whole process. So our super strict model could do the same in the browser: it could just crash the tab. So you could say that even JavaScript isn't in a super strict mode; it's quite resilient. It could have been stricter. It could have been stricter. The bad news was, back in the HTML4 days, there was no spec for what to do with this. It was just left to the browsers to figure it out. And what happens when you ask browsers to figure it out? They all do the same thing, right? No, no, the other thing. They do something completely different. Internet Explorer at the time didn't even store the DOM as a tree. It stored it as a kind of graph, because that's what they did in Word at the time, and they thought, well, this is like a formatting model, so we'll do the same thing. So in Internet Explorer terms, if you were to ask whether "world" is inside a paragraph or just inside the body, Internet Explorer would say yes. Yes, it is one of those things, depending on how you read the graph.
And that's why we ended up with so many bugs in Internet Explorer, because things like the DOM APIs and CSS rely on this tree view that Internet Explorer just didn't have. It was kind of mapping that backwards from an incompatible format. Opera at the time would store this as a tree, but the CSS would be interpreting it as a different tree. Mozilla, Firefox at the time: how it would parse this depended on how many chunks were sent to the parser. If you sent that as one chunk, it would come up with one answer. If you sent it character by character, it would give you a different answer. So it would evaluate each chunk individually, and in the real world that depends on TCP packet boundaries, so you might get a different tree from this in Firefox at the time. And that leaves what Safari, WebKit, did at the time, which was this. And so folks decided we needed to write a specification for this, and they went through all of these weird cases. Well, I say folks; it was basically one guy, Ian Hickson, who went through all these cases and picked something for all the browsers to do. And that usually meant doing what Internet Explorer did, because pages were built for Internet Explorer back then. It was only in cases like this, where Internet Explorer's behavior wasn't even consistent with itself, that they went looking for another behavior. And that was HTML5. This was 2006, when the parsing part of the spec was released, and this is what browsers got behind. It gave us what we have now, which is a consistent model across all of the browsers for how to handle these errors. But if anyone watching isn't convinced by this relaxed model of HTML, good news: you can put your money where your mouth is. If you serve your content with the application/xhtml+xml content type and put the XHTML namespace on your html element, that will kick all of the browsers into a strict parsing mode. It will parse it as XML. And off you go.
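A minimal sketch of that opt-in, assuming a Node.js server: the two ingredients are the `application/xhtml+xml` content type and the XHTML namespace (`http://www.w3.org/1999/xhtml`) on the root element. The helper `xhtmlPage` and the `handler` function are hypothetical names for illustration.

```javascript
// Sketch only: the two ingredients for opting in to strict XML parsing.
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper: wraps body markup in a minimal XHTML document and
// pairs it with an XML content type.
function xhtmlPage(bodyMarkup) {
  return {
    headers: { "Content-Type": "application/xhtml+xml; charset=utf-8" },
    body:
      `<html xmlns="${XHTML_NS}">` +
      "<head><title>Strict parsing</title></head>" +
      `<body>${bodyMarkup}</body></html>`,
  };
}

// A request handler with the (req, res) signature Node's http.createServer()
// expects. The unclosed <p> would be quietly fixed up by an HTML parser, but
// it is a well-formedness error for an XML parser.
function handler(req, res) {
  const { headers, body } = xhtmlPage("<p>Opening times");
  res.writeHead(200, headers);
  res.end(body);
}
```

To try it, something like `require("node:http").createServer(handler).listen(8080)` should serve the page, and the browser should then report an XML parsing error rather than closing the paragraph for you.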
And that means if you get anything wrong, this is what you get. And this is how I created this example: I took the HTML from the doctor's surgery page and put it into this mode where the browser fails hard like this, which it does. So yeah, go do that if you must, but it's really users that will suffer. I do wish our DevTools reported parsing errors, really serious parsing errors, a bit more than they do now. I've filed an issue about that; we'll see what happens. All right. But we're still left with some of the remnants of XHTML. The closing slash. Self-closing tags. Do you do this? I still do it, but it's not needed, if I understand correctly. Yeah. I'm in two minds about whether I like this or not, and I think I'm starting to come to the conclusion that I don't. I do it in a lot of my projects because I use Prettier, and Prettier adds these in. And I'm like, whatever, I don't want to have the argument, I'll just go along with it. Do you do the space before? Yeah. Do you know why? Also with the break, like the br tag. It's a habit. I do the space. Do you know why you're doing that? It gives you compatibility with Netscape Navigator 4. Congratulations. So people doing this are giving themselves compatibility with XHTML parsers and Netscape Navigator 4. So, I don't know, it seems a bit silly to me that we still do this unless you have those requirements. In the olden days of HTML4, the browser would treat the slash as a parsing error, but it would recover. Now, in HTML5 and the current HTML spec, it sees that slash and just ignores it. It doesn't do anything with it. So this works. But if you do this, again, the slash is ignored, so your span is now inside your anchor tag. Whoa, okay. Because that slash doesn't mean anything. So I find those trailing slashes a bit misleading. Unfortunately, developers do have to just remember which tags are self-closing, and that slash doesn't really do anything.
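The rule about the trailing slash can be summarised as a lookup: in HTML, whether an element stays open depends only on whether it is a void element, while in SVG and MathML (the foreign-content exception discussed next) the slash does matter. This is a simplified, hypothetical helper (`staysOpen` is not a real API), using the void-element list from the HTML spec.

```javascript
// Void elements from the HTML spec: they never have children, slash or not.
// (The parser treats a few legacy tags, such as <param>, the same way; this
// sketch leaves those out.)
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

// Hypothetical helper: after this start tag, does the element stay open, so
// that following markup ends up inside it? Simplified: ignores implied end
// tags and tags that break out of foreign content.
function staysOpen(tagName, { trailingSlash = false, foreign = false } = {}) {
  if (foreign) {
    // SVG and MathML: <circle/> really is self-closing.
    return !trailingSlash;
  }
  // HTML: the trailing slash is ignored; only void elements are empty.
  return !VOID_ELEMENTS.has(tagName.toLowerCase());
}

console.log(staysOpen("a", { trailingSlash: true })); // true: <a/><span> nests the span
console.log(staysOpen("br", { trailingSlash: true })); // false
console.log(staysOpen("circle", { trailingSlash: true, foreign: true })); // false
```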
Except in foreign content, so we're talking SVG and MathML. There, all of a sudden, that trailing slash is meaningful, because a different parsing rule kicks in, and that sort of thing works. So yeah, unfortunately it's a lot for developers to think about, but there's no other way around it. All right, back to quiz time. You ready? Okay. So we've got a partially parsed document here. You can see how it looks. What about now? So this is a synchronously running script that will run as part of parsing, immediately as the script closes. So it's getting the h1 and appending the thing with class summary, which is there. It's closed. It's not closed. That's the interesting bit. I know. It's difficult, isn't it? I can see two options. One, it will do it. Or two, it will say, hey, you know what? This div class summary: internally I close it, just to make sense, but you can't do anything with it just yet. So maybe it will be the second behavior. Okay. So what it does is the first option. At that point, it does have a node, an element in the document with class summary. Yes, it's not closed, but there isn't really a concept in the DOM of an unclosed element, not in terms of what JavaScript can see anyway. So our summary element is now in our h1, and our main element, well, it's got some whitespace in it. But where's this going to go? I always like a challenge. I honestly don't know. So what will this do? It will append it to the summary element. Okay. So it remembers its location and it will continue there. Exactly. So, what about now? We've now closed the summary element and we've added another paragraph. What's going to happen? No clue. No clue. So this is going to go in the main element, down there. Here's why this happens. The parser has a stack of open elements. Right now it's html, body, main, and this div class summary thing. So when this script runs, fine, it moves the summary element. Cool.
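The stack-of-open-elements behaviour in this quiz can be modelled without a browser. This is a hypothetical toy (`createNode`, `appendNode` and `ToyParser` are invented for the sketch): the parser's only state is its stack, and a "script" moves the still-open summary element into the `<h1>` partway through parsing.

```javascript
// Minimal stand-in for DOM nodes: just a name, a parent and children.
function createNode(name) {
  return { name, parent: null, children: [] };
}

// Like Node.append(): detaches the child from its old parent first.
function appendNode(parent, child) {
  if (child.parent) {
    const siblings = child.parent.children;
    siblings.splice(siblings.indexOf(child), 1);
  }
  child.parent = parent;
  parent.children.push(child);
  return child;
}

// Toy parser: its only state is the stack of open elements. New elements
// always go into the node on top of the stack, wherever that node now lives.
class ToyParser {
  constructor(root) {
    this.stack = [root];
  }
  get currentNode() {
    return this.stack[this.stack.length - 1];
  }
  startTag(name) {
    const node = appendNode(this.currentNode, createNode(name));
    this.stack.push(node);
    return node;
  }
  endTag() {
    this.stack.pop();
  }
}

// Replaying the quiz: <h1></h1><main><div class="summary"><script>...
const body = createNode("body");
const parser = new ToyParser(body);
const h1 = parser.startTag("h1");
parser.endTag(); // </h1>
const main = parser.startTag("main");
const summary = parser.startTag("div"); // <div class="summary">, still open
appendNode(h1, summary); // the inline script moves it into the <h1>
const firstP = parser.startTag("p"); // parser still inserts into summary
parser.endTag(); // </p>
parser.endTag(); // </div> pops summary off the stack
const secondP = parser.startTag("p"); // current node is <main> again

console.log(summary.parent === h1, firstP.parent === summary, secondP.parent === main);
// true true true
```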
And when it sees this paragraph, it's going to insert it into the top item on the stack, which is our summary. So there it goes. The summary tag closes, and it gets popped off the stack. Popped off the stack. So the next paragraph goes into main. So if you end up moving stuff around the DOM while it's parsing, that's the model: your JavaScript moving things around doesn't affect the parser's stack of open elements. In fact, even if you remove the element from the DOM, the parser will still inject stuff into that removed element until it's popped off the stack and the parser moves back to the previous item, at which point things will start appearing in the document again. But we can actually use that for something useful. So let's switch things up a little bit. On the right, we've got a complete DOM, but it's got a script, and on the left I'm going to show you what that script does. So, GitHub. When you navigate around GitHub and click a link, it does its whole SPA thing, but it's quite a simple SPA model: it goes and fetches some HTML and then dumps that into the document, using a model very similar to this. So you can imagine: create a div, append the div, set its innerHTML. And that works as you'd expect. But the problem with GitHub doing this is they lose that whole streaming benefit. They have to wait until they've fetched the whole thing before they dump it on the page. But we know with things like fetch, you can get the content a bit at a time as it comes down the network; you can receive it in chunks. I've seen people try to take advantage of this by appending the div and then setting its innerHTML to the first chunk, a paragraph saying "Hello". And everything's going pretty well so far. But then when they get the next chunk, they do this.
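Roughly what that second `innerHTML +=` step does, as a simplified model. The `serializeChildren` helper is hypothetical and skips the text escaping the real serializer performs; the point is that the getter always emits end tags, because the DOM has no notion of an element that is still open.

```javascript
// Simplified innerHTML getter: elements always serialize with end tags.
// (The real serializer also escapes text; skipped here for brevity.)
function serializeChildren(node) {
  return node.children
    .map((child) =>
      typeof child === "string"
        ? child
        : `<${child.name}>${serializeChildren(child)}</${child.name}>`
    )
    .join("");
}

// The div after `div.innerHTML = "<p>Hello"`: the parser has already
// closed the paragraph.
const div = { name: "div", children: [{ name: "p", children: ["Hello"] }] };

// `div.innerHTML += " world</p>"` means: serialize, concatenate, re-parse.
const markup = serializeChildren(div) + " world</p>";
console.log(markup); // <p>Hello</p> world</p>
// When a browser re-parses that string, the stray </p> has no open <p> to
// close, so the spec's error handling creates an extra empty paragraph:
// <p>Hello</p> world<p></p>
```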
Now, the problem with this is that by appending to innerHTML, you're actually asking it to serialize the div's content, turn it into a string, and that includes the closing tag of that paragraph, because it doesn't maintain internally that the paragraph is still open when it serializes it. It gives you a proper serialization. So when you append "world" to the end, you end up with this: it gets the closing tag, appends "world", and then it sees that closing tag and it's like, I don't know where this closing tag's from. And according to the rules, it will create an empty paragraph when it does that. But there is a way to do it properly. Have you ever seen this API before? I hadn't until relatively recently. Well, it rings a bell, but... Yeah, enlighten me. So you're creating a new HTML document, but it's just one in JavaScript land; it's not displayed to the user. But what you can do when you've got this is call document.write. And a lot of people say, don't use document.write, it's bad. Yes, it is bad when you're calling it during the parsing of your main document. Calling it on this detached document is totally fine. And this is what gives us access to the parser. Because we're using document.write, we're injecting strings directly into the parser. It has full parser state, so it's like a partial parse right now. So I can take that div and pop it in the document. And now, when I write to that document... It's going to go inside the div. It's going to go inside the div. Yeah. Which is great. So now, as you fetch the content, if I append to it, because it's got full parser state, it understands that that paragraph is still open, and it will do the right thing. Nice. So you can fetch something iteratively and pipe it into the document. You can have that SPA thing, but you can stream HTML directly into your page using this little hack. That's cool. Isn't that cool? I love this hack.
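A minimal sketch of that hack, for browsers. It assumes `container` is an element already in the page and `response` is a `fetch()` Response whose body is an HTML fragment; the function name `streamHTMLInto` and the placeholder URL are hypothetical, and this is not necessarily the exact code from the episode. The detached document comes from `document.implementation.createHTMLDocument()`, and the body is read chunk by chunk with `getReader()` and a streaming `TextDecoder`, so multi-byte characters split across chunks still decode correctly.

```javascript
// Sketch of the streaming hack. Browser-only: relies on
// createHTMLDocument() and document.write() on a detached document.
async function streamHTMLInto(container, response) {
  // A detached document: never displayed, but writing to it spins up a
  // real HTML parser with full parser state.
  const doc = container.ownerDocument.implementation.createHTMLDocument();
  doc.write("<div>");

  // Move the still-open <div> into the live page. The parser's stack of
  // open elements still points at it, so later writes land inside it.
  const target = doc.body.firstChild;
  container.append(target);

  // Feed each network chunk to the parser as soon as it arrives.
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    doc.write(decoder.decode(value, { stream: true }));
  }
  // Flush the decoder, close the wrapper and tell the parser input is done.
  doc.write(decoder.decode() + "</div>");
  doc.close();
  return target;
}

// Usage in a page (the URL is a placeholder):
// const response = await fetch("/fragment.html");
// await streamHTMLInto(document.querySelector("main"), response);
```

Because each `doc.write()` call goes through the same parser instance, a paragraph left open at the end of one chunk is still open when the next chunk arrives, which is exactly what the `innerHTML +=` approach loses.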
And that's really all I wanted to get to, but we had to understand loads of stuff about the parser before we got here. If anyone wants to learn more about this sort of stuff, some of the examples I've used come from this book by Simon Pieters, who does a lot of spec work and understands the parser better than most people in the world. It's an online book, and it's great. It's still in progress, but there are some even weirder examples in there, and some historical detail about how we got to where we are now with the HTML parser and how it interacts with script and all that sort of stuff. So yeah, go take a look at that. I'll put all the examples in the description, as I normally do. But yeah, you can get access to this streaming HTML parser through JavaScript and speed up your pages. Cool. This was the awkward silence. This was the awkward silence. Yeah. I hope I don't cut it too short. The awkward silence. Woo!