Welcome everyone to the August Developer Hours. Just a reminder that this session is being recorded and, as I just said, it's going to be uploaded to YouTube and to WordPress TV, so you'll be able to catch up with it there. Please do turn on your video. It would be great to see you all. But if you don't mind remaining muted during the presentation, when it comes to the question and answer session later on, you can unmute to ask your question. But please remain muted for now and during the presentation. So today's topic is all about the HTML API, part of which is already available, and the rest is coming soon. And it's my pleasure to welcome Dennis Snell, who's one of the key developers of the HTML API, as the guest for this session. And he'll be doing a presentation on the HTML API. Dennis, would you just like to introduce yourself? Tell us a little bit about yourself, what your background is, how you came to be in WordPress. Thanks, Michael. Well, I have been working at Automattic for nine years, nine and a half years now, as a software developer. And I have been working with WordPress for much longer than that. I got started like many people. I was kind of doing web design on the side and building sites for people, and I built a couple of really terrible content management systems for my own use. And one day I stumbled upon WordPress and I said, I'm never doing all this work again. So I eventually got a job as a software developer at Automattic and I've been working with WordPress since. Love the platform. Love to see the community that has grown around it on the web, and love to see the ways that it kind of provides all the tools we need to build everything from a small website to a big one. And this HTML API is just the latest part of that that kind of took my passion. I've been thinking about it for about seven years now and decided, you know, we need to have WordPress provide the tooling to read and modify HTML reliably. So it's fantastic.
We look forward to your presentation. Do you want to do your presentation and have questions at the end? Or are you happy for people to ask questions during the presentation? Your choice. Well, I would love it if people feel welcome to interrupt with a question. I think there's a raise-the-hand button and there's chat. If somebody can monitor the chat, because I think sometimes the questions can help lead us, but I may hold some questions to the end if we're going to get to it anyway. Okay. Yeah, I was going to say that anyway about the questions. Either put them in the chat or raise your hand and we'll get your questions answered by Dennis. Okay. Before I hand over to Dennis, let's just do a couple of announcements. As I'm sure you're all aware, WordPress 6.3 was released a bit earlier this month. That means WordPress 6.4 is now under development. It's slated for release on the 7th of November. And WordPress is, of course, open source. Anyone can contribute. So if you want to get involved with the development of 6.4, I've just put a link in the chat for you to follow. WordCamp Asia is the next major flagship WordCamp. WordCamp US has just happened. Were you at WordCamp US, Dennis? Not this year. I actually had a pre-existing commitment. Right. Yes, I wasn't either. But WordCamp Asia is the next one. Oh, I haven't put the dates in my notes. I think it's in March next year. Anyway, I'll put the link there. The call for speakers is currently open, and it stays open until the 30th of September if you want to speak at WordCamp Asia, which is in Taipei in Taiwan next year. As I said, I omitted to put the dates in my notes, but if you follow the link to their website, you can find the dates there. I'm pretty sure it's sometime in March next year. Dennis has just posted March 7th to 9th. Thank you, Dennis.
And finally, on the announcements, as I say at every Developer Hours session, now that we're effectively post-pandemic, more and more local WordCamps are being organized. And if you go to central.wordcamp.org, the link is in the chat, you can find a WordCamp that's happening near you. Okay. The floor is yours, Dennis. Thank you. And thank you all for letting me come in and speak with you today. Will this chat be saved? We've got a question. Yeah, I'm just responding. Yes, you can save it at the end. I'll give you instructions towards the end of the session and tell you how you can save the chat locally. Sorry for that, Dennis. Carry on. Oh, good. So I don't know if anybody's willing to raise a hand or give any mention of it, but I hope some of you have worked with HTML before in WordPress. And what I mean by that is maybe you're looking for some tags where you want to add or replace a CSS class name, or, you know, you've got this button and you want to swap it out with something else that serves the same purpose, or you identified a string that you wish you could just target on the server and just cut from the rendered page. This kind of operation is pretty straightforward in JavaScript. But in PHP, it's a little bit more challenging. Anybody here ever written a regular expression to match some HTML? Got a couple. How did it go? Yeah, I've tried it and struggled with it. Regular expressions are hard. About seven years ago, WordPress added support for the image srcset attribute. So the idea is that when it renders an image, it adds an additional list of optional images to download that are different sizes. And the browser can choose whichever size is appropriate for the screen it's rendering on. And this is to save on bandwidth. And this is kind of where I became involved with what I call the HTML problem in WordPress, which is that we had to identify image tags in the content, read the src attribute, and write out a new srcset.
Many years later on, I kind of ended up debugging a lot of HTML issues that I saw at work and in random community projects like Gutenberg, where we have these regular expressions and we get into a kind of vicious cycle. We write something. We make it work very quickly. We find out that there are a lot of cases where it fails to work. So we make it a little more complicated. Then we find out that it doesn't work again. So we go back and we make it a little more complicated. We do this a few times and we get to the end, and our code is so confusing that we can hardly even remember what we were trying to do in the first place, let alone debug an issue. I'm going to share my screen here. I'm going to be working with a file here called render die. It's a poor name. I apologize for the name, but what it does is it renders and then it calls PHP's die function. And we'll be doing some coding here just to illustrate what's going on. We have a post in the editor with a bunch of HTML. And when we call render die on it, what it does is it prints out the HTML. So this is what WordPress sends to the browser. And it's just a shortcut to opening up the console and coming in and seeing what HTML it produces. So what is often the case that I see, we'll say, okay, content equals preg_match. And we'll say we're going to match an image tag. And then there's going to be a bunch of stuff. And then there's going to be src equals quote. And then there's going to be more stuff that we match, quote, and we're done. And then there's going to be content, and let's just call preg_replace. And if I did that right. I didn't. I've got the Zoom window kind of in the way. What did it do? It looked right, that definitely matches the image. But now the entire contents, all of my HTML, has disappeared. Maybe let's just make sure we got those in the right order: pattern, replacement, subject. Yeah, I got those out of order. So let's try it again. We swap those arguments. We read the docs. We come back.
Okay, that works. Now let's go find our image. And I don't see it because we matched the whole thing. I don't want to spend too much time on this, but I'm trying to show the process that I always take when I start writing these expressions. It's a matter of trial and error, and I get everything wrong. And I'm always consulting the documentation and going back and finding problems. And kind of the real trick on this is this stuff always breaks. Right here. When we get content that looks like img src equals with single quotes. Anybody been bitten by this one before? Or HTML is actually sneaky: HTML allows us to have unquoted attributes, and this pattern wouldn't match. Sometimes we have custom data attributes. So a lot of the frameworks, a lot of the visual builders, will add things like data-lazy-src equals a larger copy JPEG, and then they have their src equals a fast, small copy JPEG. And all of these matchers, they always end up oversimplifying what's going on. So this one that we wrote here, because of the star, actually matches this data-lazy-src attribute. And we do other things, like we try and match tags. We say, you know, anything but a closing, and then an eventual closer, to try and match the case where we have an image inside of an a tag. And this is a good example I like to use because this kind of pattern right here broke Gutenberg once. And it didn't make it into a Gutenberg release, I don't think. Or maybe it did. But the offending HTML had a space here. And we developers, when we wrote this pattern, we didn't anticipate the space because we're like, okay, tags follow tags. But HTML is actually happy with enters and tabs, and even technically you could have a carriage return without a new line. So it's really, really quite funky. So to get out of those weeds, we won't even cover DOMDocument. Some of you may have used that.
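The failure modes described above can be reproduced with plain PHP and no WordPress at all. This is a minimal sketch: the two patterns are assumed stand-ins for the kind of expression typed in the session (the exact patterns are not in the transcript), and the sample file names are made up for illustration.

```php
<?php
// Assumed stand-in for the naive pattern: an img tag, "a bunch of stuff",
// then src="..." with double quotes only.
$src_pattern = '~<img.*?src="([^"]*)"~';

// Hypothetical sample markup, one entry per pitfall mentioned in the talk.
$samples = array(
	'double quotes' => '<img src="fast.jpeg">',
	'single quotes' => "<img src='fast.jpeg'>",
	'unquoted'      => '<img src=fast.jpeg>',
	'lazy loading'  => '<img data-lazy-src="large.jpeg" src="small.jpeg">',
);

$results = array();
foreach ( $samples as $label => $html ) {
	// Record the captured value, or null when the pattern finds nothing.
	$results[ $label ] = preg_match( $src_pattern, $html, $m ) ? $m[1] : null;
}

// Assumed stand-in for the "image inside of an a tag" pattern, which
// expects the img tag to follow the opening a tag with nothing in between.
$link_pattern  = '~<a[^>]*><img[^>]*></a>~';
$without_space = preg_match( $link_pattern, '<a href="/x"><img src="a.jpeg"></a>' );
$with_space    = preg_match( $link_pattern, '<a href="/x"> <img src="a.jpeg"></a>' );

var_dump( $results, $without_space, $with_space );
```

With these inputs the first pattern misses the single-quoted and unquoted images entirely, and for the lazy-loading markup it captures the value of data-lazy-src instead of src, because the text `src="` also appears at the end of `data-lazy-src="`. The second pattern stops matching as soon as one space sits between the two tags.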
But the idea with DOMDocument is that we can create a full-blown HTML parser in PHP. And this one is even worse than regular expressions because unfortunately, when this part of PHP was written, it was written for HTML version 4, which hasn't been in use in 20 years. And it was incorrect from the beginning because it was built based off of XML instead of HTML. And it loads the whole post into memory, and it's really slow, and it gets really basic things wrong. So, again, I don't want to dwell on all of where we've come from, but in case you haven't really dealt with the pain, I just wanted to kind of motivate what we're doing here today. We are coming to the end of an attempt to interact with HTML when we're thinking about characters and symbols and text. We're thinking about quotes, we're thinking about angle brackets, and we have lost the vocabulary that we use when we're speaking with other developers about what we're trying to accomplish. So, in WordPress 6.2, we introduced the first interface in the HTML API. We've got new terms to learn, unfortunately, but these terms have been modeled on how we talk about what we do with HTML. And the idea is we shouldn't have to be constantly going back and forth to the documentation and second-guessing ourselves and finding these edge cases. So instead of talking about characters and quotes and angle brackets, what we're going to do is we're going to adopt HTML terms. We're going to talk about tags. We're going to talk about class names. We're going to talk about attributes. So in this case, what we might do is we always have to start with the same thing. And we can get into this again later, so don't worry that you have to follow everything that's going on. This thing is called the tag processor, or more fully the HTML tag processor. And what it's going to do is it's going to allow us to find tags and modify their attributes.
So if we wanted to find an image, then we're going to call next_tag, and we're going to tell it to find us an image. And then we're going to say the source is get_attribute with src. And at this point, what WordPress has done is it has started to load in the HTML. It has found the first image tag in the document, and it has returned the value of the src attribute. Now this value could be one of several things. If we have img src equals a fast image JPEG, then our source value will be a string in PHP. If it has single quotes, it'll still be a string whose value is a fast image. If it has no quotes, it'll still be the same string. If it has a, I'm going to mess this up, dash fast image dot JPEG with one of these HTML named character references, it'll still be this string in PHP when we read it. And edge cases we never even thought about when we were developing before with the regexes are going to be unique: if somebody just puts src, we actually get a value of true for this attribute to indicate that the attribute is present. And if we have something like this, where there is no src attribute, our value is null. And again, all we did is we told it, find me an image, get me the attribute. And if we wanted to set it, anybody have a guess at how we're going to tell it to change that attribute? That's right. We're going to call set_attribute, we're going to give it a name, and we're going to give it an updated source for our value. And it doesn't matter, you know, if we give it something that will trick our regular expression, if we say an image dot PNG, then the HTML API is going to spit it out on the image tag. And we don't have to think about escaping. So the HTML API is here in WordPress to allow us to write in our source code the vocabulary and the terminology that HTML uses to talk about itself, and it handles all the details. We don't have to worry about escaping.
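The read-then-write flow just described could be sketched as follows. It assumes the code runs inside WordPress 6.2 or later, where the WP_HTML_Tag_Processor class is available; the two function names are hypothetical helpers invented for this illustration, not part of WordPress.

```php
<?php
// Assumes WordPress 6.2+ is loaded, which provides WP_HTML_Tag_Processor.
// Both function names are hypothetical and exist only for illustration.

/**
 * Reads the src attribute of the first img tag.
 *
 * Returns a string when the attribute has a value, true when the attribute
 * is present without a value, and null when the attribute is missing or no
 * img tag exists in the markup.
 */
function example_first_image_src( $html ) {
	$processor = new WP_HTML_Tag_Processor( $html );

	if ( ! $processor->next_tag( 'img' ) ) {
		return null;
	}

	return $processor->get_attribute( 'src' );
}

/**
 * Replaces the src attribute of the first img tag and returns the markup.
 * The new value is passed as plain text; escaping is left to the processor.
 */
function example_replace_first_image_src( $html, $new_src ) {
	$processor = new WP_HTML_Tag_Processor( $html );

	if ( $processor->next_tag( 'img' ) ) {
		$processor->set_attribute( 'src', $new_src );
	}

	return $processor->get_updated_html();
}
```

Inside WordPress, calling example_first_image_src() on an img tag whose src uses double quotes, single quotes, or no quotes should return the same PHP string in all three cases, which is the point made in the talk.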
We don't have to worry about unescaping. We're always going to follow a simple process. We're going to create the interface. We're going to start querying the document for a given location. And when we get to that given location, we're going to do something with it. And right now, in WordPress 6.2 and in WordPress 6.3, the tag processor is the only part of the system that's here. We can talk about more to come later. But for this session, I thought it'd be good if we focus on what's here today. And I know we just covered a ton, but I wanted to give us a moment's pause, since we covered so much, and give anyone a chance to either just take a breather or ask a question about something we've sprinted through. We do have a couple of questions in the chat. So I'll just read them out. From Matthias: is it using really smart regex inside or was it built with another strategy? This is a great question. The HTML API started with a regex. I was naive. I was on a train, leaving WordCamp Europe, and I thought we could say, okay, let's create a regex that starts here with an angle bracket, you know, and then we get our tag name. And I was so wrong. I was so wrong. So we discovered that in order to parse the HTML document, we have to start at the very beginning and we have to parse every bit of syntax that could change something. Because, for example, if you have a textarea in HTML, and inside of that you have an image tag, well, that image tag doesn't actually exist. It's considered plain text by the browser. And this is one of those cases where even DOMDocument gets this wrong, because this should all be treated as text, not as HTML. So there's one regular expression in the HTML API. Let me just jump over to it. Let my code come up here. Yeah, there's one regular expression, and you might laugh when you see it. When calling set_attribute, we use a regular expression to nag at you if you supply a name that isn't valid.
But this one regular expression could actually disappear entirely because it's just a warning. The browsers will handle almost any attribute name you give them. But we wanted to warn, and this is the only part of the HTML API that deals with Unicode. And we didn't want to hand-encode Unicode functions in here. So what we actually have, if you look in the source code, which is linked in the documentation on the class, is that it all starts with this next_tag function. And what happens is, we essentially start at the beginning of the document, and then we jump through looking for angle brackets. So the fun thing about HTML is that nothing in the document changes until you encounter angle brackets. We've implemented kind of the state machine in the HTML specification. And we've simplified where we can for the purposes that we have, and we just basically jump from angle bracket to angle bracket and we say, look, is this a tag name? If it is, then we've got a tag. If it's not, you know, is it one of these 13 kinds of comments? And you can read the code almost like a book to see what it's doing. So no, it's not built on regular expressions. It's directly built from the specification of HTML and how the HTML specification describes parsing HTML. We used this 12 megabyte HTML document, which is the HTML5 specification, for testing during its development. Thanks, Dennis. Matthias has actually responded with "impressive", exclamation mark. Okay, we have another question from Elliot Richmond. Hi, Elliot. I know Elliot. He organizes the Cheltenham meetup here in the UK. He asks, what if there are multiple tags that are the same? I presume the next_tag method gets the first tag. Are there other methods to get all tags? This is a great question. And we were going to lead into this after the questions anyway, so now is a great time. You'll notice something that we did here that I did out of habit already: I wrapped this call to next_tag in an if statement.
So what's going to happen, let's start looking at real content now. Let's render processor get_attribute src. Oops. So we took our HTML post, we found the image tag, and we rendered the src attribute, and we got exactly what was saved. If, on the other hand, we search for a picture element, then nothing prints out. And let's render as HTML, just to give ourselves extra confidence that nothing happened. What's going on is next_tag will actually look for a tag based off the query that you pass into it. And if it is able to find another tag in the document, it will return true. And if it's unable, it will return false. So in this case, this whole thing returned false, it didn't find the tag, and the body didn't run. This can be important: even though it's safe to call functions on a tag match that doesn't exist, it's not going to do anything, so your code isn't going to do what you're looking for. This also means if we're looking to iterate through all of the images in a document, instead of calling if next_tag, we can just change that to while next_tag. Because this works as a loop condition, as long as it can find another image tag, it will continue to stop and visit each of those images. So in this case, I think we still only have one image in the document. So if I come back in here, and I duplicate this a few times, and make sure that it saves, reactivate the plugin. I'm not a very robust developer, so if I activate my plugin when I hit save, it breaks. So it has found every image, and it's got other junk out there too. I probably shouldn't do that. Let's do that. So it found all five of those. So yeah, we can find everything. And in fact, again, this pattern is the same: initialize the tag processor, give it a query, and when you visit a tag, do something with it. I'm going to pull up WP_HTML_Tag_Processor. The first result, it's the WordPress documentation for it. I'm going to look at that next_tag function.
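The switch from if to while described here might look like the sketch below. It assumes WordPress 6.2 or later is loaded; example_all_image_sources is a hypothetical name used only for this illustration.

```php
<?php
// Assumes WordPress 6.2+ is loaded, which provides WP_HTML_Tag_Processor.
// The function name is hypothetical and exists only for illustration.

/**
 * Collects the src value of every img tag in the markup, in document order.
 */
function example_all_image_sources( $html ) {
	$processor = new WP_HTML_Tag_Processor( $html );
	$sources   = array();

	// next_tag() returns true each time it finds another match and false
	// once the document is exhausted, so it works as a loop condition.
	while ( $processor->next_tag( 'img' ) ) {
		$sources[] = $processor->get_attribute( 'src' );
	}

	return $sources;
}
```

When the markup contains no img tags, the loop body never runs and the function returns an empty array, which mirrors the picture-element search in the demo where nothing was printed.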
I do encourage y'all to go to this documentation because we spent a lot of time developing this to try and find a way to make the documentation guide you into how to use it. Dennis, could you post a link to that in the chat? Happily, thank you. Amazing, thank you. In the class documentation there is a section on finding tags. And there's actually a query syntax, a full-blown query interface to this next_tag function. The most common thing we're going to use is passing a tag name. And in fact, there's a shorthand for that. But we can supply an array to provide more options. The tag name is image, but we can also limit it to tags containing a specific class name. We can actually do one more thing, which is we can visit closing tags, which we don't usually need to do. But then if we also jump into this function, wherever it is, let's go down here to the big list of functions, and let's go to next_tag. You can see here the query parameters are described. And if we jump back to our code, let's take a look at how that is. Array, tag name, image. If we save this and run it again, nothing has changed because that's the same. But now if we add class name as full width, it's not going to find anything because we didn't have any images with the full width class. I'm not entirely sure what the appropriate class is. I'm just going to sneak a peek here. You know what, they don't put it there, they put it on the figure. We'll get to that in a minute. We'll just add it. How about that? We'll add one class here, full width. And if we reload our page, we're going to find that it only found the single image. The final way that's really common, I like to think of these as shortcut queries because almost always all we want to do is find a given tag. There's one final way, which is not passing anything. And any guesses on what happens if we don't pass any query to next_tag? It visits everything.
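The three query forms mentioned above (a tag-name shorthand, an array with more options, and no query at all) could be sketched like this. It assumes WordPress 6.2 or later is loaded; the function names and the idea of passing the class name as a parameter are hypothetical choices made for this illustration.

```php
<?php
// Assumes WordPress 6.2+ is loaded, which provides WP_HTML_Tag_Processor.
// Both function names are hypothetical and exist only for illustration.

/**
 * Collects the src of every img tag that carries the given class name.
 */
function example_image_sources_with_class( $html, $class_name ) {
	$processor = new WP_HTML_Tag_Processor( $html );
	$sources   = array();

	// The array form of the query narrows the match by tag name and class.
	$query = array(
		'tag_name'   => 'img',
		'class_name' => $class_name,
	);

	while ( $processor->next_tag( $query ) ) {
		$sources[] = $processor->get_attribute( 'src' );
	}

	return $sources;
}

/**
 * Counts every opening tag in the markup by passing no query at all.
 */
function example_count_tags( $html ) {
	$processor = new WP_HTML_Tag_Processor( $html );
	$count     = 0;

	while ( $processor->next_tag() ) {
		++$count;
	}

	return $count;
}
```

Passing just the string 'img' to next_tag() is the shorthand for an array that contains only the tag name, so the array form is only needed once a class name or another option is involved.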
So in this case, when we run it, you can see there's a lot more information here. We can clear this up: every time that you stop at a tag or match a tag, you can call this get_tag function and it returns the name of the tag that it stopped at. So in this case what we're going to find is here's a listing of every tag in the document. These are all capitalized. That is something that was just a decision we had to make in the HTML API. I'm sorry, the HTML specification talks about them in uppercase, so we do. But it doesn't really matter if you pass an uppercase or a lowercase or a mixed case because in HTML, it's all the same. So those are the little things, you know, with a lot of regular expressions. Again, you know, you search for this, and then you miss that. It's all built in here. Why would you want to visit every tag? Go ahead, Michael. Yeah, so I was just going to say that's fabulous because we have another question. Sorry, did you have more to say on that? I'll wait for the next question. Sorry. I'll wait for the question. Okay, shall I move on to the next question? Yeah. Okay, this is from Rob. Can the processor find custom elements, or template elements inside custom elements? Another great question that leads perfectly into this. I'll finish up what I was saying because it works well. Why would we call this without any query arguments? Sometimes we want to query more complicated situations. Now, this is one of those things. If you go and you read the recent post on the Make blog, the HTML API progress report, on Make WordPress. Yeah, I've got that link. I'm just posting it now. So I go into some discussion about the limitations of what we can do here, because this first interface, this tag processor, doesn't know HTML structure. So it's unaware that your image is inside of a template. Now we have code in WordPress trunk that will be released with WordPress 6.4. It starts to bring the ability to do that.
Like for example, we can say there's a thing we call breadcrumbs. And we can tell it, find, oops, find us an image tag that is a direct child of the template. And it will do that. But this is not available in the tag processor, though there are some ways we can cheat for the moment. And I kind of urge caution at the moment because the better interface is coming soon. But we can start to build kind of more complicated state machines. We can say in template equals false. And now, as we visit every tag, we can match the tag name, and in the case of template, in template. You can see I'm already making programming mistakes. Continue here. That should be, you know, if not in template. But we can also tell it to, this is where I said we can visit closing tags and you probably don't want to. We can. And we can say, processor is_tag_closer, and negate this. So, you know, once we get to a template tag, we're going to track it. Okay, we're inside of it, and we're going to assume that we're inside of it until we come to the closing tag. And for the case of our image, you know, if in template, process. So, this does get kind of tricky. And like I said, what's coming next is something that is also kind of funny when you look at the term, but we've got the tag processor. What's next, the HTML processor. And I know these terms are very close and we couldn't find anything clearer and better, but the HTML processor is going to not only visit tags, but it provides all of the semantic global processing in HTML, which is really complicated. I actually have some of these cases available to us. In this example, this really simple example, we have an a tag that's not closed. And we have a second a tag. And according to the HTML specification, what happens when we encounter the second a is that it implicitly closes the first a tag. So the first a tag extends to here. And the second a tag closes here. And then this is an unexpected closing a tag.
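The template-tracking workaround described a moment ago might be sketched as follows. It assumes WordPress 6.2 or later is loaded, and the function name is hypothetical. As the talk cautions, this kind of hand-rolled state is fragile: the sketch deliberately ignores nested templates and HTML's implicit closing rules, so it is an illustration of the idea rather than something to rely on.

```php
<?php
// Assumes WordPress 6.2+ is loaded, which provides WP_HTML_Tag_Processor.
// The function name is hypothetical and exists only for illustration.

/**
 * Collects the src of img tags that appear outside of any template tag,
 * by tracking a single "inside a template" flag while visiting every tag.
 *
 * Caveat: the flag is naive. It does not count nesting depth and it knows
 * nothing about tags that HTML closes implicitly.
 */
function example_image_sources_outside_templates( $html ) {
	$processor   = new WP_HTML_Tag_Processor( $html );
	$in_template = false;
	$sources     = array();

	// Visit closing tags as well, so the end of a template can be noticed.
	while ( $processor->next_tag( array( 'tag_closers' => 'visit' ) ) ) {
		switch ( $processor->get_tag() ) {
			case 'TEMPLATE':
				// An opening tag enters the template, a closer leaves it.
				$in_template = ! $processor->is_tag_closer();
				break;

			case 'IMG':
				if ( ! $in_template && ! $processor->is_tag_closer() ) {
					$sources[] = $processor->get_attribute( 'src' );
				}
				break;
		}
	}

	return $sources;
}
```

Note that get_tag() reports names in uppercase, which is why the case labels are written as TEMPLATE and IMG.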
And these things are not nested inside of each other, and what the tag processor does is it's going to stop at this a tag. It's going to stop at this a tag. And if you tell it to, it'll stop at these closing tags, and it's completely, entirely focused on tags and attributes. So it's not going to tell you we're inside of a figure, it's not going to tell you we're inside of a div. You can track some of this stuff on your own, but I encourage you not to, because, particularly with the templates, if you base this parsing off of the idea that tags open and tags close, you're going to get led astray. So this is not an a inside of an a, it's actually a sibling to another a. The HTML processor will be able to give us that information, but for now, I'm going to do some funny things, and we've been playing with this in Gutenberg. We've been playing with this at WordPress.com, looking for how to use these functions, and that's kind of leading how we build this more complicated, higher-level interface. Okay, we have another question. What are the performance implications of parsing the HTML relative to traditional methods? Yeah, so this is something we're still gathering data on. What I can say is that during the development, I was getting around 20 megabytes per second parsing speed on HTML of random test documents, including the 12 megabyte document. That doesn't sound like a lot, and it's a whole lot slower than a regular expression. But it has been designed to try and be fast enough that we can reliably use this without worrying about speed. One of the things I always like to keep in mind is what is the cost of being right versus the cost of being wrong. So when we came into this situation where we had data-lazy-src equals something, our regular expression, through like 10 different iterations of code review and patches, was still wrong. And we spent developer hours getting it wrong.
And, you know, in the case where we had a space in the HTML, people were seeing white screens instead of their content. And there's that kind of saying, it's always faster to be wrong. It's true here. A single regex will be significantly faster than the HTML API. The HTML API should be fast enough for you to use without worrying that it's going to slow your site down. And in addition, as we look to some of the complicated things in core, and I was working on a refactor just the past couple of days in our Jetpack plugin, a lot of times we end up with a combination of regular expressions and state machines and code. Like we have a bunch of ifs and conditions, and it gets really nasty. We can sometimes do multiple passes through a document with a regular expression just because we're trying to target different pieces. And there's this funny thing about the HTML API where we start at the beginning and we scan through in a straight line. And there are a lot of places where we can do in a single pass what is done in multiple passes. And if we start looking at the security functions in core, which to this day are still full of plenty of vulnerabilities, sometimes we're running 100 regular expressions over the entire document. And when I say we're still gathering numbers, it's because it's kind of hard to gauge performance until we can completely replace what kses and formatting and wptexturize are doing. But I think there's a good chance that in some situations this will be faster, just because it allows us to proceed linearly and do things once instead of a bunch of times in a hacky way. In the long term, I hope we can get this built into PHP itself, which in my tests should give us another 4x bump in speed. And the final note that's really important here is that this is essentially a zero memory overhead system. So, if you're comparing to DOMDocument, DOMDocument is not only slow, but it is memory heavy.
The tag processor and the HTML processor, you know, they use so little memory it's not even worth mentioning, because they only start using memory once you start making modifications to the HTML. And every time you jump to the next tag, those modifications are flushed out to the HTML, so anything that was kind of temporary gets garbage collected. Thank you. That was the last question we've had, so if anybody's got any more questions, please post them in the chat or raise your hand. There's a reactions button at the bottom of your Zoom window there. You can raise your hand if you have a question and we can get you on screen asking, or as I said earlier, just type your question into the chat. I'll ask it on your behalf. I'm sure Dennis will be happy to answer. While we wait for that, I've got a question myself. What if you want to replace one HTML tag with another one, or insert a new one within an existing one? How can that be done? Great question. We found an image and we want to transform the image into a picture with a source. This can't be done today. As we look at the HTML tag processor, and we think about jumping from tag to tag, I don't want to provide any means to do this. And this has everything to do with the semantic rules in HTML, where attempting to make a change without the knowledge of the document and the context it's in could break things. Now, I mentioned the textarea snippet before. The tag processor will not get confused if you have a textarea or a script. I mean, when we search this HTML, it is not going to think this is a picture. But it's not going to be able to replace those tags because it is unable to assert whether doing so would change something. For example, if changing the image to source or to picture would leave the rest of the document broken because it doesn't have a closing tag. And that, it's unable to confidently do.
And this whole API, all of the different interfaces that are being developed for it, only proceed cautiously with what they can confidently do. Now in the HTML processor, we will actually be able to do this. Again, this is coming in WordPress 6.4 and it will probably see expansion in 6.5. Let me see. I think this should be working because I'm on trunk here. Let's just be friendly. Yeah, we did that wrong, but that's right. There we go. Okay. So this is a little different. We won't get into it today, but in this case, we're actually going to have as HTML. It may not be merged yet. We're going to have functions get outer HTML, set outer HTML. Probably we're going to have replace tag. And we're going to have replace HTML. So these interfaces will come and they're going to do what you want to do. I think we're also going to have a wrap with. So you could say wrap with picture. And we're also going to have another method on this class that is going to say unwrap, which will just take whatever element surrounds it and pull it out. And the cool thing about what's coming soon is that in these cases where we have weird things going on with the HTML, it preserves the structure of the document when we perform these changes. The tag processor cannot do this because it is so focused on tags and attributes. That makes it so much simpler and gives it predictable performance. When we start to take that structure into consideration, we have to realize that, you know, if you have an HTML document that has a thousand nested lists, that will have an impact on performance more than if you have a document with one level of list. So these are the kind of things that we're stewing on, and, you know, we have to either be patient or fall back to the methods we're comfortable with in the meantime, and jump in the HTML API MetaChannel in Slack and share our input. Hey, I'm trying to replace this tag. Help me out here. I'd like to use this new API. How can I do it?
So did you say that the HTML processor, the coming iteration of this, will actually preserve the HTML structure? So if you have mismatched closing tags, it'll actually preserve that? Yeah. Another great question. Also, I see Elliot raising his hand. If you really get curious and you look through some of the unit tests in WordPress for the HTML processor, what you're going to find is a lot of broken HTML. The whole API, we kind of designed as prioritizing what I like to call garbage in, garbage out in this context, which is, if there's something broken about the HTML, but it doesn't affect us, we're going to leave it. We're going to leave it in place. And in the tag processor, I just briefly mentioned that we skipped some of the parsing rules that we were allowed to. These are things like text encoding issues, or these are things like, pardon me, you can have null bytes that are supposed to be converted into the Unicode replacement character. Don't ask me why this is on my most frequently used characters here. A little question mark. So when HTML encounters this, it's supposed to replace a null byte with that character. We don't do that because the browser is going to repair the structure when it gets there. We may not match on it because there's no legitimate way to match on that in HTML. But we leave things in a broken state in as much as they don't affect us. And I'm looking for some of these tags. Now that I think about it, it's probably in an open patch. Here's a good example where there's buttons, closing buttons, and I don't have what I thought I had. I apologize. But yes, it will leave the tags that were there that looked like they were broken, because as long as the structure is preserved, it's happy. Because what we're doing is we're jumping into a spot in the document, making a change and moving on. 
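The "jump into a spot, make a change, move on" idea can be sketched outside WordPress. What follows is a minimal Python illustration, not the WordPress implementation (which is PHP). The names find_tag_name_end and add_attribute are hypothetical, and the sketch assumes a deliberately naive scanner that does not skip comments, escape values, or check for an existing attribute. It only shows that an edit at one offset leaves every other byte of a broken document as it was.

```python
# Illustrative only: not the WordPress implementation, which is written in PHP.
# Characters that can legitimately follow a tag name in an opening tag.
TAG_NAME_END = " \t\n\r\f/>"


def find_tag_name_end(html: str, tag: str) -> int:
    """Return the index just past the first `<tag` opener, or -1 if none."""
    needle = "<" + tag.lower()
    at = html.find("<")
    while at != -1:
        end = at + len(needle)
        if (
            html[at:end].lower() == needle
            and end < len(html)
            and html[end] in TAG_NAME_END
        ):
            return end
        at = html.find("<", at + 1)
    return -1


def add_attribute(html: str, tag: str, name: str, value: str) -> str:
    """Hypothetical helper: insert name="value" into the first matching tag.

    Everything before and after the insertion point is copied verbatim,
    so malformed markup elsewhere is neither repaired nor damaged.
    Simplifications: no escaping, no check for an existing attribute,
    and comments or script contents are not skipped.
    """
    at = find_tag_name_end(html, tag)
    if at == -1:
        return html
    return html[:at] + f' {name}="{value}"' + html[at:]


if __name__ == "__main__":
    # Unclosed <b> and a stray </i>: both should survive untouched.
    broken = '<p>one<b>two</p><img src="a.png"></i>'
    print(add_attribute(broken, "img", "loading", "lazy"))
```

In this sketch the unclosed b tag and the stray closing i tag come out exactly as they went in; only the img tag gains an attribute.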
So any malformed HTML will be preserved in the processed output. Yeah. We're not going to fix it, but we're also not going to break it more. Yeah. Okay, Elliot has a question. He's raised his hand. So I'm going to invite him to unmute and hopefully come on screen to ask his question. I've unmuted but I don't know if I can get him on the screen. No, that's fine. My screen is in a different place and my camera is in a different place. I look a bit old. We can see you. Yeah. Now my question was, will this work with HTML comments? And I'm asking because, yes. So HTML, so we can move blocks around if we want to. Well, I actually have to qualify this. What it does is skip over comments, and we have a PR open that's kind of just been sitting there since 6.2 was released. We kind of got distracted. There's no way to stop at a comment, but I think we want to add that in. And so right now, when I say it works with comments, it just jumps over them in their entirety. We're going to be focused on those tags. We will probably add a mode to visit a comment. But right now, you're not going to be able to rearrange those for the same reason that you can't replace the tag or change the contents. The HTML API will be able to do that. Sorry, the HTML processor will be able to do that, but the tag processor can't. In fact, there's never a point in time where string indexes are exposed where we could change things around. Although if your goal is to rearrange blocks, then that's actually faster to do through WordPress with parse_blocks and serialize_blocks. And that can be done today. It should be a good thing to ask about in the forum somewhere in Slack. The reason I asked is because I got involved with a bug that was to do with the Create Block Theme. 
And there was an issue when you try and save a pattern that's got an image, and sometimes it breaks wherever you put the header. I don't remember fully why it did it, but I suggested that you could probably pull out and reorder them somehow, you know, with an index and then specifically get the, I think it was a slug name, because it was stripping out the slug name and then it was breaking the headers and footers and stuff. So I wondered if this could be used for that purpose, because if it's part of core then obviously it would be useful to use that, I guess. But yeah, it's an interesting idea. Go ahead. No, I was going to say, if it's not quite there yet then, I guess, yeah, it probably requires a discussion maybe. That's an interesting idea, we'll take a look at it. There's one thing: this is in its own HTML API folder in core, and one of the things that has kind of defined it is it's focused solely on HTML itself. We've actually wondered about binding it more closely into WordPress, because right now it doesn't handle shortcodes, it doesn't handle blocks, it doesn't handle any of that. And according to HTML, a comment is semantically neutral. And it's kind of funny, there's multiple kinds of comments. There's the comments we're all familiar with. But this is also a comment, and there's broken syntax. This is also a comment, this is a comment. And all sorts of crazy things. And what it does, including for CDATA, is it just hops over it. And that's the same thing it does for script tags. There's no reason to jump inside a script tag, so it just skips it. Comments, script tags, the contents of a textarea or the title element. Title is a special HTML element. It's just like textarea: there are no tags inside of it, everything is plain text. Yeah. 
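The skipping behavior described here can be sketched as a tiny scanner. This is a minimal Python illustration under stated assumptions, not the tag processor itself: the function name visible_open_tags is hypothetical, the set of text-only elements is just the three named in the talk (script, textarea, title), only comments of the `<!-- ... -->` shape are recognized, and the real HTML rules for comments and script contents have more cases than this handles.

```python
import re

# Elements whose contents this sketch treats as plain text (the three
# named in the talk). The HTML specification defines more cases.
TEXT_ONLY = {"script", "textarea", "title"}

# A tag name starts with an ASCII letter and runs until whitespace, "/" or ">".
TAG_NAME = re.compile(r"[a-zA-Z][^\s/>]*")


def visible_open_tags(html: str) -> list[str]:
    """Return the names of opening tags a scanner would stop at.

    Comments are hopped over in their entirety, and nothing inside a
    text-only element is reported as a tag.
    """
    tags = []
    i = 0
    while True:
        i = html.find("<", i)
        if i == -1:
            break
        if html.startswith("<!--", i):
            # Hop over the whole comment; give up if it never closes.
            close = html.find("-->", i + 2)
            if close == -1:
                break
            i = close + 3
            continue
        match = TAG_NAME.match(html, i + 1)
        if match is None:
            # Closing tags, doctypes and stray "<" are ignored here.
            i += 1
            continue
        name = match.group().lower()
        tags.append(name)
        i = match.end()
        if name in TEXT_ONLY:
            # Jump straight to the matching closer; contents are text.
            closer = re.compile("</" + name, re.IGNORECASE).search(html, i)
            if closer is None:
                break
            i = closer.start()
    return tags


if __name__ == "__main__":
    sample = (
        "<p><!-- <img> --><title><b>not a tag</b></title>"
        '<script>if (a < b) { x = "<div>"; }</script><span></p>'
    )
    print(visible_open_tags(sample))
```

In the sample, the img inside the comment, the b inside the title, and the div inside the script string are never reported, which mirrors the "it just hops over it" behavior in spirit.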
We're looking at that because we're looking at powering some little dynamic tokens in WordPress, kind of like a shortcode again, but we're looking at powering that through comments themselves. Okay, cool. Thank you. Thank you for the presentation as well, by the way, it's great. Really interesting. Thank you. Thanks Dennis. And thank you Elliot for that question. We've got one more question from Rob. Will the HTML parser be block aware? In other words, will it read and write the JSON in the block's HTML comments? I guess he means the comments, the block comment delimiters. Yeah, there's actually no plans right now to bridge the block and HTML processing. And I would say that's mostly because I'm nervous about the implications of coupling those two systems together. What we have right now is a fairly pure HTML system. Again, we can talk about HTML on its terms and we don't have to worry about these other things. So what I would expect to find is, if we want to start processing block material, we'll find a function in core that is built on top of the HTML processor. So all of these things that do stuff, you know, the HTML API is kind of the workhorse that provides the engine to do these things in a clean and elegant way. The functions that work with WordPress are going to be incorporating the HTML API to do what they want to do. That'll keep those things separated and it'll allow us to continue to, you know, we've had hardly any bugs with this so far, but when you do come up with a bug, it means we can fix it and only have to think about HTML. We fix the HTML, we fix every single function that works on this. One of the comparisons I like to draw with regex is that a fix here affects everybody's code, but an attribute is always going to be an attribute. So by using this API, the code we write can remain solid forever because it only does one thing. Okay, great. Thanks, Dennis. 
We're at the top of the hour now. I just want to ask quickly, if people want to contribute, where is development work on this happening? Great question. I would encourage anybody who's interested in developing to jump into the HTML API meta channel in WordPress Slack. There's also an HTML API component in Trac. These are also linked in the progress report post. Let's jump back to that. At the bottom of the progress report, there's some links. And if you click on Trac tickets and HTML API component, I think it takes you to, that's the list of tickets. And it's not loading. Component overview page. That's what we wanted. So we have this area where you can follow the tickets. I think there's a button or link right here to click to follow. What's this in chat? Okay, but in short, all the necessary links for people who are interested are in that progress report post. That's correct. Awesome. Okay, let's wrap things up now. As somebody asked earlier, can you save the chat? Yes, you can. There's lots of links been posted, so it's a good idea to save the chat if you want to follow the links a bit later. If you look below the chat, you've got a number of buttons. On the rightmost button, the three dots for more, you've got an option there to save the chat. So you've got a minute or so while I wrap up to do that and make sure you've got the chat locally. Reminder also that the video is going to be posted to WordPress TV and YouTube. Give me a bit of time to do that. Hopefully they'll both be there within 24 hours, by this time tomorrow. Another reminder that WordPress is open source. If you want to contribute to WordPress, maybe not just to the HTML API, but I'm sure Dennis will be happy to receive your contributions there. But go to make.wordpress.org, which is where all the activity for the community-based WordPress project happens, so you can contribute there. 
The next Developer Hours is going to be on Wednesday the 27th of September. It's going to be one hour earlier than usual at 14:00 UTC due to a scheduling conflict. I'm not going to be hosting that one. I'm going to be on holiday that week. So Nick Diego and Ryan Welcher will be participating in that one. So that's Wednesday the 27th of September at 14:00 UTC for the next Developer Hours. It remains to thank you all for joining this session and to especially thank Dennis for his presentation and for answering all the questions and educating us on the HTML API. Thank you very much, Dennis. Thank you everybody. Okay, thank you all and goodbye.