I'm Gabriel, and with me is Subbu. Hi. We both work on Parsoid, and we're going to give you a short overview of some Parsoid internals. Subbu, can you put up the slides? OK, let's start from the beginning. There you go. Can you see it? Yes, we can. Awesome. Can you go forward?

So we will start with an introduction and explain what makes this whole thing difficult, and then Subbu will go into the details of how Parsoid is structured, the architecture, explain how some of the algorithms work, and give you the actual content.

I'll just give you a quick introduction. Parsoid is a web service that converts between wikitext and HTML5 plus RDFa. The new thing is that this HTML actually has a spec, and we capture a lot of the semantic information using RDFa. That makes it fairly easy to work with and enables a lot of other things; that's the main thing we are hoping to achieve, something that is nice to work with. The interesting part is that it does this in both directions. It's written in JavaScript, runs on Node.js, is pretty fast, and has a very simple REST API. If you want to convert wikitext to HTML, you just post the wikitext string and get back an HTML string. Similarly in reverse: when you want to convert HTML to wikitext, you send HTML and get back wikitext. For retrieving the HTML for a page, you just do a GET on the page name and get back the full HTML for that page, the way you would expect. You can see the documentation at the link there, but it's really not much more than that. We also have some command line tools that we use mostly for our own debugging purposes, but if you wanted to incorporate this into some other tool, you could also use them to convert things from shell scripts or whatever.

Then, who uses this? We have a lot of users now. May I interrupt with a question? Sure. On the previous slide, with the command line options, could you show it again? Thank you. Under what circumstances would you want HTML to HTML or wikitext to wikitext? When you're testing. You're testing the full round trip and you want to compare: the output should be identical to the input, correct? Yeah. Thank you.

So we have a few users now. The visual editor was our first user and actually drove this project initially. Now we also have the Flow project, a discussion system for MediaWiki; it already stores its content in HTML, and we provide the wikitext editing interface for it, so it's kind of used in reverse. There's also PDF rendering, mobile is starting to use it, and Kiwix, which provides offline copies of Wikipedia. They all rely on the well-defined HTML and RDFa interface and spec, which makes it easy to transform the content and massage it in various ways. There's an upcoming content translation project, which translates the HTML, because that's much easier to work with than the wikitext, and then converts it back to wikitext; that lets us move articles from the English Wikipedia to smaller wikis and start them up much more quickly. And there are now also some gadgets and bots being developed, such as an edit protection helper. So it's getting pretty well used. You can see all of the users on that linked wiki page.
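Going back to the API for a moment, here is a minimal sketch of what those calls might look like from Node.js. The base URL, route layout, and the wt/html form field names are assumptions for illustration only; the linked documentation has the actual routes.

    // Hypothetical sketch of talking to a Parsoid service from Node.js.
    // The base URL, routes, and form field names are illustrative assumptions,
    // not the documented API.
    const PARSOID = 'http://localhost:8000';

    // wikitext -> HTML: post a wikitext string, get back an HTML string
    async function wikitextToHtml(pageName, wikitext) {
      const res = await fetch(`${PARSOID}/localhost/${encodeURIComponent(pageName)}`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
        body: new URLSearchParams({ wt: wikitext }),
      });
      return res.text();
    }

    // HTML -> wikitext: post HTML, get back wikitext
    async function htmlToWikitext(pageName, html) {
      const res = await fetch(`${PARSOID}/localhost/${encodeURIComponent(pageName)}`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
        body: new URLSearchParams({ html }),
      });
      return res.text();
    }

    // Retrieving the stored HTML for a page is just a GET on the page name.
    async function getPageHtml(pageName) {
      const res = await fetch(`${PARSOID}/localhost/${encodeURIComponent(pageName)}`);
      return res.text();
    }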
All right, so now, what makes this hard? The basic issue is that the mapping between wikitext and HTML is not one-to-one. Many different pieces of wikitext can map to the same HTML. In this example, the list item with a space after the asterisk and the one without the space both map to the same list item in HTML, so two different wikitext inputs map to the same HTML output. There are more examples there with refs, quote styles, upper and lower case, and so on; there is a huge variety. And the same is true in reverse: a given piece of HTML can be serialized to different kinds of wikitext. We can serialize some things with wikitext syntax and some with HTML syntax, for example. Our challenge is that when we start with wikitext, we have to round-trip it back to the same wikitext, because otherwise the change would show up as a diff in the normal diff interface, and that would irritate people and make it hard to figure out what the intended change actually was. Next slide, please. Do you see the next one? Yes. Great.

MediaWiki also has a lot of complex functionality. That's not really something that makes parsing hard in itself, it's just functionality we have to implement, and there are a lot of corner cases we keep finding as we implement things. Especially in images we've found a lot of edge cases, and we're cleaning those up on the PHP side as well. Next slide, please.

Templates in MediaWiki are purely string-based. They don't expand to something that has a meaning in the DOM; a template doesn't have to produce a subtree, it can produce something unbalanced, like in this example where the template emits just a start tag that then ends up wrapping content that is not part of the template. So first of all, to build the DOM, you need to expand all these templates, because everything is string-based. And then we have to solve the problem of how to represent this for the visual editor and other tools: we have to establish which content is affected by a transclusion. Performance is also an issue, with some pages having a lot of transclusions; if we did all this sequentially instead of in parallel, it would be very slow, so there's a lot of need for parallelism to make this perform well.

There's also no such thing as invalid wikitext. There can be all kinds of nesting issues. There can be text in places where you would normally expect an attribute, like in this table row: that text gets dropped from the rendered output, but on round-trip we still have to preserve it. And the next example is something we spent a lot of time on. This is foster parenting, where something appears between the table tag and the table row tag. Mark, could you mute, please? Thanks. So that's text between the table tag and the table row tag. The problem is that this text gets moved out when the HTML5 parser processes it, so it ends up before the table tag. We have to deal with this when we build the DOM, and we have to undo it to avoid creating dirty diffs. Mis-nested or overlapping tags are similar: they get cleaned up by the tree builder, and we basically have to undo all of these fix-ups to avoid dirty diffs. Right, hopefully that gives you some idea of why this is hard. All right, I have a quick question.
When we were embarking on this, did we think about doing something like a one-time cleanup of all of the wikitext on all of our sites, so that all of the wikitext would be pretty clean and in a consistent style and wouldn't have these kinds of problems, to make our own jobs easier with Parsoid? Well, we always had the need to run wikitext editing and HTML editing in parallel for quite a while, while we developed this, so we couldn't just declare a flag day where we switched from one to the other. We could have, maybe, but then we would have had to push this far into the future. We couldn't have started the way we did, but it could be an option later on. OK, thank you.

So now Subbu is going to present the details. OK, in this next part of the talk I'm going to give you some details of how Parsoid handles wikitext and converts it to HTML. This diagram gives you a very high-level overview of the components of the Parsoid parsing pipeline. I've roughly shown six stages here. You start with wikitext, and it goes through a tokenizer; I'm not going to go into a lot of detail on this slide, just an overview. The tokenizer converts the wikitext to a bunch of tokens: strings, comments, headings, unexpanded template tokens, unexpanded link tokens, et cetera. This token stream is then processed by a bunch of stages where things like templates are expanded, extensions are expanded, link tokens are converted to a tags with hrefs, and lists, quotes, et cetera are all generated. In this part of the pipeline we actually query the MediaWiki API, because at some point we decided we didn't want to re-implement a lot of the nasty details of, say, parser functions, magic words, extension hooks, all the various ways extensions plug into the parsing pipeline, or even Scribunto. So in this part of the pipeline we issue queries to the MediaWiki API, for example to expand a template, and we get back a bunch of wikitext. At the end of these three stages, what you get is a whole bunch of expanded tokens, which is pretty much ready to become HTML. This then goes through a standard tree builder and you get a DOM, at which point it's still not ready for use by clients: things like citations, refs and references have to be generated, and, as Gabriel talked about earlier, the scope of templates has to be established, and things like that. So this diagram shows you the full pipeline, but within Parsoid we actually use different sub-pipelines for other tasks. The reason I'm showing you this is that we had to design it in a modular fashion, so that we can assemble other sub-pipelines for use within Parsoid.

This is the only slide in this talk about the other direction, where we start with HTML and produce wikitext, because we can't really cover all the details in this talk. But briefly, what happens is: let's say the visual editor edits a page and you get new HTML. Parsoid compares this new HTML with the original one and finds out what has changed. We use a strategy called selective serialization: if something has not really been modified, we just emit the original wikitext from the source page, and if something has been modified, it goes through full serialization. I'm going to talk a bit later about this process of detecting what's unmodified. And in order to emit exactly the original wikitext from the source, we need to be able to map wikitext substrings very accurately to the HTML they generated.
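Before getting into the stages, here is a toy sketch of that staged-pipeline idea: each stage consumes the previous stage's output, and the same stages can be recombined into different sub-pipelines. The stage names and behavior are invented placeholders, not Parsoid's actual modules.

    // Toy sketch of a staged pipeline; stage names and behavior are invented
    // placeholders, not Parsoid internals.
    const tokenize = wikitext => wikitext.split(/(\n)/).filter(Boolean);
    const transformTokens = tokens =>
      tokens.map(t => (t === '\n' ? { type: 'newline' } : { type: 'text', value: t }));
    const buildTree = tokens => ({ type: 'body', children: tokens });

    function runPipeline(input, stages) {
      return stages.reduce((value, stage) => stage(value), input);
    }

    // The full wikitext -> DOM pipeline...
    const fullPipeline = [tokenize, transformTokens, buildTree];
    console.log(runPipeline('a\n* {{echo|b}}', fullPipeline));

    // ...and a sub-pipeline that stops at the token stream, the sort of thing
    // used internally for expanding fragments separately.
    const tokenPipeline = [tokenize, transformTokens];
    console.log(runPipeline('{{echo|b}}', tokenPipeline));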
So let's quickly look at the first stage, the tokenizer. This uses what is called a PEG parser, a parsing expression grammar, and I won't go into the details of what makes it different from other grammar formalisms. The important part is that we use this parser to parse the context-free aspects of the syntax: strings, template tokens, comments, headings, recognizing list beginnings, things like that. It's not really possible to parse everything to the final output in one grammar; there have been a lot of projects that tried, and it's not possible because the grammar is context-sensitive. For example, if you have a bunch of quotes on a line, whether that becomes an i tag or a b tag depends on what else showed up on the line. And since we don't pre-process transclusions, transclusions also change how wikitext is parsed. So, as in the earlier diagram, token stream transformations and DOM passes in the later stages handle all these context-sensitive parts.

Let's look at an example, which makes this easier to comprehend. I'm showing a bit of wikitext there, which is fairly simple: two lines, a string, and on the second line a one-element list where the content of the list item comes from a template. Once it goes through the tokenizer, you see the tokens that are generated: a string a, followed by a newline token. After that you see a list token, with a bullet; if you had used a hash, that would be a list token with a hash there. It's followed by a template token, which carries information about the template name, in this case echo, and the parameter, in this case b. After that there's a newline and an end-of-file. I also show the command line there; for those interested, you can run it yourself. I'll show these command lines on a couple of other slides, but you can ignore them for the talk.

Once this goes through the middle stages, one, two and three, you see that what started off as a bunch of unexpanded tokens, like the list token and the template token, is now fully expanded. The string a is now wrapped inside a paragraph. The list token is completely expanded into UL and LI tokens. And there is a template start marker and a template end marker, which bound the content of the template. The other things are self-explanatory.
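To give a feel for what that unexpanded token stream might look like as data, here is a toy rendering of the tokenizer output for this example. The field names are illustrative assumptions; Parsoid's real token objects carry more information, such as source offsets and attributes.

    // Toy rendering of the tokenizer output for:
    //   a
    //   * {{echo|b}}
    // Field names are illustrative; real Parsoid tokens carry more data.
    const tokens = [
      'a',
      { type: 'newline' },
      { type: 'list-item', bullets: ['*'] },                     // a '#' would give bullets: ['#']
      { type: 'template', target: 'echo', params: { 1: 'b' } },  // unexpanded transclusion
      { type: 'newline' },
      { type: 'eof' },
    ];

    console.log(JSON.stringify(tokens, null, 2));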
Once we build the DOM and run a couple of passes, this is how it looks. You have, once again, a paragraph tag, you see a list there, and you have the start marker and the end marker as meta tags. The thing to note here is that the meta tags have these two attributes, typeof and about. These are the RDFa annotations, which persist through the DOM and which are used by clients like the visual editor to know what part of the content comes from a template, so that they can provide the right editing environment. We have a spec which lists all the various typeof annotations for things like links, extensions, categories, redirects, et cetera. And this is the final HTML once parsing is done: you see that the content of the template, b, is now wrapped in a span which has the same about ID and the typeof.

I also show, further up, for the paragraph, UL and LI tags, something called the DSR information. This is the information used later on by the selective serializer. What it indicates is, for example, that the paragraph with a came from the wikitext substring 0 through 1, and in this case the list came from the substring 2 through 13 of the original wikitext. The template itself is also wrapped in the span. Let's look a little more closely at the data-mw attribute there. This tells clients, in this case for example the visual editor, what the various parts of the transclusion are. There is only one part, which is the template, whose target is the echo template, fully resolved, with the URL there, and it has a single parameter whose wikitext is the string b. We also have work in progress where we are going to provide the visual editor and other clients with the HTML representation of the parameter as well, so they can provide a rich editing environment even for template arguments. In this case it's just a string, so the HTML is no different, it's still the string b. But imagine that the wikitext changed so that the argument is a link to foo; then the HTML would be a full a tag with the href and all that information.

So, am I going too fast, or is this the right pace, in terms of being comprehensible? I have a question or two, if you don't mind. Sure. OK. One is about the special attributes that you put on things like the meta and span tags in the HTML. Could you go back a slide? Great, thanks. Things like about, typeof and so on: am I right in presuming that those are basically our own Wikimedia inventions, or do any of these have a standard industry or W3C meaning in the HTML standard? The actual values, things like mw:Transclusion or external link or wiki link, those are internal to Wikimedia. But the mechanism of providing a typeof and an about attribute is something called RDFa, so that's a standard. OK, cool. And remind me, what does DSR stand for? It's DOM source range: what source range in the wikitext a given DOM node came from. I'm going to talk about that in detail later. OK, you're going to talk more about that. All right, thanks.
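To make the data-mw attribute from a moment ago a bit more concrete, here is roughly the JSON shape just described for the {{echo|b}} transclusion: a list of parts, a resolved target, and per-parameter wikitext, with the in-progress HTML rendering alongside. Treat the exact keys as an approximation; the Parsoid spec is the authoritative reference.

    // Approximate shape of the data-mw JSON for {{echo|b}}; keys follow the
    // description above, but consult the Parsoid spec for the real format.
    const dataMw = {
      parts: [
        {
          template: {
            target: { wt: 'echo', href: './Template:Echo' },   // fully resolved target
            params: {
              1: {
                wt: 'b',      // the parameter's wikitext
                html: 'b',    // work in progress: an HTML rendering of the argument;
                              // for [[foo]] this would be a full <a> link
              },
            },
          },
        },
      ],
    };

    console.log(JSON.stringify(dataMw, null, 2));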
OK, where was I? Let's look at the other stages now in a little more detail. The middle stage is the three stages where, as I said, all the expansion happens. This is where the wikitext tokens come in, and first of all things like noinclude, includeonly and onlyinclude are processed. Once that is done, the second stage is where a lot of the expansion takes place. This stage, which I show in bold, is all asynchronous: multiple template transclusions are processed in parallel, and the MediaWiki API is queried for things like whether an image exists or not, its attributes, its size, things like that.

Once again, let's look at the example. I'm still using the same wikitext from before, I just added a noinclude there. What you get from the tokenizer is the same as before: you have a list token, there's a newline there, and after that you have the opening noinclude, the template in between, and the closing noinclude. Once the first stage is done, all the noincludes are stripped. But, as Gabriel talked about earlier, Parsoid has to keep track of what was stripped. Since we want to round-trip the wikitext back to how it was, we have to keep a placeholder there so that we can emit the noinclude back once we convert the HTML to wikitext. That's what the placeholder is there for.

Now let's look at what happens to the template token. First of all, we query the MediaWiki API to give us the expanded wikitext for the template; in this case it's just a simple transclusion, and the API tells us it's just the string b. This string of wikitext that we get back is parsed in a completely new pipeline. As you can imagine, the tokenizer gives us just two tokens, the string b and an end-of-file. This set of tokens is then wrapped with template start and template end marker tokens, with ID 1 in this case. If you have multiple transclusions, which most pages have, you get unique IDs for every transclusion. This new set of tokens is then spliced back into the main token stream: where you had the template echo b, in its place you get the template start, b, and the template end.

The thing to note here is that this is not a synchronous process. Once Parsoid issues a query to the API, it's not sitting around waiting for the response to come back; it continues with other processing. For example, if you look at an edited version of the wikitext with a couple more transclusions, a second echo and an infobox template, then all three of these template tokens are processed concurrently, and we issue requests to the MediaWiki API for all of them. And we have some buffering within Parsoid which makes sure that the tokens are spliced back in the same order in which they showed up in the original wikitext.
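A minimal sketch of that "kick everything off concurrently, splice results back in document order" pattern, assuming a hypothetical expandTemplate() that stands in for the MediaWiki API round trip. Promise.all preserves kick-off order regardless of which response arrives first, which plays the role of Parsoid's internal buffering; the counter also shows one simple way unique transclusion IDs can be handed out at kick-off time.

    // Hypothetical sketch of concurrent template expansion with ordered splicing.
    // expandTemplate() stands in for the MediaWiki API round trip.
    async function expandTemplate(target, params) {
      await new Promise(r => setTimeout(r, Math.random() * 100)); // simulate latency
      return params[0];                            // {{echo|b}} -> 'b'
    }

    let nextTemplateId = 1;                        // IDs are assigned at kick-off time

    async function expandAll(templateTokens) {
      const jobs = templateTokens.map(tok => {
        const id = nextTemplateId++;
        return expandTemplate(tok.target, tok.params).then(expandedWt => ([
          { type: 'tpl-start', id },
          expandedWt,                              // would itself be re-tokenized in Parsoid
          { type: 'tpl-end', id },
        ]));
      });
      // Promise.all keeps results in kick-off order, not completion order.
      const results = await Promise.all(jobs);
      return results.flat();
    }

    expandAll([
      { target: 'echo', params: ['b'] },
      { target: 'echo', params: ['c'] },
      { target: 'infobox', params: ['...'] },
    ]).then(tokens => console.log(tokens));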
OK, let's look at stage three next, where all the other transformations take place: quotes, lists, indent-pre, paragraphs, and the sanitizer. All the stage-three transformations run, as indicated, after all the templates and extensions have been expanded. One thing I forgot to mention is that in this stage we also handle things like links and images, and if there are attributes of extensions, links or images which come from templates, those are also expanded in the same fashion. The quote handler is more or less a straight port of the PHP parser's handler; it's a whole bunch of conditions, and there's a lot of detail I'm not going through here. Similarly, the sanitizer is a straight port of the PHP sanitizer, which guarantees that all the unsafe wikitext and HTML sanitization that happens in the current PHP parser also happens in Parsoid. As far as the list, indent-pre and paragraph handlers go, they are simple state machines which transform the token stream: if you see this token, what should you do, things like that.

I'm just going to take a simple slice of the indent-pre handler to explain what this means. A state machine basically has a bunch of states: you start in some state, and when you see a token you transition to a different state and take an action. In this case, you start in the start-of-line state. If you're in the start-of-line state and you see a newline, nothing changes: you're still at the start of a line, so you just emit the token. Similarly, if you see a start-of-line-transparent token, things like comments, noincludes, categories, you still don't change state; you are still in a start-of-line context, and you just buffer the token. That's the action taken in the handler. If you see whitespace, a space or a tab, that indicates you have to generate a pre; this is the indent-pre. So the state machine goes into a pre state and you buffer the tokens through the rest of the line. And if you see anything else, nothing needs to be done: you go into an ignore state till the end of the line and you emit the tokens into the token stream. Similarly for the pre state, and there are a whole bunch of other states; I'm not going into the details of all of them here. The basic idea is fairly simple: all the transformations are of the form "if you're in this state and you see this token, what do you do", and that lets us handle this in a systematic fashion. A lot of the complexity is in getting the details right, and in the edge cases of dealing with paragraphs and pres, single line, multi-line, and all of that. So the basic idea is simple; it's just a lot of details. I'm going to pause for a second if anybody has questions, otherwise I'll continue. Okay.
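While we pause, here is a toy sketch of the start-of-line piece of an indent-pre style handler, just to show the "state plus token gives action" shape. The token shapes and state names are simplified assumptions; the real handler covers many more states and edge cases.

    // Toy sketch of a start-of-line / pre / ignore state machine over tokens.
    // Token shapes and states are simplified; not Parsoid's actual handler.
    function indentPreHandler(tokens) {
      let state = 'sol';                 // start of line
      const out = [];
      let buffer = [];

      for (const tok of tokens) {
        if (state === 'sol') {
          if (tok.type === 'newline') {
            out.push(tok);               // still at start of line
          } else if (tok.type === 'comment' || tok.type === 'noinclude') {
            buffer.push(tok);            // sol-transparent: buffer, stay in sol
          } else if (tok.type === 'text' && /^[ \t]/.test(tok.value)) {
            state = 'pre';               // leading whitespace: this line becomes a pre
            buffer.push(tok);
          } else {
            state = 'ignore';            // anything else: not a pre line
            out.push(...buffer, tok);
            buffer = [];
          }
        } else if (state === 'pre') {
          buffer.push(tok);
          if (tok.type === 'newline') {  // end of line: wrap buffered tokens in a pre
            out.push({ type: 'pre', children: buffer });
            buffer = [];
            state = 'sol';
          }
        } else {                         // 'ignore'
          out.push(tok);
          if (tok.type === 'newline') state = 'sol';
        }
      }
      return out.concat(buffer);
    }

    console.log(indentPreHandler([
      { type: 'text', value: ' preformatted' },
      { type: 'newline' },
      { type: 'text', value: 'normal text' },
      { type: 'newline' },
    ]));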
OK. The stage before the last one is building the HTML tree. This uses a standard HTML tree-building library; it's not something we built in Parsoid itself. HTML5 has a very well-defined W3C spec which describes how the tree has to be built, so we just use a library that builds the HTML from tokens. The one thing to note is that we have to use a lot of tricks to detect the fix-up of mis-nested tags. In this example you have bad nesting where the i tag and the b tag overlap. Once this goes through the HTML tree builder, it gets fixed up: the b tag gets closed and reopened, and there is an empty b tag there. Once again, this is a problem for Parsoid, because if you serialize this as-is, you essentially change the wikitext that Parsoid serializes to, and you introduce dirty diffs. So we have to use tricks to detect what was changed as part of the standard HTML fix-up. To do that, we add shadow tokens for every token seen in the original stream, and then we analyze the DOM that we get from the HTML tree builder. For example, if there is a node in the DOM but no shadow token for it, we know the node was added by the tree builder and didn't show up in the token stream, so we mark that node as a fix-up rather than something that came from the wikitext.

The other thing to note is that the PHP parser itself does not generate well-formed HTML; it relies on a different library called Tidy to clean up mis-nested tags, and Tidy does not use HTML5 semantics. What this means in practice is that the output of Parsoid and the PHP parser will occasionally differ on mis-nested tags. For example, we have a whole bunch of bug reports right now where formatting tags are mis-nested and Parsoid and Tidy handle them differently. That's something we'll have to tackle in the coming months.

The HTML that comes out of the tree builder, while it is a well-formed DOM, is not fully ready for use by clients, so we still have to do a whole bunch of fix-up. For example, we have to find out what part of the DOM came from a template, we have to add citations, and we have to deal with things like link trails and link prefixes, for those of you who know what those are in wikitext. The reason we do a lot of this on the DOM is that the DOM is well structured, it's a tree, and that makes our algorithms fairly simple, simple in quotes, and also robust. So we have a bunch of passes that transform the DOM. I'm showing you a subset of those passes, and I'm only going to talk about three of them, the ones shown in black here. The first thing that happens is that we detect fostered content. As Gabriel mentioned in one of the early slides, fostered content is something that is badly nested in a table and gets pulled out of the table, before the table, and Parsoid has to detect that and undo the change later on. I already talked about detecting how the HTML got fixed up so that we can undo that change later as well. The next is mapping wikitext substrings to DOM subtrees; this is the DSR computation I mentioned earlier, and it's required so that our serializer can take an unmodified DOM node, use these offsets, and just emit a substring from the original source, so we don't introduce dirty diffs in that case. The next important pass is to demarcate template scopes, marking what part of the DOM came from templates and what did not. There are a bunch of other passes dealing with link prefixes and trails, generating references, a bunch of hacks we have to worry about like the LI hack, which I won't go into here but can talk about later if anybody is interested, and also support for table cells and table attributes which come from templates, which makes things quite complex too. That's the reason we do all of this on the DOM as a post-processing step.

So let's look at fostered content. The name is kind of strange; you might wonder, what is fostered content? This is something that's part of the HTML5 spec; it's not native to Parsoid, and you can test it in your browser as well. Basically, anything in a table that is badly nested is moved out and gets adopted by a foster parent outside the table. In this case, the string foo should not be there between the table and the TR tag. Before the fix-up, foo had the table as its parent; after foster parenting, it gets a different node as its parent, and hence this is called the foster parenting algorithm. In a table, you can only have content inside TD tags, that is, table cell or table heading tags; everything else is not valid. One of the challenges, where we've run into a lot of edge cases, is where only part of a template's content gets moved out, so you have part of the template inside the table and part of it outside, which also changes the ordering of the template content itself. That gave us quite a bit of headache, but we've managed to figure out how to handle it.

The reason this is a problem is, as I mentioned earlier, that it breaks content ordering. Before we fixed this in Parsoid, what would happen is that you had the wikitext on the left, where fostered content shows up inside a table row, and it would round-trip differently. It also interferes with our ability to map wikitext strings to the generated DOM nodes, because once again the ordering has changed, which means your offsets are all different. And if template content gets split up, that also messes up our scoping algorithm. Early on, before we realized what was going on and added fixes, this used to cause much more serious corruption, where whole sections of pages were getting duplicated when they were serialized.
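Since foster parenting is part of the HTML5 spec, you can see it for yourself in a browser console; here is a quick sketch using the standard DOMParser.

    // Run in a browser console: the stray 'foo' between <table> and <tr>
    // is fostered out and ends up as text *before* the table element.
    const doc = new DOMParser().parseFromString(
      '<table>foo<tr><td>bar</td></tr></table>',
      'text/html'
    );
    console.log(doc.body.innerHTML);
    // Roughly: foo<table><tbody><tr><td>bar</td></tr></tbody></table>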
The basic idea behind solving this is very simple: you add a marker before every table. That essentially creates a box between the marker and the table, which means anything that shows up between the two tags, the marker tag and the table, is fostered content. Once we find this, later DOM passes simply ignore all the fostered content in their analysis, and the serializer also relies on these markers to avoid corruption. And of course there are a whole bunch of edge cases, like in most of the other transformations. OK, any questions so far? Just double-checking: is fostered content only for tables, or for other element types as well? It's only tables. Yeah, it's only tables. OK.

Next, let's look at what the DSR computation is. DSR is just DOM source range, which basically tells us which wikitext substring maps to which DOM node. It's assigned to every node, and it's a four-tuple, four numbers: a start offset, an end offset, and the tag widths for the start and end wikitext tags. So if you had ''foo'' there, with the two quotes, which parses to this i tag, then the DSR for that is shown there: the start offset is 0 and the end offset is 7, and since both quotes are two characters wide, those are the tag widths. If instead you had the wikitext on the second line, which is basically raw HTML, it parses to the same HTML as before except the DSR is different: in this case the end offset is 10, because that's how wide the string is, the start tag width is three characters for the opening i tag, and the end tag width is four characters. Accuracy here is very critical for selective serialization, because it simply emits a substring of the source wikitext for unmodified DOM nodes. I won't go into the details here, but the basic idea is that you take the DOM, start at the body, and walk backwards, computing offsets based on information you have about wikitext. For example, we know that i tags are two characters wide, b tags are three characters wide, headings are one to six characters, and things like that. It also uses the start and end offset information that the tokenizer gives this pass. So again it's a fairly simple algorithm that goes in the reverse direction and just passes some information back and forth.
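A small sketch of what those DSR tuples look like and how a selective serializer might use them; the [start, end, startTagWidth, endTagWidth] layout follows the description above, while the node shape and function are invented for illustration.

    // Sketch of DSR tuples and their use in selective serialization.
    // The node shape is invented; the tuple layout follows the slide:
    // [start, end, startTagWidth, endTagWidth].
    const source = "''foo''";                 // parses to <i>foo</i>
    const italicNode = {
      name: 'i',
      dsr: [0, 7, 2, 2],                      // raw "<i>foo</i>" would give [0, 10, 3, 4]
      modified: false,
    };

    function serializeNode(node, originalSource) {
      if (!node.modified && node.dsr) {
        // Unmodified subtree: emit the exact original wikitext substring,
        // so nothing shows up as a dirty diff.
        const [start, end] = node.dsr;
        return originalSource.slice(start, end);
      }
      // Modified subtrees would go through the full serializer instead.
      throw new Error('full serialization not sketched here');
    }

    console.log(serializeNode(italicNode, source));   // -> ''foo''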
OK, lastly, how do we detect templates? The reason this is important is that for clients like the visual editor, you cannot really edit template output directly; you can only edit the parameters of transclusions, which means the part of the document that comes from a template has to be edit-protected. For common transclusion scenarios, like the simple one in this example with just an echo of foo, the template maps to a single DOM node, a span with the content foo. But this is not true in the general case. For example, there are things like succession box templates, with table-start and table-row templates, where multiple transclusions, in this case an s-start, a bunch of wikitext in the middle, and an s-end, this whole section of wikitext, map to a single table, a single DOM tree. So what this pass essentially does is associate a set of DOM nodes with the corresponding wikitext, as in this example. That is the basic idea behind template encapsulation, or scoping. What this pass also does is add the RDFa information I already showed in an example earlier: it adds the about ID attribute, a typeof, and also data-mw, which is basically the information about the template parameters.

The basic idea of the algorithm is again fairly simple. I mentioned earlier that the template expansion transformation adds template start and end markers. If you look at this tree as the DOM, we are essentially looking for these blue nodes, which are the template start and end markers, and we walk up from both ends to find a common ancestor; in this case that's the black node marked here. Then essentially the entire subtree of that ancestor is the template content; that's the range shown in brown there. That's what this algorithm is doing. One essential detail is that sometimes these template ranges overlap, or they nest. In the second example, where we have three different templates shown in blue and marked one, two and three, if you compute the ranges for two and three, you find that the range for three is nested completely inside the range for template two. So that's one of the details we have to handle: we have to merge overlapping and nested ranges. And for every range we set up the about ID, the typeof, and the data-mw.
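As a toy sketch of that walk, here is one way to find the nearest common ancestor of a transclusion's start and end markers; the node shape with a parent pointer is a simplification, and the merging of overlapping or nested ranges is left out.

    // Toy sketch: find the nearest common ancestor of a template's start and
    // end marker nodes. Node shape ({ name, parent }) is a simplification.
    function ancestors(node) {
      const chain = new Set();
      for (let n = node; n; n = n.parent) chain.add(n);
      return chain;
    }

    function commonAncestor(startMarker, endMarker) {
      const startChain = ancestors(startMarker);
      // First ancestor of the end marker that also contains the start marker.
      for (let n = endMarker; n; n = n.parent) {
        if (startChain.has(n)) return n;
      }
      return null;   // should not happen in a well-formed DOM
    }

    // Tiny demo: the markers sit in different rows of the same table,
    // so the whole table becomes the template's range.
    const body  = { name: 'body',  parent: null };
    const table = { name: 'table', parent: body };
    const row1  = { name: 'tr',    parent: table };
    const row2  = { name: 'tr',    parent: table };
    const start = { name: 'meta tpl-start', parent: row1 };
    const end   = { name: 'meta tpl-end',   parent: row2 };
    console.log(commonAncestor(start, end).name);   // -> 'table'
    // Overlapping or nested ranges from different transclusions would then
    // be merged into a single range (not shown here).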
Any other questions? Nope. OK. So that's the end of the details I'm going to talk about. To summarize, the process of generating HTML from wikitext is a fairly involved one. The reasons it is challenging, besides the ones Gabriel mentioned, are two things: the round-tripping, and the fact that we had to support editing on the HTML. A lot of the individual algorithms and solutions are mostly straightforward; the complexity comes from there being a lot of components, and a lot of details and edge cases to get right. OK, that's it from me for now. If you have any questions, Gabriel and I can take them.

I haven't received any questions over IRC. Sure, I'll give it a second for anybody else. Chris, you haven't asked anything, do you have anything? OK, great. So this is a bit of a detail, but I was curious about this: during the step where you're parsing the context-sensitive bits, each of the templates gets a unique ID, right, like TPL1, TPL2? And since that's asynchronous, how are you making sure they each get a unique ID? We have that information while we are parsing: for every template there's a token that the tokenizer generates, so we just count off the tokens and assign the IDs. Oh, it's counting off the tokens, OK. Is this related to the token buffering you mentioned, then? No, no, it happens early on, in the tokenizer. It happens before that, I see, OK. The expansions are kicked off in order, but the results come back asynchronously: you keep kicking off requests to the API and you don't wait for them to finish, but at the point where you kick one off you can just increment a counter, so you have an ID. OK, so the ID comes from the kick-off, not from the processing itself. Thank you. It's almost like a job number. Yeah. I was thinking that everything was being kicked off simultaneously as well, but no, you have to get to the token to actually see there's something to do. Right.

All right, well, it looks like that's everything. This was super useful to me; I feel a zillion percent more like I understand what's going on inside Parsoid, and I really respect what a tremendous effort it is. Thank you so much for doing it and for putting this together. Thank you.