Today, we're going to talk a little bit about Route Recognizer. Route Recognizer is this really, really low-level portion of Ember, and it's incredibly intimidating. It's the foundation of what it is we're building inside of Ember. But here's the thing. When you go start trying to swing at foundations and make changes and make sure that you understand what's going on, you actually are going to feel like you might do this. And then all of a sudden, you have to actually change the height of the columns on one side of the building to kind of counter for it. And then you notice that it continues to sink more as you keep building it. And then it's just a whole can of worms, and it's a really bad situation. And so swinging at foundations is a little intimidating. But we're going to actually walk through how to go through this process so that you have a good sense of what Route Recognizer is, what we've done inside of Ember, and how to plan for the future so that we don't end up with this, because we don't want to have ruins at the end of the day. And so it's not magic. It's actually relatively straightforward and just involves a little bit of computer science. So for those of you who are terribly scared of that, I dropped out of college. And if I can do this, you can do it. I assure you. So moving on, all we need is a little bit of data. That's basically the only thing that a URL is. It's a bit of data. We have all of this information that we need to put in one location so that when you press refresh, it can pull it right back out of that URL and do something with it. And so inside of Ember, we have some sort of state object. It looks something like this. It can be literally anything you want it to be. But we just need to have some sort of information that says, when you go to this page, draw something on the screen and pull this information out from somewhere. Well, in this case, we're going to pull the information out of some state object.
We could actually cheat here and make this really, really easy. What if we just took it and shoved that whole object into the URL? Done. All right, so have a good night. All right, so not actually that quickly. But we're pretty close. We've got a full serialization of our actual state that we wanted to keep track of. And it's in the URL. All we need to actually make that work is these two single lines. That's serialization and deserialization. That is the basic premise in everything that you need to understand about what we're doing inside of Route Recognizer. We need to take some state. We need to serialize it to a URL. Then we need to be able to, once we get that state in the URL, deserialize it back into an object in memory. Fortunately, JavaScript ships with a serializer and a deserializer for you. They're called JSON.parse and JSON.stringify, because somebody thought that was a great function name. I'm looking at you, Douglas Crockford. So quick, easy, fast, incredibly performant because it delegates down to native code. But you end up with that kind of mess. That's not fun. So let's talk about actually building a URL. So you don't need just data. Turns out we also need a user interface. We need the URL to not look terrible and not have Google yell at us for bad SEO practices and make sure that when you look at it, your eyes don't bleed. Also you're going to run into a character limit eventually. So we want a URL that looks a little bit like that. And if you squint, you can tell that that's actually the same amount of information that was in the previous URL. It's just a hell of a lot shorter. So how would we actually get to this URL from that state object? We have the state object here and it looks like this and we can look at it and we can go, hmm, I can kind of come up with our own little algorithm to do this. We've got this manual deserialization in our head that we immediately start to do when we see that URL.
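As a rough illustration of the "shove the whole object into the URL" idea, here is a minimal sketch. The `state` object, the `/#` prefix, and the use of `encodeURIComponent` are assumptions made for this example; the talk only names `JSON.stringify` and `JSON.parse`.

```javascript
// Hypothetical state object; the talk says it can be anything you want.
const state = {
  route: "users.edit",
  params: { userId: "42" },
};

// Serialize: state -> URL. encodeURIComponent (an assumption of this
// sketch) keeps the JSON text safe to place in a URL.
function serialize(value) {
  return "/#" + encodeURIComponent(JSON.stringify(value));
}

// Deserialize: URL -> state. Strips the "/#" prefix added above.
function deserialize(url) {
  return JSON.parse(decodeURIComponent(url.slice(2)));
}

const url = serialize(state);
console.log(url);              // long and unreadable, which is the problem
console.log(deserialize(url)); // the original state, recovered after a refresh
```

The round trip works, but the resulting URL is exactly the "mess" described above, which is why the rest of the talk moves to a custom format.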
And what we really want to do is take it and build an object in memory. So we say, hmm, everything that exists as a URL gets this name application item first. We'll just make that up. Seems like a fair rule. Then we'll take the first bit of the path and push that onto the array. And then we'll take the last bit of the path and, oh. Look at that. That last bit of the path has an additional piece of information that isn't in the URL. How do you know what to do with it? Turns out in order to actually do this, we need to find some way to save information outside of the deserialization algorithm. Since we're not storing it in the URL, we need to store it somewhere inside of the app's code. So not done yet. Turns out we need data, user interface, custom serialization, and custom deserialization. It wasn't good enough to just actually have this really simplistic algorithm. We needed to dive in a little bit deeper and jump into the next little thing. So let's talk a little bit about serialization. And this will be the last time we talk about serialization because it turns out serialization is pretty damn easy. There. That is effectively the entire serialization algorithm inside of Route Recognizer. Once you actually have a backing data structure, it's really easy to go from this in-memory object to something that is just a string. Because you already understand the object while it's sitting in memory. What you really want is to go through and say, hey, I have this lookup of all of these actual names here. I'll grab it. And while I go through the parent, I just take it and shove it into a single path and go through it. So I'm starting at the leaf-most node of the URL in-memory object because we're storing it as a tree. Because if you look at it, it's like a path structure, seems reasonable. So we'll start at the leaf and go, OK, here's this one. What's its parent? OK, what's its parent? What's its parent? What's its parent? Oh, we're done.
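The leaf-to-root walk described above can be sketched roughly like this. The node shape (`parent`, `output`) and the example routes are hypothetical, chosen only to illustrate the idea; this is not Route Recognizer's actual code.

```javascript
// Hypothetical tree nodes: each node knows its parent and how to
// render its own piece of the path from the supplied params.
const root = { parent: null, output: () => "" };
const users = { parent: root, output: () => "users" };
const user = { parent: users, output: (params) => params.userId };
const edit = { parent: user, output: () => "edit" };

// Start at the leaf and keep asking "what's its parent?" until we
// run out of parents, collecting each node's output along the way.
function serialize(leaf, params) {
  const parts = [];
  for (let node = leaf; node !== null; node = node.parent) {
    parts.unshift(node.output(params)); // parents go in front
  }
  return parts.join("/") || "/";
}

console.log(serialize(edit, { userId: "42" }));
```

Because the in-memory tree already knows which node is static and which is dynamic, serialization never has to guess; it only concatenates.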
And so what we do is we just grab the output at each one of these tree nodes. And so it turns out doing serialization is very easy. So we're going to skip past that because this is the much more fun and exciting and complicated portion of the talk. So if you have a drink, now take a few sips. It'll make this go down a lot easier, I promise. So the first thing that you think about when you're doing serialization and deserialization in particular is that looks a hell of a lot like a regular expression. All of these routes, we could build regular expressions for every single one of them. You've got this giant router map inside of your router.js. And you can take it and say, hmm, here's a regular expression that would match this one. And because I know which capturing segment is which, I can actually say, OK, for the user route, I'm going to have this regular expression. And the zero matching segment is going to match to user ID. The one matching segment is going to be matching the account ID, whatever it may match to inside of that regular expression. And you can just use that as a very fancy destructuring for like ES6 destructuring. But the problem with that is you've now got 500 routes in your application. And so you have this long list of regular expressions. And it turns out to be able to tell whether or not you need to transition into a particular route, you can't just test until you find one. And I'll explain why later. But it turns out you actually have to run through and do 500 tests of regular expressions. And also regular expressions inside of JavaScript, when they are parsed, they get semi-compiled, which means that the first boot time is a little bit slower. So now you've got 500 regular expressions that are kind of verbose. And then you have to compile them. And then you have to run through and process all of them immediately on application boot. And you end up having a less than happy experience with regular expressions. 
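To make the naive approach concrete, here is a small sketch of a route table built from regular expressions, where the capture groups act as a kind of destructuring. The route names, paths, and parameter names are made up for illustration.

```javascript
// Hypothetical route table: one regular expression per route, plus the
// names of its capturing groups in order.
const routes = [
  { name: "users", regex: /^\/users$/, params: [] },
  { name: "user", regex: /^\/users\/([^\/]+)$/, params: ["userId"] },
  {
    name: "account",
    regex: /^\/users\/([^\/]+)\/accounts\/([^\/]+)$/,
    params: ["userId", "accountId"],
  },
];

// Test the URL against every route, rather than stopping at the first
// hit, and map each capturing group onto its parameter name.
function recognizeAll(url) {
  const matches = [];
  for (const route of routes) {
    const m = route.regex.exec(url);
    if (!m) continue;
    const params = {};
    route.params.forEach((name, i) => {
      params[name] = m[i + 1]; // m[0] is the whole match
    });
    matches.push({ name: route.name, params });
  }
  return matches;
}

console.log(recognizeAll("/users/7/accounts/9"));
```

With three routes this is fine. With 500, every lookup pays for 500 tests, and every expression has to be created up front, which is the cost described above.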
But the process for building them is still valuable. And we'll come back to why. So let's talk about what this regular expression might look like. We've got this value for a static segment. So if you're defining your routes and you just have a string in there, slash account, we can just throw that in as the regular expression, with account actually appearing inside of that regular expression. Turns out dynamic segments, what it means to be a dynamic segment is any string of text that does not include a slash. That's what a dynamic segment means inside of your router. So anything that does not include a slash, which turns out to be this matching segment right here. And it must be at least one character long. For globbing segments, turns out what a globbing segment is is nothing more than a regular expression that matches any character and that repeats and is greedy. So it's a greedy regular expression for a glob segment. So these are all of the tools that you can use to do a kind of simplistic version of a route recognizer. You can do a quick iteration over a whole bunch of regular expressions. But it turns out that's really inefficient in space, in time, and it's just not what we want to do. So I misspelled tree there. Actually, I didn't. So sometimes I'm not entirely sure that this comes from this. But my mental model is that this is a retrieval tree, and retrieval is spelled with a T-R-I-E in the center of it. So a radix tree or a prefix tree is your way of indexing into an object. Think of a phone book. You don't go from the very, very beginning page of a phone book and scan forward until you can find Hackman. You don't scan all the way to H, like one at a time. In fact, you might do something that looks a little bit like a binary search. You'd start, you open it up, and that says, mm, m. Let's go back to the left. But you can actually do better. You don't need to do a binary search because you actually know that his last name starts with an H.
And so you can actually jump directly to an H. Then you know that the second letter is an A, and you can directly jump to it. Turns out that's what a radix or a prefix tree is really good at. You want to immediately do a lookup on a particular item, and you happen to know how to spell. And so instead of mapping it to the object inside of the tree where you have to do this binary search to get through it, you can actually map nodes to the edges. And so the edge between the root and the first one is an H. And so you can just traverse that edge, and then you end up at the H node. Then you traverse another node, or another edge, named A. And that becomes the next node. And you go H, A, C, K, M, A. And you get all the way through the name. And at the very end, the node that you match is actually the person that you were looking for. And that is what your brain is doing when you're flipping through a phone book, but you just didn't realize it. You are traversing a radix tree. So let's look at our route structure one more time. In this case, we have a route that is users with user ID, new, and edit. And we have a posts route, which has a post ID, new, and edit. Then we've got a 404 route down at the very bottom. Now, if we were to do this character by character, we'd probably end up doing something really dumb. But let's, surprise, there's lots of code on the next slide. So we're looking at this, and we say, OK, what if for doing a radix or prefix tree, instead of doing it character by character, we did it segment by segment? Because we know in a URL and in an actual file system path that you're going to have a consistent prefix for whatever that current folder may be. And so that mental model gives us an opportunity to actually make an optimization to a radix tree, so that it works not on a character-by-character level but on a segment-by-segment level. So we could actually do something like this. So let's do a quick look at it.
We've got a URL, which was slash users slash new, but you can't really tell that, except you say URL.split. And so we say, OK, we've got slash users slash new. And what we want to do is go directly to that particular node. So what we can do is we can iterate through the segments and grab the node, and then we can grab the actual property, which happens to be at that segment name. And so the magic line is line 11. So we go node.children at segments at i. And so that segment started out with users, and then it moved to new. And then we grabbed the child for each one of them that happened to match that particular segment, which means that we end up at the very end with this node ID of three pretty magically, because we didn't actually have to iterate over every single node. We actually only hit two nodes, and we needed to hit both of them to actually reach the final state. So it turns out our traversal algorithm is O(n), where n is the number of segments. That's pretty damn quick. In fact, there's very little you can do to get better than that. And I say that as a person who read a whole bunch of papers written in the 50s and 60s to talk about this. Turns out computer science hasn't changed much in the past 50 years. Then again, I've got this giant design of a radix tree, but it snuck in a little bit of constraints, and they happen so subtly you may not have even noticed. How do you store routes that have multiple segments? So you said this.route foo, and then you set path to be, hey, comma, Nathan, slash, I, comma, slash, whatever it may be. And you've got this giant long string of a path. And you say, hm, how do I handle these multiple segments in my radix prefix tree? Well, turns out that's a little bit more difficult. Also, we have to match routes at slash boundaries only now.
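The segment-by-segment traversal described above can be sketched along these lines. This is not the code from the slide; the node shape, the ID numbering, and the helper names `addRoute` and `lookup` are assumptions for illustration, and only static segments are handled.

```javascript
// Each node keeps its children keyed by segment name. Object.create(null)
// avoids accidental hits on inherited keys such as "constructor".
function makeNode(id) {
  return { id, children: Object.create(null) };
}

let nextId = 0;
const root = makeNode(nextId++);

// Insert a static path one segment at a time, reusing shared prefixes.
function addRoute(path) {
  let node = root;
  for (const segment of path.split("/").filter(Boolean)) {
    if (!node.children[segment]) {
      node.children[segment] = makeNode(nextId++);
    }
    node = node.children[segment];
  }
  return node;
}

// Lookup touches one node per segment: O(n) in the number of segments,
// no matter how many routes the tree holds.
function lookup(url) {
  const segments = url.split("/").filter(Boolean);
  let node = root;
  for (let i = 0; i < segments.length && node; i++) {
    node = node.children[segments[i]];
  }
  return node || null;
}

const usersNew = addRoute("/users/new");
addRoute("/users/edit");
addRoute("/posts/new");
addRoute("/posts/edit");

console.log(lookup("/users/new") === usersNew);
console.log(lookup("/users/missing"));
```

Note what this sketch cannot do: dynamic segments, index routes with no segment, and globs all need more than an exact key lookup, which is where the talk goes next.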
Turns out there's nothing stopping you inside of the existing route recognizer other than a couple of API conventions that would prevent you from having a route match on an arbitrary character. You could say slash me and slash mess and slash messaging are three separate routes with no additional slash in between them if you dive in underneath the hood inside of route recognizer to actually make that change. But it's a reasonable constraint. We don't really care to match on any of those non-slash boundaries. That breaks the mental model of what the web actually does. And it's also unclear how to support the behavior of a route that doesn't include a segment. How many of y'all have an index route in your application? That should be everybody. Turns out that route does not actually have a segment. It just has an actual magic something. How the hell do we even match that? It's like an empty string, except there could be even multiple empty strings because you can do a path slash inside of a path slash inside of a path slash. And how do you do that? How would you label an edge as undefined? Or do you label it as an empty string? Or is there actually such a thing as a segment that is an empty string? It's kind of unclear what we should do here. And these glob segments, they're actually the real jerk here. All of these first three turn out to be solvable using a radix approach. I know because I built it. I built this three times. So the glob segments are a real jerk because they don't even stop at slashes. So you now have gone through this. Thank you, Casey. Somebody has to laugh at my bad jokes. So you go through all of this process and you split on the slash and you go, OK, great. I can stop consuming, but glob segments are greedy, which means that you can't just stop there. You have to keep going. You must go farther and deeper into computer science.
So this is a non-deterministic finite automaton, which is a state machine or a, I'll probably keep saying NFA over and over and over and over. And I'm not talking about the agency that reads your email. So I told you bad jokes. So we're going to talk about NFAs. And a non-deterministic finite automaton is actually the underlying data structure slash algorithm that the existing route recognizer is built on top of. It supports matching at arbitrary characters. And in fact, if you use a reasonable regular expression engine, such as the one inside of grep, it uses a non-deterministic finite automaton to implement regular expressions. So actually, if you look at the first naive approach that we were thinking about doing, it turns out to be incredibly informative as to what the final result should be. Because we're like, OK, we're going to use this tool, which is built on this underlying data structure and algorithm called NFA. So this NFA allows us to actually deal with all of the problems. What is an NFA? So we have a route called slash users slash new. Let's say we exploded that route into a linked list. And we have each one of these characters split into its own separate state. It's a state for the slash, a state for the u. And we are in this weird situation where, OK, we've got all of these things. And then if we follow it all the way to the end, we know that we've reached slash users slash new. And so we've got a kind of linked list model. Let's add another route to it. Or let's add even more routes, not just one. So we've got a new route called slash us. We've got another route called slash users slash edit. We've got another route called slash posts slash new. And another route called slash posts slash edit. And if you look at it pretty closely, we can say, hm, we've got all of these routes. And we've created a whole bunch of nodes to actually represent them.
Unfortunately, each one of these nodes is expensive and requires a lot of work inside of JavaScript in order to create. And so creating a whole bunch of objects is not free. And if you have a 500 route application with significant nesting and long URLs, and also, by the way, your performance degrades the longer your URLs are. So I vote that everybody use single character URLs from this point forward. So that's kind of a mess. So it does, however, give us the ability to do things like this. We've drawn all of these, all of the previous ones. We've got an arrow going only one direction. So we've got this consistent flow from left to right. And that's a state machine that happens to be very simple to model. That's known as a directed acyclic graph, a DAG. But we want to actually support dynamic segments and glob segments. So what if instead of having individual characters stored inside of the state machine, we said, hey, we're gonna match this thing. And you can either go back to where you were previously or you can proceed forward. And when you go back, as long as you actually are matching this thing, and you can look at it and see that's a regular expression, it's defining a character class, as long as we can actually go back and go through it multiple times, we can say, hmm, we now know that this route that is defined on line nine is actually a glob segment. Glob segment, because it takes any character until it can't anymore. If you think about it, that's what it's doing. We go from slash, we grab a character, and then we just keep going over and over and over until there's nothing left in that URL. And if we look over there for the users, the user ID is actually just a not slash. So we can keep going until we actually get back and we say, hmm, there's not a slash here. Oh, we hit a slash, that means everything that we just saw is the user ID. And so those are dynamic and glob segments implemented inside of an NFA. 
And this is actually exactly how the existing route recognizer is set up and functions underneath the hood. Now we could take that and we could compress it because like I was saying, a lot of objects is really expensive. So we can compress it into this kind of model and I've added parentheses around things. Those happen to correlate with places where you could have stopped. So we had a route called us, we had a route called users, we had a route called posts. And so these are known as accepting states. These are places where if you were traversing an NFA, you could say, okay, I'm done, I matched something, I'm done, I win, what do I win? Well, turns out you win a collision. If I stopped on that S that is at the end of users and posts, how do I know which one it is? I'd either have to have saved off my entire path of traversal, which is expensive and slow, because as I'm doing my traversal, I keep saving things off or I could actually not compress that very well. And it turns out that we have a more significant problem with compressing an NFA because we can't say that this is an accepting state without creating an actual separate route node for every single route. So it seems like, yeah, we could say, hey, slash posts and move that S to the same one that is for users, but we can't. And so we don't end up being able to compress this very well. Now, when you're traversing an NFA, the trick is we want to be able to positively identify every single node that we're going to visit. Turns out we can't just go from top to bottom. Traversing an NFA uses something known as a transition function. So it starts with an original state, we're gonna use the slash over here on line one. And that state actually is just the root state. Then what we're gonna do is we're going to follow every single edge and see if our input, which is the URL, matches that particular state. And if it does, we keep that state and add it to the next set. If it doesn't, we drop that state.
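A minimal sketch of that idea follows: one state per character, self-looping states for dynamic and glob segments, one accepting state per route, and a transition function that carries a set of live states forward one input character at a time. The state shape, route names, and the `:name` / `*name` path syntax handling are assumptions of this example, and nothing is compressed or shared between routes; it is not Route Recognizer's actual implementation.

```javascript
// A state tests one input character and lists the states reachable next.
function makeState(test) {
  return { test, next: [], accepting: null };
}

const root = makeState(() => false);

// Build an uncompressed chain of states for each route.
function addRoute(path, name) {
  let state = root;
  const append = (test, loops) => {
    const s = makeState(test);
    if (loops) s.next.push(s); // edge back to itself: "keep consuming"
    state.next.push(s);
    state = s;
  };
  for (const segment of path.split("/").filter(Boolean)) {
    append((ch) => ch === "/", false);
    if (segment[0] === ":") {
      append((ch) => ch !== "/", true); // dynamic: anything but a slash
    } else if (segment[0] === "*") {
      append(() => true, true); // glob: anything, slashes included
    } else {
      for (const c of segment) append((ch) => ch === c, false);
    }
  }
  state.accepting = name; // each route ends on its own accepting state
}

// Transition function: keep the states whose test passes, drop the rest.
function recognize(url) {
  let current = [root];
  for (const ch of url) {
    const next = new Set();
    for (const state of current) {
      for (const candidate of state.next) {
        if (candidate.test(ch)) next.add(candidate);
      }
    }
    current = [...next];
    if (current.length === 0) return [];
  }
  return current.filter((s) => s.accepting).map((s) => s.accepting);
}

addRoute("/users", "users");
addRoute("/users/new", "users.new");
addRoute("/users/:user_id", "users.show");
addRoute("/*path", "not-found");

console.log(recognize("/users/new"));
```

Running it on `/users/new` leaves more than one accepting state alive at the end of the input, which is the ambiguity the next part of the talk sets out to resolve.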
And so we over and over say, here's the next set of possible things you could be. Hey, oh, okay, we have a slash users, okay, great. Slash U, yes, great. S, yes, great. E, yes, great. R, yes, we've got that too. S, yes, we're there, we've hit an accepting state, we've hit the end of the input and we trigger a match. And we know for sure that it was the slash users route because there is one node for that route. It's a little more complicated if you have multiple things because you could theoretically end up matching two things at the same time. So at this point, you're starting to put on your glasses every time I get to one of these title slides because now we need rules for resolving ambiguous URLs because we could have a URL that was slash users and another one that was just a glob node down at the very bottom, which would accept all of your 404 stuff. That's not very friendly. So we need to deal with this problem. And so if you've ever written your routes to look like this, this is actually very similar to the router map inside of router.js, but this is the DSL inside of route recognizer. The reason it looks so familiar to you is Ember effectively completely delegates to this nomenclature. So we have this route, which is messaging. We also have one called me and we have one called mess and we have one called messaging up at the top on line four and then we have this other thing where it has mess and then a param. That seems like a pretty significant troll and it's almost like Nathan was trying to be a jerk and come up with the most impossible thing to actually have as a route. Turns out, yeah, Nathan guy's a really big jerk. So on line 10, anybody care to guess which one we're gonna deserialize into? Yeah, I don't know either. It turns out for this case, it is actually going to be four, but that's a really obnoxious thing. How would anyone possibly know when every single one of them seems like it could be a reasonable option? 
So we need to actually come up with rules. We need to come up with constraints and these constraints are the way that we understand what you're going to try and map. So in Ruby, if we were looking at this and we said messaging, it would turn out to match four again but if I moved line seven to line two, it would match the full messaging item. Why? Because Rails actually does a top to bottom list and just matches them in order, which is fine, but it is really kind of confusing sometimes because if I moved line five up to line four, switch to line four and five, it would match line five instead. So that's really confusing. It's a refactoring hazard. There are all sorts of problems with having just this random in order traversal. We really want to have some sort of specificity. So let's come up with some rules. Somebody wrote this thing called CSS and it has specificity and we should probably kind of steal ideas from that. So what we're going to do is we're going to say, here's a route segment. This slide is totally wrong and this is what I get for doing my presentation just minutes before. So we've got a few constraints that we are going to add into the system. The first constraint is that we must have all of our routes begin and end with the boundary character or the end of the actual route. So in this case, we don't end up with this slash messaging problem because we're guaranteed to have a boundary of a slash. We match based upon specificity as opposed to definition order and we'll talk a little bit more about specificity in a second and turns out the deserializer must affirmatively eliminate every single node in the NFA and we have to say, okay, we know for sure that this route is not going to match. So turns out we've got more that we still have to cover. Putting on glasses in the back of the room over there. We now need to support arbitrarily complex path segments because what happens if we have a route that looks like this? 
We've now got a collision but they happen to have the same sort of rules. We have a static, a dynamic, and a glob, and then we've got another static, dynamic, and a glob, and both of them are going to match and damn it, I don't even know what to do. We have more constraints and so in this case, non-epsilon segment counts trump all other things. I didn't actually mention epsilon segments. Turns out an epsilon segment is an NFA edge where you don't have to consume a character. So you could just stick around and hang out in that particular state forever. That's how we deal with your index routes. That's a trick that NFAs have that's really cool. But if multiple routes match, we're gonna take the one who has the most segments first, then the next tiebreaker is segment weighting and we'll assign, let's say three points to every single static segment, two points to every single dynamic segment and one point to every single glob segment and zero points for epsilon segments. So we just have all of these scores but now we actually have to care about things like order. So, who remembers that you can actually set 256 classes on an object and have it override an ID? So if you ever get into a really bad situation in your CSS and you need to override an ID, add 256 classes and it will work. And that's because of an integer overflow. And so what happens inside of a browser is they reserve eight bits for each specificity region and so they have the elements, then the classes and then the IDs. Well, we're gonna use that same general strategy except we're going to not have overflow problems. And so we build everything up as a string from left to right. And so we say, okay, the base one in this is a static segment, so that's a three and then we've got a two for account ID and then one for config string, so that's a 321. And if we look at line seven, that's also a 321. So it still matches. We need to go one step further.
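The first two tiebreakers described above (non-epsilon segment count, then a left-to-right weight string) can be sketched as follows. The weights 3, 2, 1 come from the talk; the function names, the `:name` / `*name` path syntax handling, and the example paths are assumptions for illustration.

```javascript
// One digit per segment, so the score is a string such as "321" and
// cannot overflow into a neighbouring region the way packed bits can.
const WEIGHTS = { static: "3", dynamic: "2", glob: "1" };

function classify(segment) {
  if (segment.startsWith(":")) return "dynamic";
  if (segment.startsWith("*")) return "glob";
  return "static";
}

function specificity(path) {
  // Empty (epsilon) segments are dropped and so contribute nothing.
  const segments = path.split("/").filter(Boolean);
  return {
    count: segments.length,
    weights: segments.map((s) => WEIGHTS[classify(s)]).join(""),
  };
}

// Comparator: negative when `a` is the more specific route.
function compare(a, b) {
  const sa = specificity(a);
  const sb = specificity(b);
  if (sa.count !== sb.count) return sb.count - sa.count; // more segments wins
  if (sa.weights !== sb.weights) return sa.weights > sb.weights ? -1 : 1;
  return 0; // still tied: further tiebreakers are needed
}

console.log(specificity("/company/:account_id/*config_string"));
console.log(compare("/company/:account_id/*config_string", "/details/:id/*rest"));
```

Because both weight strings have the same length once the segment counts are equal, a plain string comparison reads them left to right exactly as described. The two example paths both score "321", so this comparator reports a tie.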
So the number of handlers is actually the next tiebreaker. So if we look at this, we have one handler called details on line seven, but if we look at line four, we're now nested three layers deep. And so we actually have three handlers involved on line four. We have company, company account and company account config string. So all three of those are gonna be invoked and so we're gonna use that as the third tiebreaker. And if all else fails, we're going to take the first definition that you used. So we eventually fall back to Ruby, but we, or excuse me, Rails, but we add a few features along the way. Ah, shit. All right, so still more rules. We're adding the constraint that it must be fast. Turns out if we're building this router map client side, it's slow. You have to generate a parser. At runtime, in the user's browser, how many people love their little phone and how fast it is compared to their laptop? It's not quite the same speed. And in fact, route generation on less powerful devices can take up to an entire second just to build the route map. And that happens on application boot because, yeah. Finishing that sentence is less polite. So it must be fast. So what can we do to stop generating that? Well, we could actually serialize it. So we can take this gigantic structure that we've built up. Underneath the hood, we have a backing data object that we can use to parse all of this. And we can say, okay, here's this application. And it has a child node of whatever particular thing it may be. And then this has a parent of zero. And then we have another one that says parent of six. And so you can look at it and you can actually reconstruct a tree from this serialized information. And it turns out that's a hell of a lot cheaper and a hell of a lot faster. And so underneath the hood, inside of my most recent project, which is rewriting Route Recognizer, we now use a combination of a radix tree and an NFA.
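As a rough picture of what "a parent of zero" could look like, here is a sketch of a flat, JSON-friendly table in which each record points at its parent by index, together with a function that rebuilds the tree from it. The record shape and the `hydrate` name are invented for this example; the talk does not show the actual serialized format.

```javascript
// Hypothetical build-time output: a flat array where `parent` is the
// index of another record. This is plain data, so it can ship as JSON.
const serialized = [
  { segment: "", parent: null }, // 0: application root
  { segment: "users", parent: 0 }, // 1
  { segment: "new", parent: 1 }, // 2
  { segment: "posts", parent: 0 }, // 3
  { segment: "edit", parent: 3 }, // 4
];

// Rebuild the tree: create every node, then wire up parents and children.
function hydrate(records) {
  const nodes = records.map((record, id) => ({
    id,
    segment: record.segment,
    parent: null,
    children: Object.create(null),
  }));
  records.forEach((record, id) => {
    if (record.parent === null) return;
    nodes[id].parent = nodes[record.parent];
    nodes[record.parent].children[record.segment] = nodes[id];
  });
  return nodes[0];
}

// Simulate shipping the table across the wire and reading it back.
const root = hydrate(JSON.parse(JSON.stringify(serialized)));
console.log(root.children.users.children.new.id);
```

Rebuilding from a table like this is a couple of linear passes over an array, which is much less work than parsing every route definition and generating states at boot.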
So we use a backing structure that is a radix tree, but we traverse it as if it were an NFA. We use the transition function to identify whether or not there's a particular state. And the real magic here is that glob segments are actually just a circular reference back to themselves that happen to consume slashes. So this sounds really complicated and not really annoying to use. And I would really, really, really hate to disappoint Tom Dale. In fact, I'm pretty sure that that would be terrible. So if you want to try out the new, what I hope will be the new Route Recognizer inside of Ember, I'm so totally getting myself in trouble right now. All tests pass. It works inside of LinkedIn's application as of today. So if it works for us, it works for you, hopefully fingers crossed. And we have a new version of Route Recognizer. It automatically does serialization at build time for you. It replaces your router.map definitions inside of your router.js and then ships all of that across the wire. Magic. So it also is this neat little concept of here's an add-on that is providing some core Ember functionality. That's a really interesting and clever idea and we should probably consider doing that, but that's a talk for a future time. So my name is Nathan Hammond. I work at LinkedIn. We do crazy, ridiculous things like rewrite Route Recognizer for the hell of it and for a 500 millisecond performance win on first boot. And so I'm very grateful for them giving me the time to work on this and thank you all so much.