Hopefully the mic's not gonna be picking up too much wind, but it's not that windy, so it should be pretty good. I'd left off saying I was gonna talk about the LINQ and sequences side of what's going on in Stringier, especially since there's quite a lot I noticed here. When I was doing things originally, I didn't even fully recognize that there were actually similarities to LINQ. In some cases this strictly follows the LINQ pattern, in other cases not exactly, but it's still super close, and regardless, it should be following LINQ conventions.
LINQ is not exclusively a .NET thing anymore, but it definitely originated in .NET, that's where it's most widely known, and that's where one of the more important parts of it shows up. LINQ can be thought of as two things. The first is a set of extension methods which are used to implement its functionality. You can call those extension methods through a fluent-like syntax, and it's still highly readable and it does what you want it to. This essentially winds up working like function composition, you know, chaining the functions through each other. Just the syntax is a little bit different: in object-oriented languages you tend to go with a more fluent design rather than a pipeline operator for functions. But either way the effect is the same. You're taking the output from one into the other and you build up these queries through that. Sometimes they're not specifically queries, sometimes it's just general transformations or whatever, but it's almost database-like. That's why they tend to be called queries, and that's why it focuses on that.
The other side of LINQ, which I can't really implement, and I'll discuss why, is a pairing between, what is it, IQueryable and IQueryProvider or something? I haven't ever actually implemented LINQ.
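A minimal sketch of the extension-method side described above, assuming nothing beyond the standard System.Linq operators applied to a string. The class and method names (FluentDemo, UpperVowels) are made up for illustration and are not part of Stringier.

```csharp
using System.Linq;

public static class FluentDemo
{
    // Each call takes the previous call's output as its input,
    // so the chain reads like a left-to-right function composition.
    public static string UpperVowels(string text) =>
        new string(text
            .Where(c => "aeiou".IndexOf(c) >= 0)   // keep only lowercase vowels
            .Select(c => char.ToUpperInvariant(c)) // transform each one that survived
            .ToArray());                           // materialize the result
}
```

Calling `FluentDemo.UpperVowels("stringier")` runs the filter, then the transformation, then builds a string from what is left.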
I've tried to wrap my head around it a few times, but it's involved. What those allow you to do is actually use the formal LINQ syntax, which is part of the language. Obviously, I would not be able to do that with string or really any of the text types, because they're not my types. I don't control them. I'd have to convince Microsoft to add interfaces to their stuff, and then I'd have to put tons of my stuff in their runtime. Which would be fine, we've got compatible licenses, but there's no way in hell I convince Microsoft to do that. I shouldn't say that; it depends on who exactly addresses the issue at the particular time. But it's still probably not gonna happen.
I had to backtrack a little on that because there are some guys on the .NET team who are adamant that string is a primitive and shouldn't be implementing any interfaces and shouldn't be implementing any advanced functions, because it should be viewed strictly as a primitive. And I get it. It's technically a UTF-16 code sequence; it should not be viewed as a textual type. But it is. You guys never implemented a non-primitive type to be using instead, and tons of things got implemented for string that make it really not a primitive anymore.
You could potentially convince them to add IQueryable to that, but you don't really have a choice when it comes to ReadOnlySpan<char>: you can't implement the interface. And I don't believe the LINQ expression behavior is pattern matched. I'm pretty sure that has to go through the interface, but I'm not certain. Don't hold me to that.
So what I'm doing, obviously enough, is just the extension method side of that, the fluent syntax. But that's fine, that's what I'm doing anyways. I'm building up a large collection of extension methods for text processing. So, cool.
Now this is going to sound super weird, but because I didn't realize that some of these were very LINQ-like in the first place,
I missed out on a useful optimization, in that the method names were wrong. I haven't lost my mind. I'm not drunk right now; I'm drinking coffee, not alcohol. Yes, proper naming of methods can lead to optimizations. Not in the normal case at all.
In order for this to make any sense you have to have an idea of how method resolution in C#, Visual Basic, and F# works. They preferentially select instance methods over extension methods, but then within each of those they preferentially select the most specialized method. So if you have a method that is declared for your exact type, rather than something lower down in your inheritance chain, it's going to preferentially select that. If you have a method that works on specific types rather than on generics, it's going to preferentially select that. And within generics, it is going to preferentially select the most specialized generic. Make sense? So we have this whole order of specialization that it relies on.
This is where we get the optimizations. For method resolution to pick between them, the methods have to have the same name. It can't figure that out based on identical signatures, because even if the signatures are identical, if the name's different it's probably because it's doing a different thing. So it has to have the same name.
Now if you were using the names that mine already had, then you're already getting the optimal performance. Should make sense. But people who are familiar with LINQ, and if you're using .NET you probably are familiar with using LINQ, are gonna want to use the names that are already there, and that means they'd be missing out.
See, using the same naming conventions as LINQ has isn't just an optimization thing. It's also a discoverability and learnability thing. Following that exact same convention, since my stuff is essentially doing the same thing, makes it so that there's far less to learn, which is
fantastic. That's the same reason why Glyph is designed in the same way as Rune and char: because there's less to wrap your head around. It is essentially the same thing all over again. That's fantastic.
So was I using the same names in all these situations? No. There were some where I definitely was. Contains is certainly an example of that. But then I went and did ContainsAny, and that really should have been just Any.
See, this is where we get into some minor differences, but essentially it's still so similar that the intention would be obvious. Any in LINQ accepts a lambda. A delegate, is it? I guess it could accept any delegate, but it's meant for accepting a lambda. You provide it a function, that function is run against the elements in the sequence, and as long as at least one of them holds up, then the Any is satisfied.
Now what was ContainsAny being used for? Well, essentially the same thing, but removing the lambda, which actually is one hell of an optimization. Instead of a lambda that needs to be satisfied, it was a specific thing to look for, just like with Contains. But the behavior was a little bit different. See, with ContainsAny you gave it a set of things and it just had to contain any of them. So as long as any of the characters within the sequence match any in the set, then it holds true.
You can extend this to All as well. All, as in: as long as all of the characters within the sequence match at least one of the characters within the set, you've got All. And you can extend this to Unicode categories as well: Any means any satisfy that category, All means all satisfy that category. Should make sense.
The semantics of this method really aren't all that different. Would it be usable exactly in a LINQ situation?
Well, sort of. If you're doing LINQ stuff otherwise, and at that point in the call chain you still have an IEnumerable of char, then you'd still be able to see that extension. (An IEnumerable of Rune would be allowable in that situation too. I don't know why you would have one of those, you can normally just work with the characters directly, but what I'm trying to do is an optimization, not force a specific encoding.) You'd hit dot and start typing, and that extension will come up with the others.
If you happen to already be using the lambda, would you get any optimization out of this? No. The signature is different. The method resolution would not know to look for the other thing. Potentially you could write an analyzer that could identify those types of situations and recommend that you use one of the overloads that doesn't utilize the lambda. That would obviously yield the optimization, but writing analyzers is tricky. I don't entirely think Roslyn has the best API. I think I'll get into that more when I have a counterexample I can show, a better designed API for doing what they're doing. But they're not terribly hard to write, so you could provide that analyzer.
Now, something else that I noticed while I was in there, and this is definitely a thing that happens: sometimes you've got a sequence of characters and you want to see if a specific rune is within it. The go-to rune I usually use is the G clef, because you can't represent that with a single char. Say you want to see if there's a G clef anywhere within that sequence. If you were to do that with a Contains(char), it's not going to work.
You can't put that in a char in the first place. But also, even if you were to search for its surrogates, that doesn't mean those surrogates were paired correctly. So do you, just for those specific instances, write your own search that goes through and sees if the high surrogate is immediately followed by the low surrogate? Doing that on sequences, especially IEnumerable, which you can't randomly index into, is, I mean, it's not super difficult, but it's a little less obvious than if you had the ability to index anywhere into it.
Okay, I could provide the overloads which do that, but there's a better way. See, this kind of thing would need to be done repeatedly. The enumerators for runes work by reading the sequence of characters. Now, in the two specific cases where the rune enumerators were already implemented, those sequences were a string and a ReadOnlySpan<char>. But at no point does it require random indexing. The only thing you need to enumerate the runes in a sequence is for it to be a sequence of characters.
Now, this would not open up the possibility of using IEnumerable<char> as a text type. That is stupid. The performance implications of that are horrible. You should never do that, and I'm going to enforce that by not allowing it as an API choice. But for the LINQ side of these it makes perfect sense. I can provide that enumerator, and that enumerator would be available for consumption by other people: I can provide the EnumerateRunes extension method for the sequence. But I would also be able to utilize that same enumerator in the implementation of these various LINQ-like methods. So you could do Contains(Rune), and it just utilizes the enumerator and sees if it finds the rune, and boom, you've got it. That simplifies my own code.
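A rough sketch of that idea, assuming System.Text.Rune from .NET Core 3.0 or later. The names EnumerateRunes and ContainsRune on IEnumerable<char> are hypothetical here, and replacing unpaired surrogates with Rune.ReplacementChar is an assumption of this sketch, not necessarily what Stringier does.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CharSequenceExtensions
{
    // Hypothetical extension: decodes UTF-16 code units into runes using only
    // forward enumeration, so no random indexing into the sequence is needed.
    public static IEnumerable<Rune> EnumerateRunes(this IEnumerable<char> source)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        return DecodeRunes(source);
    }

    private static IEnumerable<Rune> DecodeRunes(IEnumerable<char> source)
    {
        char high = '\0';     // the high surrogate waiting for its partner
        bool pending = false; // true while a high surrogate is waiting
        foreach (char c in source)
        {
            if (pending)
            {
                pending = false;
                if (char.IsLowSurrogate(c))
                {
                    yield return new Rune(high, c); // correctly paired surrogates
                    continue;
                }
                yield return Rune.ReplacementChar;  // high surrogate with no partner
            }
            if (char.IsHighSurrogate(c)) { high = c; pending = true; }
            else if (char.IsLowSurrogate(c)) yield return Rune.ReplacementChar; // stray low surrogate
            else yield return new Rune(c);          // ordinary single-unit character
        }
        if (pending) yield return Rune.ReplacementChar; // sequence ended mid-pair
    }

    // Hypothetical LINQ-like method that simply reuses the enumerator above.
    public static bool ContainsRune(this IEnumerable<char> source, Rune value)
    {
        foreach (Rune rune in source.EnumerateRunes())
        {
            if (rune == value) return true;
        }
        return false;
    }
}
```

With this in place, a search such as `"a\U0001D11Eb".ContainsRune(new Rune(0x1D11E))` looks for the G clef as a whole rune, so surrogates that appear in the wrong order do not count as a match.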
Now if I only had one situation where I ever needed to do this I wouldn't bother with it, but there are multiple situations where this is a valid thing to do, and it just makes sense to actually have an enumerator provided that I can reuse and other people would have access to.
Another example of these LINQ-like methods turned out to actually be Join. Now, Join is a particularly interesting one, because it's a showcase of why LINQ has some of the deservedly bad reputation that it has. See, LINQ is slow as all fuck. Sort of. The LINQ base is slow as all fuck. And the reason for that is that the overwhelming majority of performance optimizations actually rely on assumptions, and on making sure those assumptions are valid. Invalid assumptions lead to bugs. There's really no other way to put it. If you've optimized something but it doesn't work right, that's not a valid optimization. You fucked up. You introduced bugs into your code base.
Okay, so they're based on assumptions. LINQ in general is general. It can't make any assumptions. That's why it works on IEnumerables, where you can't even assume you can immediately know the count, and you can't assume that you can index into them. That's why the performance is bad. There are other reasons, of course. The whole process of continuously creating new sequences is expensive. But actually, thinking about it, there's a clever way around that. This is just an aside, but linked containers have a very clever way around that.
Oh my god, I have to do some tinkering at some later point. Anyways, back to what I was talking about. Remember what I said about method resolution, though: that the methods are resolved based on the most specialized. See, LINQ was designed in this incredibly general way, as a matter more of convention than of a completely finished thing. You optimize LINQ by providing more specialized overloads of the methods that are already in LINQ.
Now, in a non-niche case you would do specializations, and some of these already are in the base class library, like for Count. With Count on an IEnumerable, you don't know that the container or collection already has a known length. It may be a generator which doesn't have an end point at all. Oh, calling Count on that is a really bad idea. Oh my god, that's a design problem with LINQ, thinking about it. Holy crap. Wow, generators. It's a bad idea.
But what Count has to do in those situations is iterate through the entire thing, keep a counter that it increments every single iteration, and then on completion return the counter. Makes sense, right? Well, for collections that you know the length of, because the length is a field inside the collection and you track it, you have a Count property that you can get that field through. You can specialize by overloading Count for, say, a collection of the same element type.
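A small sketch of that kind of specialization, under the assumption that it lives next to the standard System.Linq operators. CountSpecializations and CountDemo are hypothetical names. In practice Enumerable.Count already checks for ICollection<T> at run time, so this is only the simplest way to show how sharing a name lets overload resolution pick the more specific method.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CountSpecializations
{
    // Same name as Enumerable.Count, but declared on the more specific
    // ICollection<T>, so the compiler prefers it for lists, sets, arrays...
    public static int Count<T>(this ICollection<T> source) => source.Count;
}

public static class CountDemo
{
    // A generator: its length is unknown until it has been fully enumerated.
    public static IEnumerable<int> Generate(int n)
    {
        for (int i = 0; i < n; i++) yield return i;
    }

    public static (int FromList, int FromGenerator) Run()
    {
        var list = new List<int> { 1, 2, 3 };
        int fromList = list.Count();             // binds to the ICollection<T> overload
        int fromGenerator = Generate(4).Count(); // only Enumerable.Count applies here
        return (fromList, fromGenerator);
    }
}
```

The call sites look identical. Which method runs is decided at compile time from the static type of the receiver, which is why the specialized method has to reuse the LINQ name.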
You don't need to know the element type, so it'd be fine in this instance to have that Count also be generic, but with the generic parameter being on the collection type rather than on IEnumerable. You now have access to the Count property. You will now get the field value rather than enumerate through the entire thing, counting all the iterations. You see why that's an optimization. But you also see how that's literally nothing more than just providing the extension.
There are libraries that take advantage of this. I won't be including any as dependencies in Stringier, because that's not really the point. Plus, some of them are incompatible or need to introduce their own specific APIs to utilize their functionality. But it just goes to show that there is a valid reason to be doing these kinds of things.
There are LINQ optimizers, things like Hyperlinq or FastLinq. There's even a very specialized one called ValueLinq. The idea behind all of them, though, is that they provide these kinds of specializations. ValueLinq does the very interesting thing of introducing its own API that you call it through, but utilizing value types and convention to provide that LINQ interface. It's really only useful in situations where you've benchmarked your LINQ queries and know for a fact that the primary reason for the performance issues has to do with memory pressure. You don't want to use that in the majority of cases, and I will not be utilizing that technique in optimizing anything.
It's probably not worth it, but it does go to show that LINQ optimizers are totally a valid thing. Stringier will just be having a very highly specialized form of them: not general like Hyperlinq and FastLinq, but strictly for text. That also means that not all of the LINQ overloads will be provided, because some of them simply don't need to be. Count is a fantastic example of that. The Count optimizations are already provided. You don't gain anything by specializing them for a specific type. But you do gain in other situations, Join being the fantastic example.
Now, some of these optimizations are done by not strictly following LINQ conventions. Join, for example, is meant to return another IEnumerable, but that's not what I want to do at all. I want to join them into a string. Now again, these queries are still provided on string, so even after you join it you'll still be able to do additional queries. You see where that goes.
Join isn't just meant for IEnumerables, either. There are all sorts of optimizations. In some cases the most optimal thing to do is return the exact same type. Take Join of a string, because remember, string itself is an IEnumerable of char. Or Join for a ReadOnlySpan<char>, which is still conceptually a sequence even though it doesn't implement that interface. Join semantics make sense for them.
They are sequences. But you wouldn't want to actually do anything. You want to just return, in the span's case, itself. In the string's case, I mean, you could just return itself, but for orthogonal APIs, remember what I said in the last video: if it allocates a new thing it makes sense to return string, but otherwise you're going to be taking a slice of that. Literally of the entire thing, a conversion to a span of the entire string. There's nothing to join. It is technically already joined, as far as the semantics within Stringier go. So don't do anything. By providing that API, usage of Join in LINQ queries can recognize that, and not do the stupid join that it does by default because it can't make assumptions.
But there are other instances, you know, in-between points. Join for an array, for example. Actually, no, it used to be a little bit different, but really now it's not, because now you would just return the span. Man, that's fantastic. We've got a lot of optimal cases.
Join for sequences of strings, though: you can't do that anymore, so this is actually a useful example. If it's an IEnumerable of string, you have to default to almost the same assumptions that LINQ has in the first place. You don't know what the total length is, so you have to use a StringBuilder or similar API to actually construct the entire thing. It should make sense. However, with Join for an array of strings, you can actually calculate the length quickly. It is iterations over it, but iterations through an array are very fast. So sum the lengths, create an array of that length, do your copies into that array, and boom, you've got it. Return that array as a new string. You're done. That is a hell of a lot faster than using a StringBuilder iterating over the entire thing. So you can see where that goes.
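The two Join cases just described can be sketched roughly like this. JoinExtensions and both signatures are hypothetical, not Stringier's actual API, and treating null entries as empty is an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class JoinExtensions
{
    // General case: the total length is unknown up front,
    // so the result is built incrementally with a StringBuilder.
    public static string Join(this IEnumerable<string> source)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        var builder = new StringBuilder();
        foreach (string part in source) builder.Append(part);
        return builder.ToString();
    }

    // Specialized case: an array can be walked cheaply twice, once to
    // sum the lengths and once to copy into a buffer of exactly that size.
    public static string Join(this string[] source)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        int length = 0;
        foreach (string part in source) length += part?.Length ?? 0;

        char[] buffer = new char[length];
        int position = 0;
        foreach (string part in source)
        {
            if (string.IsNullOrEmpty(part)) continue;         // nothing to copy
            part.CopyTo(0, buffer, position, part.Length);    // copy this piece into place
            position += part.Length;
        }
        return new string(buffer);
    }
}
```

Because both methods share the name Join, a call on a `string[]` binds to the array overload, while any other `IEnumerable<string>` falls back to the StringBuilder version. How much faster the array path actually is would need benchmarking.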
I intended version 4 of Stringier to be more of an API change than anything, but there wound up being so many fantastic opportunities for optimization. And, you know, part of the motivation is kind of racing to get v4 out at or a little before .NET 5 comes out. Granted, I want that primarily for the performance showdown with the Regex improvements that they're doing. But, you know, the Patterns engine is actually built on top of Core. It's not its own thing separate from it. It actually utilizes some of the stuff within Core and will increasingly utilize it, since as part of this audit I noticed I have some duplication that should never have existed. So I've been working that out, and the Patterns engine is probably going to work a little differently internally and wind up utilizing what's in Core a bit more. So you definitely want it optimal, because I don't want to be shown up by Microsoft. I'm already generally outperforming them on that, especially as far as memory usage goes, but even on performance there are many situations where I was outperforming them, or at least competitive with them. I'd like to keep it that way.
I'm not sure if there's going to be a third part to this. Actually, no, I know for a fact there will be a third part, because there are some changes that I need to describe with regard to Unicode categorization. There are some changes that I already made and some changes I may additionally make. Seriously, the .NET runtime didn't even get Unicode categorization right. I wish I was fucking kidding. It's not that its categories are wrong, but there are issues with the approach. I'll get into that. But there are further improvements I'd like to do beyond what UAX #44, section 5.7.1 does. So I'll talk about that; that'll be the third entry in the series. I don't know if there's going to be more, but there are a lot of changes coming to even just the Core.
This is a big audit, with a lot more going on than the previous audits. So until then, have a good one, guys.