Hi guys. I want to talk a little more about Stringier, but also about things that have much less to do with Stringier. So, first things first, we're going to take a broad step back to what paradigms in programming I favor, because that'll be highly relevant as this goes on. It's unsurprising that I'm a fan of procedural programming and that I tend to shy away from functional programming. It's not that I don't think a higher level of abstraction than procedural is a good thing; I actually really do. It's more that, look, functional programming has some great ideas, and there are totally situations in which it's the right way to go, especially for math. However, I find functional programming to be a very general abstraction: you can provide the same functional interface for basically anything and have it behave the same way. That level of consistency is actually fantastic, but it lacks context. It lacks specialization, because it's so generalized. What I mean is that operations like fold, map, and reduce take functions as arguments, which is why they're so wonderfully generic, but it also means what they do is entirely dependent on the function you pass. That's a good thing and a bad thing. What I actually favor is more of a declarative approach; I'm very fond of declarative programming. This is, in fact, one of the reasons why, while I do recognize combinator theory as a fantastic way of addressing complexity and reusability, I don't feel parser combinators are the right approach: they dictate how you parse, whereas Stringier's patterns are fully declarative. They describe what to parse, not how. That then allows multiple different ways of parsing the same thing to be swapped out without you even knowing about it, because you only told it what it needs to parse. And if it can optimize around something, it can.
Because you didn't tell it how; you told it what to do, at a high level. So, unsurprisingly, I favor APIs that are very declarative. That's not to say they always need to be that high-level. Often I simply prefer functions that are very clear about what they do and have a specific name. Even if what's happening underneath is, say, a reduce, just the fact that it has a dedicated, descriptive name is what I prefer. Now, again, there are situations where that level of generality is nice. So that ties into Collectathon a little bit. Collectathon, originally, for those who don't know (because I had suspended the project and never talked about it), was an experiment in getting as much code sharing as absolutely possible in a containers library. Ideally with good performance as well, but the purpose was maximal code sharing. With more sharing you can supply a lot more data structures far more easily, because you can implement them faster. It also means, of course, that when a bug is fixed somewhere, there's a good chance it's fixed for multiple collections at once; and if an operation is optimized, it's optimized for multiple collections at once, which is fantastic. The original design of Collectathon was to separate the ADTs, the abstract data types, from the data structures. I've grown to disagree with that approach a little, especially while trying to add more data structures; the big one I got stuck on was self-sorting lists. So when I resurrected the project, I switched to trait-based APIs. Nobody really realized this until much more recently (I've got an article up on Dev.to if you want to read about it specifically), but traits were introduced to C# about thirteen years ago. Actual traits, not the composition thing people have been doing; actual traits have existed for thirteen years.
And so I've been utilizing that, after realizing how it would actually be implemented, and it's been hugely useful. You can provide functions that have nothing to do with any particular collection at all, but instead deal only with traits a collection has, and are coded against those traits. That allows them to work easily with anything that provides those traits, and you don't have the issue of where something goes in the inheritance hierarchy; you just provide it for the traits. It's been working phenomenally well. Collections, because they are generic by nature, actually make a lot of sense to provide those functional APIs on, like fold and reduce and map. Those make sense there. Now, in the video where I mentioned reviving Collectathon, I did mention that part of the reason was to provide some text-specialized collections. Those would have a much more declarative API bolted on top as well, because you actually know what you're working with now, so you can. There are two collections I want to talk about. The first: I've attempted before to optimize string builder, because having looked through how it's implemented in .NET Core, I'm convinced that it is not the most optimal way to build one. I still largely feel that's the case. I'm calling my implementation a yarn, and I'm trying to keep its API (not including what it draws in from its dependencies) as close to String as possible, so that it feels like working with a string, not like StringBuilder's weird API. All in all, it won't be that different, though. The idea behind the yarn is just to be a resizable string. No fancy stuff going on, just a resizable string. It's useful for cases where you've got a small amount of dynamic text.
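To make the trait idea concrete, here's a minimal sketch in Python, using structural typing to stand in for what C# gives through interfaces. All names here are hypothetical illustrations, not Collectathon's actual API: the point is just that `add_all` is coded purely against the "addable" trait and never knows which concrete collection it's handed.

```python
from typing import Iterable, Protocol, TypeVar

T = TypeVar("T")

class Addable(Protocol[T]):
    """A 'trait': anything that can have a single element added to it."""
    def add(self, element: T) -> None: ...

def add_all(collection: Addable[T], elements: Iterable[T]) -> None:
    # Coded against the trait only -- works for any collection providing it,
    # with no question of where it lives in an inheritance hierarchy.
    for element in elements:
        collection.add(element)

class SinglyLinkedList:
    """A hypothetical collection that happens to satisfy Addable."""
    def __init__(self) -> None:
        self.items = []          # simplified backing store
    def add(self, element) -> None:
        self.items.append(element)

lst = SinglyLinkedList()
add_all(lst, [1, 2, 3])
print(lst.items)  # [1, 2, 3]
```

Any other collection with an `add` method gets `add_all` for free, which is the code-sharing payoff being described.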
On the other side of this, there are multiple data structures computer scientists have worked out, such as the rope, the gap buffer, and the piece table. I'm going with a rope, although I actually disagree with the conventional way it's implemented. Ropes are commonly implemented as a binary tree where the text is at the leaves and the non-leaf nodes hold no data. That should make sense: if you just follow the leaves along, each holds a chunk of text, and the whole thing together represents the entire text. You now have an obviously dynamic data structure where any additional chunk of text can easily be inserted or appended. That approach is not wrong; it implements what it should. But trees, for those who don't know, are a very generalized data structure, which makes them powerful, but the generality comes with a cost. Trees support almost anything: unordered data, unsorted data, sorted data, and combinations thereof. Now, we know text is ordered. It's not sorted, but it is ordered. So: is there a data structure specialized for ordered data only, one that isn't self-sorting (though it could support sorted data) but supports ordered data, providing the same or relatively close asymptotic times, and, because of its specialization, superior execution times to the binary tree? Yes: the skip list. I realized that would make a lot more sense after going through and planning out how to do the rope and how to optimize it. A rope's nodes have a lot of different properties: like I mentioned, the non-leaf nodes don't actually contain any text, and the leaf nodes don't contain children, because they are leaves.
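Here's a minimal sketch of the conventional rope just described, in illustrative Python (this is the textbook structure, not Stringier's implementation): text lives only at the leaves, internal nodes hold no data, and indexing walks down the tree by comparing against the left subtree's length.

```python
class Leaf:
    """Leaf node: holds an actual chunk of text, no children."""
    def __init__(self, text):
        self.text = text
        self.length = len(text)

class Concat:
    """Non-leaf node: holds no text, only its two halves and their total length."""
    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.length = left.length + right.length

def char_at(node, i):
    # Descend toward the leaf containing index i, adjusting i as we
    # pass over the left subtree's characters.
    while isinstance(node, Concat):
        if i < node.left.length:
            node = node.left
        else:
            i -= node.left.length
            node = node.right
    return node.text[i]

def to_string(node):
    # Following the leaves left-to-right reconstructs the whole text.
    if isinstance(node, Leaf):
        return node.text
    return to_string(node.left) + to_string(node.right)

rope = Concat(Leaf("Hello, "), Concat(Leaf("rope "), Leaf("world")))
print(to_string(rope))   # Hello, rope world
print(char_at(rope, 7))  # r
```

Inserting a chunk is just splicing a new `Concat` into the tree, which is what makes the structure so dynamic.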
And so, there's a technique called tree compression, typically implemented through variant records, although you can do it through polymorphic nodes; in the .NET world, you'd have to go through polymorphism. You can compress that way and save a lot of memory, because the nodes don't carry extraneous fields. But traversing the leaves becomes awkward: to reach the next leaf, you have to go up to the parent, and if the parent has the next leaf, you can go down to it; if it doesn't, you have to go up to the next parent and then back down, and keep walking through that way. Which is fine, but we can optimize it. What you can do is keep the child links the leaf nodes have, but reuse them as next and previous links. In other words, you have a binary tree whose leaves form a doubly linked list. Now, this is approaching a high degree of complexity, but it is optimal, because now you can directly traverse between leaves. You use the binary tree part for indexing and for efficiently inserting and slicing and all that, but you've just optimized traversal, which of course you want: iterating through the characters of a string is a very common operation. It was then that I realized the folly: this really should be done with a skip list, because you're enumerating often, and it's actually very easy to index efficiently with a skip list. Here's how skip lists work (you can compress a skip list as well, and it's a good idea to): each node potentially has jumps to nodes much further along in the list. To make that useful, each jump has to record how far it spans, but that's not hard to do; you store the jump's width and use that information to index much more rapidly.
Now, if the index you're looking for is still larger than your current position plus the width of the jump, then you know to take the jump instead of stepping to the next node. The jumps can be arbitrary, but we could be talking about jumping forward, say, forty characters as a single dereference rather than forty dereferences. That is obviously faster. But there's another optimization. This whole thing so far presumes each node holds a single character. So there's another optimization of the linked list called the partially unrolled linked list, or sometimes just the unrolled linked list, because the "partially" is implied: if it's fully unrolled, you just have a vector, a resizing array. You can make the nodes polymorphic not just on the number of links they have but also on their content: a node whose element is a single character, and a node whose element is a character array. Those implicitly have a length, so you can do the same kind of skip trick to figure out whether you need to index into that element or go to the next node. Both skip lists and partially unrolled lists are substantial optimizations. Furthermore, splitting at any of these nodes, because you need to insert inside one or whatever the operation is, is actually quite efficient: it's just the creation of three new nodes and replacing the existing node with them; one for the piece before, one for whatever you're inserting, and one for the piece after. Now, in the last video, where I mentioned both Stringier and Collectathon, I said I didn't think it was a good idea to go about implementing these and utilizing them in Stringier's core API. Originally it felt like I should, but then I backtracked. Well, I'm backtracking again.
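The skip trick and the three-node split can be sketched together on a flat list of chunks, again in illustrative Python (the names are hypothetical, and a real skip list would add the higher-level jump links on top of this): each chunk knows its length, so indexing skips a whole chunk per dereference, and insertion replaces one node with before/inserted/after.

```python
class Chunk:
    """One node of a partially unrolled linked list: a run of text plus a link."""
    def __init__(self, text, next=None):
        self.text = text
        self.next = next

def chunk_at(head, i):
    # The 'skip trick': if the target index is beyond this chunk, jump past
    # the whole chunk with one dereference instead of one per character.
    node = head
    while node is not None:
        if i < len(node.text):
            return node.text[i]
        i -= len(node.text)
        node = node.next
    raise IndexError(i)

def insert(head, i, text):
    # Splitting is cheap: make three new nodes (before, inserted, after)
    # and splice them in where the old node was.
    node, prev = head, None
    while i > len(node.text):
        i -= len(node.text)
        prev, node = node, node.next
    before = Chunk(node.text[:i])
    middle = Chunk(text)
    after = Chunk(node.text[i:], node.next)
    before.next, middle.next = middle, after
    if prev is None:
        return before          # the split happened at the head
    prev.next = before
    return head

def chunks_to_string(head):
    parts, node = [], head
    while node is not None:
        parts.append(node.text)
        node = node.next
    return "".join(parts)

head = Chunk("Hello ", Chunk("world"))
head = insert(head, 6, "unrolled ")
print(chunks_to_string(head))  # Hello unrolled world
```

Nothing before or after the split point is copied beyond the one node being split, which is why insertion in the middle stays cheap.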
So, I didn't feel like I should do that, because I was keeping the whole API to ReadOnlySpan<Char> when no allocations were occurring and String when allocations were occurring. But that wasn't entirely true. In fact, I was already using specialized collections for certain operations; that's what the chopped string and split string types were about. So if I'm using specialized collections for certain functions anyway, it just makes more sense to keep doing that. I certainly could push Stringier Core v4 out without doing it, but then do I do an entire v5 rollout just to reimplement certain functions to use these collections? That's silly. That's not a good justification for a major version bump, when you've already got some specialized collections and the major version would just be adding more. No, that's dumb. So we need to talk about another paradigm I'm fond of. I think it's because I recognize the structure in a lot of data, and it's far easier to work with data that is structured using an actual data structure than to deal with all of it implicitly. I'm very fond of actually utilizing data structures and manipulations upon them, rather than doing it implicitly. What this winds up meaning is that an operation like, say, replacement within a string (as long as you're not replacing a single Char with a single Char, in which case, obviously, you just build a new string), say replacing certain substrings with other substrings, or a character with substrings, is actually far more efficient using a rope as the return type: you build a rope that represents those changes. Then, if the downstream programmer really needs a string back out, they can convert the rope to a string. But otherwise, since I'm trying to keep the API as orthogonal as possible, so that all string operations still exist on yarn and rope, you can just keep using the rope.
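As a sketch of why a rope-like return type is cheaper here, this illustrative Python builds the result of a replacement as a flat list of chunks (standing in for rope leaves), reusing the untouched runs of the source instead of copying everything into one new string. In real .NET code those untouched runs could be spans over the original; Python slices copy, so this shows the shape of the idea, not its allocation behavior.

```python
def replace_as_pieces(source, old, new):
    """Return the replacement result as chunks rather than one new string.

    Unchanged runs of the source become their own chunks; only the
    boundaries are computed.  A rope would hold these chunks at its leaves.
    """
    pieces = []
    start = 0
    while True:
        hit = source.find(old, start)
        if hit == -1:
            pieces.append(source[start:])  # the final untouched run
            return pieces
        pieces.append(source[start:hit])   # untouched run of the source
        pieces.append(new)                 # the replacement chunk
        start = hit + len(old)

pieces = replace_as_pieces("one two two three", "two", "2")
print(pieces)           # ['one ', '2', ' ', '2', ' three']
print("".join(pieces))  # one 2 2 three
```

Converting back to a string (`"".join` here) is the one step a downstream programmer pays for only if they actually need it.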
At the point where a rope is justifiable, you probably want to keep using a rope, because it's more optimal. So I am going to go that route. I'm going to make sure Collectathon is fleshed out enough to support the data structures I'm going to utilize within Stringier, and Stringier Core is going to use them as appropriate. Now, it might seem like this is some unnecessary overhead, and there is some weighing that needs to be done. Not all operations that create a new string should jump straight to a rope; that's not justified in many cases, actually. A fantastic example is EnsureBeginsWith and EnsureEndsWith, variants of Ensure Begins and Ensure Ends that prepend or append the given text only if it's not already there. This would be like making sure a title is on a name: if it's there already, you can just reuse the string, but if it's not, well, you're going to add it. That's the pattern behind one of the more common use cases of Ensure Begins and Ensure Ends that I've seen. For those, creating a whole rope is unjustifiable. Replace is justifiable, but those, no. So I'm going to be using my head with this stuff, just like how, if no allocation needs to happen, it's not returning a new string, that kind of thing. If the complexity of a rope is not justified, it's not going to return a rope. I'll have to benchmark it, but the Ensure variants might not even use a yarn; they might be able to do their work efficiently by just allocating a new string, because all that's happening is potentially a single concatenation. That would obviously be more justified. Plus, it allows returning just a reference to the existing string if it already has the required part, which obviously makes more sense than anything else. So it's about finding a balance between all of these, but it inherently makes a lot of the code simpler on my end, which is fantastic.
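The Ensure case is small enough to sketch in a few lines (illustrative Python; the real Stringier signatures may well differ): at worst a single concatenation, at best the original reference handed straight back, so neither a rope nor a growable buffer is warranted.

```python
def ensure_begins_with(source, required):
    """Return source unchanged if it already starts with the required
    prefix; otherwise prepend it.  A single optional concatenation."""
    if source.startswith(required):
        return source          # reuse the existing string: no allocation
    return required + source

print(ensure_begins_with("Dr. Strange", "Dr. "))  # Dr. Strange
print(ensure_begins_with("Strange", "Dr. "))      # Dr. Strange
```

The EnsureEndsWith counterpart is the same shape with `endswith` and the concatenation order flipped.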
And like I said, it ensures that you're getting out the structure that makes the most sense for what you're working with. So, yeah. Just wanted to put that all out there. Have a good one, guys. ...Actually, I spoke too soon. There's more justification behind developing Collectathon to the point where I could use it throughout Stringier: there's been a growing number of cases of various data structures being implemented within Stringier. Examples include the glyph trie; a trie is a retrieval-optimized tree, a multi-way tree specifically. Categories is a unification of three different types of sets, two of which are property sets; the other is a more traditional set, although it's algorithmically defined. It also supports building an expression tree of those sets, as an efficient way of representing set-theory operations like unions, intersections, and differences. Also, patterns were always a collection; a pattern is technically an expression tree, but in a very different way. So, since one of the big goals of Collectathon is as much code sharing as possible, another thing I want to look into is how much code can be shared to make those implementations easier and simpler, and whether any code could be shared between them. Because that's always fantastic, you know: like I said, with shared code, any bug fixed in one place is fixed for everything that uses that code, and any optimization likewise applies to everything that utilizes it. So there are more reasons for resurrecting Collectathon, and that's why I'm using it. Now, inevitably, does this mean some potential delays to the rollout? Yes, yes, absolutely. But it does mean less work in the long run, which is important, because I'm the single developer of this entire thing, and it's growing into a very large library, so the less work I have, the better.
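The set-expression-tree idea can be sketched like this in Python (purely illustrative, not the actual Categories API): membership is just a predicate, and union, intersection, and difference compose predicates into a tree instead of materializing elements, which is what makes algorithmically defined sets cheap to combine.

```python
class AlgoSet:
    """An algorithmically defined set: membership is a predicate, and
    set operations build an expression tree of predicates."""
    def __init__(self, contains):
        self.contains = contains

    def __or__(self, other):   # union
        return AlgoSet(lambda x: self.contains(x) or other.contains(x))

    def __and__(self, other):  # intersection
        return AlgoSet(lambda x: self.contains(x) and other.contains(x))

    def __sub__(self, other):  # difference
        return AlgoSet(lambda x: self.contains(x) and not other.contains(x))

# Hypothetical property sets defined by character predicates.
letters = AlgoSet(str.isalpha)
digits = AlgoSet(str.isdigit)

alnum = letters | digits
print(alnum.contains("7"))             # True
print((alnum - digits).contains("7"))  # False
```

No elements are ever enumerated; each operation is a constant-size node, and membership testing walks the expression tree.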
But it is more upfront work, even though it's less work in the long run, so delays are possible. How much, I don't know, but it's worth it. It is absolutely justified, and hopefully that's obvious. Have a good one, for real.