So, since I haven't really discussed this at all, I'd like to take the opportunity to do so. .NET 5 is coming out soon, expectations are around November. It may get delayed, but Microsoft tends to be pretty hardcore about deadlines, and it looks like they're making progress that fits that release schedule. A big part of that release is alleged optimizations, and I'd like to check those out, especially since some of those major points of optimization have to do with areas where I am currently, in general, outperforming Microsoft. I want to know how I still stack up. But I have changes that I've been making to the entirety of Stringier, and many of those are optimizations, things that require API changes, so I have to do a major version bump. Ideally I want to get the entire release out before or around the same time as .NET 5. The reason being, of course, that if both releases land around the same time, then there's no waiting on one or the other with regard to running the benchmarks and seeing how everything stacks up.

So, these optimizations, these changes, what are they? Well, one, which is really more of an architectural change than anything, and I mentioned this in a previous video, is changing the entire system so that the specialized types for working with specific things are no longer dependent on everything within Stringier, but rather the other way around. They are going to be dependent on nothing, purely implement the type itself, and then rely on the other libraries in Stringier, like Core, like Patterns, and so on, to actually provide the APIs. (Yeah, that's... what are you doing? For a cat, he digs an awful lot. It's weird. Oh, he's taking a shit right in front of me. That's lovely.) So, that's a pretty big architectural change, because it's placing libraries in an entirely different part of the dependency hierarchy. The reasons for it are definitely justified, in my opinion.
You now have a library that really just contains the type itself, and you have all the equivalent methods being put in the exact same spot, which makes it much easier to make sure you're still achieving feature parity. That's important. And it's honestly not a bad thing if all of these are implemented through extension methods. You don't need polymorphism for these kinds of types, and in some instances you actually want to avoid it ever being a possibility. So even if they were classes, which would be capable of it, they'd still be sealed, which would make them not capable of it. So types are now primitive libraries with nothing other than the type in them, or as close to that as possible, with a number of things extracted out. This helps with discoverability, helps with updates, and a number of other things.

Encoding has already been extracted out, and as part of that I've gone through and really overhauled that whole interface. Some of those are definitely breaking changes, but as of now, it doesn't return String for anything, and it doesn't return arrays for anything. It is entirely ReadOnlySpan of Char, Rune, or Byte, depending on what is appropriate. That's important. It does not necessarily allocate, however; depending on exactly what is passed in, it may allocate, or it may reuse what is already there. No, no, I'm thinking of stuff in Core. The encoder has to always allocate, but it exposes everything through a consistent API using read-only spans. That is a breaking change, obviously enough, but hilariously it's such a barely-breaking change that all you should really have to do, if you were using the specific output type, is change it: the way you work with ReadOnlySpan and the way you work with arrays are deliberately almost identical, so there really shouldn't be that many changes you actually need to make to your code.
If you were using var, if you're one of those guys, there should be absolutely nothing that actually gets changed, which is convenient. But the actual code changes, so it would obviously need recompilation; it's not something where you could just do a library switch and re-resolve everything. The linker can't figure that stuff out. It's an entire recompilation.

Okay, as part of that as well, something I noticed when I was going through and auditing Core was that there were a lot of overloads that I wasn't providing that I easily could have. The way to do this, as I had worked it out, is to take each general form and put it in a region (for the encoders it isn't really necessary), then implement the actual function with the signature that takes a ReadOnlySpan of the type, typically Char for obvious reasons, though for the encoders that may be a ReadOnlySpan of Byte or a ReadOnlySpan of Rune. Then, using that, provide the overloads for everything else: for String, for arrays, for Span of that type if necessary (for extension methods it is, because the implicit conversion can't happen; for more traditional function-style calls it's not, because the implicit conversion will happen), but then also the pointer of that type, assuming it's a primitive. Byte and Char obviously are, and pointers are also allowed for unmanaged types, which is any struct composed entirely of other unmanaged types or primitives. So Rune, since it is internally just an integer value, is an unmanaged type, which means you can take a pointer to a Rune, which means these overloads also exist for pointers to Rune. You generally shouldn't be working with pointers, but there are situations where it has come up for me and would have actually simplified some things, or enabled some ultra-high-performance thing using some trickery, but trickery that would be completely internalized in
the library; the public interface would not expose pointers at all. But, you know, it's supported now, and only through one implementation: each overload only has one implementation. So that's a pretty big deal. There were a few instances of duplicate allocations I was doing that could be avoided; that's part of the reason for the ReadOnlySpan of Char return types. I've eliminated quite a few allocations, which reduces garbage collection pressure, which in turn speeds up the entire system, plus a few minor optimizations to the way things worked in general, to where the encoders and decoders are actually quite good as far as performance goes. There's one, I think it's the UTF-8 encoder, that needs some work, but the other three, the UTF-16 encoder, the UTF-16 decoder, and the UTF-8 decoder, I think those were the ones, are considerably faster than the ones from Microsoft, so cool. And they allocate considerably less, about a third less on average, so that's awesome. The less memory you use, the better, especially since processor speeds are increasing rapidly while memory speeds are improving at a snail's pace. Usually it's, I don't want to say programmers, more tech-savvy people, complaining about programs using up too much memory, not programs consuming the entirety of the CPU's resources. So keeping memory pressure down is a big deal.

So Core, largely, is undergoing a lot of the same changes. There are exceptions, but overwhelmingly, APIs that were returning a String are now returning a ReadOnlySpan of Char. This helps avoid allocations, but it also takes inspiration from some of the optimizations that were done in the Patterns engine, which I actually have to tweak.
There are some things I've worked out that should improve that even more, but the general idea the Patterns engine takes is that the entire source text is loaded into memory once, and from that, everything is sliced using spans. That makes sense: if it's already in memory, why would you duplicate the parts of it you're working with? Why would you not just reference those parts? That's where the optimizations for Core come into play: instead of duplicating the arrays or strings or whatever every single time, it now slices all of that. It's not like that's the only place this is happening, but the big pattern is definitely changing the String returns to ReadOnlySpan of Char returns. Benchmarks of the performance improvement on that aren't really going to be that valid, because they're fundamentally microbenchmarks, but there are some things I've written that are complicated enough in what they're doing that I can benchmark the entire program, see the performance improvements, and see in general how much that change has improved things. But it's obvious it will just be a flat-out improvement, because you are literally not allocating new memory. That entire operation doesn't need to exist anymore: the entire business of finding a spot on the heap and putting the thing there, of setting memory aside and putting the thing there. None of that has to happen, because a span is just a simple thing that exists within the function's stack frame and references the part of the heap where the string lives. A pointer and a length, that's all the span is, which is comparatively extremely fast. That's a big deal.

Now, to go along with that, like I said, I had worked out the system using regions and some other stuff to make things easy to recognize and easy to lay out, going through and providing full overloads for all of the functions that were in Core.
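The one-implementation-plus-forwarding-overloads pattern can be sketched roughly like this. This is my own minimal illustration, assuming a made-up `Occurrences` method rather than any actual Stringier API:

```csharp
using System;

public static class OverloadSketch {
	// Core implementation: everything funnels into this one overload.
	// "Occurrences" is a hypothetical example method that counts how many
	// times a character appears in the text.
	public static Int32 Occurrences(this ReadOnlySpan<Char> text, Char element) {
		Int32 count = 0;
		foreach (Char c in text) {
			if (c == element) { count++; }
		}
		return count;
	}

	// The remaining overloads simply forward, because for extension-method
	// receivers the implicit conversion to ReadOnlySpan<Char> can't happen.
	public static Int32 Occurrences(this String text, Char element) =>
		Occurrences(text.AsSpan(), element);

	public static Int32 Occurrences(this Char[] text, Char element) =>
		Occurrences((ReadOnlySpan<Char>)text, element);

	public static Int32 Occurrences(this Span<Char> text, Char element) =>
		Occurrences((ReadOnlySpan<Char>)text, element);

	// Pointer overload: the length is a required parameter, so there's no
	// reliance on null termination. (Requires compiling with /unsafe.)
	public static unsafe Int32 Occurrences(Char* text, Int32 length, Char element) =>
		Occurrences(new ReadOnlySpan<Char>(text, length), element);
}
```

Since every overload forwards into the one span-based body, adding a new operation only ever means writing it once.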
It's a little tedious, but I can typically get two to four of them done in a single day, because there are a lot of overloads now, this time around. One thing I had noticed was that I typically left out Char arrays. At no point in any of my libraries are those returned instead of a String, but there are libraries out there that do that for various reasons, like the fact that they are mutable, whereas strings are immutable. So if you need what is basically a mutable string, Char arrays are useful for that, and I get why they're used. So that is definitely an instance where I should have had that overload, and I do now. It's not like that requires any duplication of work, because again, literally everything is using the ReadOnlySpan of Char implementation. All of the types, whether a String, a Char array, or a pointer to a Char, as long as you have the length, which is a required parameter for everything I'm doing, because that's safe use of pointers, I'm not doing the null-terminated shit, don't do that, they're all able to use the ReadOnlySpan of Char implementation.

Okay, other optimizations. In many cases, take the Clean operation, which trims the string, but then also takes whatever you pass into it, or by default just whitespace characters in general, and reduces consecutive instances down to a single instance. So the excessively repetitive "Leeeeeroy Jeeeenkins" would get reduced down to just "Leroy Jenkins", with no repeating characters. Makes sense. That obviously requires the creation of a new string. I think I do it as a new character array internally. Well, I have to do it as a new character array, but you don't ever wind up seeing it, you just get a ReadOnlySpan of Char back. That's because I can't just slice off part of the existing text.
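A minimal sketch of what a Clean-style operation could look like under those constraints. This is an illustration under my own naming, not the actual implementation; note that the trim part is pure slicing, while collapsing repeats is what forces the one internal allocation:

```csharp
using System;

public static class CleanSketch {
	// Trim, then collapse consecutive occurrences of `element` to one.
	// The result has to live in a new buffer, but the buffer never escapes
	// as an array; callers only ever see a read-only span over it.
	public static ReadOnlySpan<Char> Clean(ReadOnlySpan<Char> text, Char element) {
		text = text.Trim(); // no copy: Trim on a span just reslices
		Char[] buffer = new Char[text.Length]; // worst case: same length
		Int32 length = 0;
		Boolean lastWasElement = false;
		foreach (Char c in text) {
			if (c == element) {
				if (!lastWasElement) { buffer[length++] = c; }
				lastWasElement = true;
			} else {
				buffer[length++] = c;
				lastWasElement = false;
			}
		}
		return new ReadOnlySpan<Char>(buffer, 0, length);
	}
}
```

So `Clean("Leeeeeroy", 'e')` comes back as the span "Leroy", with exactly one new buffer allocated internally.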
But I'm going pretty hardcore with the whole span thing, with slicing things off. Trim is an example where that is possible: whether it's the full Trim, TrimStart, or TrimEnd, it's possible in every single instance to just slice off the part you need. So whether you're passing in an entirely new string, or an existing slice, an existing ReadOnlySpan of Char from something else, you can just create another slice from that. That's a hell of an optimization, and hilariously, it actually simplified the code. So that's a big deal.

Another case, where it isn't as simple as just returning a ReadOnlySpan of Char but is still very clearly inspired by that, is Chop. Chop is a little bit of a weird one, because it's not super obvious why it's useful at all, why I wrote it. Is it just there because, hey, it's a thing you figured out, or is there some use case for this? There is a use case. Chop is useful when you have a large piece of text that needs to be sent, say, over the network using a protocol that only supports fixed-size packets. Now you see it: you can chop to the max packet payload size, wrap each chunk up in a packet, and send it over the network. Now you get it. There are other instances for chopping, actually, especially when you get into text processing using SIMD. Chop allows you to break the thing up into discrete chunks that will each fit within a single vector, and you can iterate over that, using a vector for each one of them. So it's actually really useful for that as well. But obviously, in that kind of situation especially, you're quite concerned with performance; if you're using SIMD, you are obviously concerned with performance. Being a little slower when breaking things up for fixed-size network packets isn't that big of a deal, since your biggest time constraint is obviously sending it over the network itself.
So there, the chopping being a little slower is whatever; but for SIMD, well, it matters. So what I devised was a special type that's kind of like a specialized read-only span, sort of, but with its own iterator and other stuff: the chopped string. When you call Chop on something, it returns a chopped string, which references the string being chopped, plus some other information: the length everything gets chopped into, and a precalculated count of the chunks that will be present. But that's it. So when you call Chop and don't use the result for anything, that's barely any work being done. You're copying a few values over, a pointer and an integer, and calculating the chunk count. That's actually a pretty efficient operation; there's a little bit of math, but it's all relatively fast math.

So it actually does its work through two different things. There's the indexer, which uses the various information in it to grab the chunk that the index specifies. Just like indexing anything, zero would be the first chunk, one would be the second chunk, two would be the third chunk, and so on, up to the maximum number of chunks. That's why the chunk count needed to be precalculated. I mean, you could calculate it on the fly, but considering how many times you would need to calculate it, it's obviously more efficient to eagerly evaluate it. The other thing it provides is an iterator, the enumerator, which provides the foreach behavior. The iterator returns each chunk, and each chunk is just a ReadOnlySpan of Char. So whether you go through the indexer or the iterator, you still get a ReadOnlySpan of Char, which just references the individual parts.
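Here's a minimal sketch of what such a chop return type could look like. The name `ChoppedString` and its members are my own illustrative assumptions, not the actual API; the point is that construction only stores a reference, a length, and an eagerly computed count, while the indexer hands back pure slices:

```csharp
using System;

// A span-like type returned by a hypothetical Chop: it references the
// source text and produces chunks lazily, as slices, through an indexer.
public readonly ref struct ChoppedString {
	private readonly ReadOnlySpan<Char> source;
	private readonly Int32 chunkLength;

	// Eagerly evaluated chunk count: ceiling of Length / chunkLength.
	public Int32 Count { get; }

	public ChoppedString(ReadOnlySpan<Char> source, Int32 chunkLength) {
		this.source = source;
		this.chunkLength = chunkLength;
		Count = (source.Length + chunkLength - 1) / chunkLength;
	}

	// Indexer: slice out chunk `index`; the final chunk may be shorter.
	public ReadOnlySpan<Char> this[Int32 index] {
		get {
			Int32 start = index * chunkLength;
			Int32 length = Math.Min(chunkLength, source.Length - start);
			return source.Slice(start, length);
		}
	}
}
```

For example, `new ChoppedString("hello world".AsSpan(), 4)` reports a Count of 3 and yields the slices "hell", "o wo", and "rld", with no copying of the text at any point.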
Now, Chop before did the incredibly expensive thing of slicing these out, creating a new string for each chunk, and then putting all of those in an array, which is also an allocation. Iteration has slowed down slightly, and that's because there are still some optimizations I can do. However, creating the chopped string in the first place has sped up massively, since obviously there's far less work that actually needs to be done, and indexing into any part of it has also sped up massively. Optimize the iterator a little bit, and boom, you've got the same performance, and I know I can hit the same performance on that; I wrote a quick-and-dirty iterator for it. So that kind of shows off an example of the special-case optimizations I can do: special return types that provide these kinds of things. They're not hard to write, actually. The chopped string took me 15–20 minutes. That's not bad at all, and largely that was just dinking around with different options and benchmarking a bunch of things. Benchmarks, of course, take a while to run, because I run very large numbers of them: you want to see a whole performance curve, not just an individual data point you make assumptions from, where you could have optimized a small range but crippled everything else. You don't want that. So 15–20 minutes, including data gathering and other stuff, is quick.

As for Patterns, one thing I definitely want to do, and I've mentioned this a few times, but I still have to work out the details, I'm still not entirely sure how to do it, is adapting it to the Streams API. Streams will have a 4.0 release. That is a hard requirement I'm imposing: I will delay my other work if that's what it takes to get the Streams API out. I'm not worried about having a pipe stream or socket stream in the 4.0 release; those could come a little later, you know, in like a 4.1 or something, but the file stream has to be there.
And I mean, obviously, naively, I could just adapt it to the standard .NET streams. I can't actually do that, because they make some assumptions that are problematic. And relying on TextReader and TextWriter, well, you don't use TextWriter with the Patterns engine, but relying on either of them becomes highly problematic when you need to backtrack. And of course the Patterns engine backtracks. I could rewrite the thing as a non-backtracking engine, but there are actually advantages to backtracking, and it's not just simplifying code. It's also very important when doing search kind of stuff. Well, I shouldn't say that; search can easily be done without backtracking, and you shouldn't backtrack when doing search. But for the sake of simplicity, and for numerous other reasons, there are some features it implements that have to use backtracking. I don't want to have to rewrite all that, deprecate certain things, and complicate downstream code that utilizes it, because of the whole source-state thing: being able to save that state and return back to it. I mean, it does backtracking internally, but having the downstream programmer have a very easy mechanism for implementing that backtracking themselves is certainly useful. And hey, if they never utilize it, then it's just the simple backtracking that's done internally, which is actually super fast because of how it operates. But the stream has to be backtrackable, and there are problems with the standard .NET stream there, because of multiple buffers in different locations and other design decisions that are just bad. As much as I say different tools specialize in different things, it's just bad; they messed up some design stuff.

But anyway, there are some changes I want to make to Patterns, some things that are definitely going to get changed. The Result type is going to be reduced down to not being a result type.
I'm going to use independent parsers, independent consumes, like a Consume and TryConsume, doing the more classic C# style thing, where Consume will throw an exception and TryConsume will not. Makes sense. These behaviors can still be unified using the kind of thing Result was doing, but instead they'd essentially pass an error around continuously, through something like a ref parameter. TryConsume would obviously return false if there's an error set at all, whereas Consume would throw the corresponding exception if there's an error set at all. That should make a lot of sense.

I'm definitely keeping the Source type. I may... well, I don't know, I'm going to play around with some things. It might be possible to do an almost functional-style API for it, allowing chained calls or something. I don't know; like I said, I've got to play around with that. It may or may not stay as-is, but I'm definitely leaving some kind of non-stream API for it. The reason being, as I explained with my inspiration for optimizing Core further: parts of Core are used in Patterns, so optimizing those will optimize Patterns itself, which is awesome. But with the Source type, even if I change how it works a little, the idea behind it is to have the entire source text loaded into memory and then slice off parts of it, because that's a massive optimization. Obviously, I would not want to get rid of that. So stream parsing has to operate differently instead, as its own type of optimization; the two aren't suitable for each other. See, the reason for wanting to adapt a parsing engine to streams is that in instances where the file is either too massive for memory, or would just be too expensive, performance-wise, to load into memory, you want to be able to read just individual parts of it at a time. You don't see that too much in more personal-computing-style situations.
But in business, you often see XML or JSON files, or other shit, that are literally gigabytes large. I find it insane; I'm pretty sure there are some bad architectural reasons for that, like, you guys fucked up some decisions. But regardless, that is how things are. That's the reality. You want to be able to provide parsers that can efficiently work on those, and loading the entire thing into memory through the Source type really isn't an option. You need to be able to read it as a streaming file instead.

Now, because of the generalization of streams, that enables a further option that is useful to me: network parsing. Why would you ever want to do that? Distributed compilers. You can send the entire file over and, once the entire file is received, begin parsing it. Or you can stream the file, parsing what you have as you get it. Well, now you have an asynchronous operation: you're receiving the file and parsing it as you get it, rather than getting it, then parsing it. That speeds things up, and that's important, because you want distributed compilers to feel as network-transparent as possible, and having obvious network slowdowns is less than ideal. If there are actual network issues and you're dropping packets left and right, there's nothing you can do about that; that's going to be a slowdown regardless. But assuming, the dangerous assumption, but assuming, there are no issues, which is often the case, you don't want it to feel like it's going over the network. So being able to parse the file as it's still being streamed to you matters. You know, when you watch something on YouTube, this video for instance, you don't want to download the entire thing and then watch it. You want to watch what you have, and keep watching as it downloads more, only buffering when necessary.

So like I said, there are going to be some architectural changes regarding the Result type.
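Concretely, the Consume/TryConsume split replacing the Result type could look something like this sketch. `Parser` and its members are illustrative names of my own, not the actual Patterns API; the point is both entry points funneling through one body that sets an error instead of returning a result object:

```csharp
using System;

// Sketch of a literal-matching parser with the classic C# pairing:
// Consume throws on failure, TryConsume reports failure via its return.
public sealed class Parser {
	private readonly String pattern;

	public Parser(String pattern) => this.pattern = pattern;

	// Shared core: instead of a Result type, the error travels through a
	// ref parameter, and the match comes back as a slice of the source.
	private ReadOnlySpan<Char> Consume(ReadOnlySpan<Char> source, ref String error) {
		if (source.StartsWith(pattern.AsSpan(), StringComparison.Ordinal)) {
			return source.Slice(0, pattern.Length);
		}
		error = "Expected '" + pattern + "'";
		return default;
	}

	// Consume: throws the corresponding exception if an error was set.
	public ReadOnlySpan<Char> Consume(ReadOnlySpan<Char> source) {
		String error = null;
		ReadOnlySpan<Char> match = Consume(source, ref error);
		if (error != null) { throw new FormatException(error); }
		return match;
	}

	// TryConsume: returns false if an error was set, never throws.
	public Boolean TryConsume(ReadOnlySpan<Char> source, out ReadOnlySpan<Char> match) {
		String error = null;
		match = Consume(source, ref error);
		return error == null;
	}
}
```

Downstream, `if (result == ...)`-style checks become `if (parser.TryConsume(source, out var match))`: one line changes, the rest of the code works the same.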
The reasons for that: while goal direction is cool, and goal-direction-inspired things are cool, considering I cannot unify it, and there are instances where I need this behavior in multiple libraries, I want to instead return to the type of behavior that most programmers would just expect, having worked in .NET. This helps adoption of the library, and helps people feel like they get it better. TryConsume can still be used in many of the situations where the if-result-equals-Consume kind of behavior was done; just some simple replacements, one line needs to change, and the rest of the code can work exactly the same. Cool, that works.

But I have to work out, of course, where stuff gets stored when streaming. It would probably be reasonable for each individual thing that gets parsed to be put in its own single string. It's a little weird, but that still creates a situation where each part of it is in memory only once, and it actually allows you to discard the chunks that aren't used, which is something that is very important with, again, the massive files. Take a large XML file, a database of stuff that's formatted as XML for whatever reason. This would apply to JSON as well, but XML makes it more obvious, because holy hell is it verbose. You're parsing for specific parts of that file, and you obviously want to save those in memory, but the other parts, the parts that are parsed purely for semantics, purely for understanding the other stuff you're reading, you don't need to keep those around. Let the garbage collector clean them up. By separating those out as individual strings, you allow that to happen: when strings that represent purely semantic information aren't referenced anymore, because they can be discarded and don't need to be referenced, the garbage collector cleans them up, leaving around only the parts that do need to be kept. That's useful. But it does mean I would probably need two entirely different parsers whose implementations I cannot share.
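The storage idea, each parsed piece living in its own small string so the purely-semantic pieces can be collected, can be sketched like this. The names here are my own illustration, and line-based reading stands in for real tokenizing:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamParseSketch {
	// Read a stream piece by piece, keeping only the pieces the caller
	// cares about. Each piece is a separate allocation, so unkept ones
	// are immediately unreferenced and eligible for garbage collection,
	// instead of one giant buffer being pinned by a few live slices.
	public static List<String> ParseKeeping(TextReader reader, Func<String, Boolean> keep) {
		List<String> kept = new List<String>();
		String piece;
		while ((piece = reader.ReadLine()) != null) {
			if (keep(piece)) { kept.Add(piece); }
		}
		return kept;
	}
}
```

This is the opposite trade-off from the Source type: no slicing of one resident buffer, but the working set shrinks to just what the parse actually keeps.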
It's a little unfortunate, but whatever. I want to be able to provide stream parsers; that is important. Many parser combinator libraries don't provide that, although they could; the way they operate allows it to happen. And regex engines? I've never seen a regex engine support streams. Other things, like string scanning, certainly could be adapted to use streams, but just don't currently.

As for other projects: Streams doesn't really need to change. I don't know if I have a beta of the most recent design out in a NuGet package, but if you look at the repo, that design is largely formalized. It's definitely in beta state; really, at this point, I'm just implementing the last bases that I need. I don't want to get too much into the architecture of that, because it is a closed-source library. That is one thing: certain libraries are closed source this time around. I'd sort of been making it obvious that I was going to do that, because I've been noticing some of my code winds up in places without proper credit, consistently, just a few days after I do things. So some of the more impressive things are staying closed source. The Patterns engine is staying open source regardless, because I want people to be able to understand how it works, since it's very different from anything else out there. Other things, like the Streams API, are entirely closed source. But there are what I'm calling bases: the absolute bare-minimum mechanics that need to be implemented for any stream to support that stream type. A memory stream base supports the absolute bare minimum required to stream from a location in memory. A file stream base supports the bare minimum required to stream from a file on disk, and so on. So I have a memory stream base implemented. That's unsurprising; it's the easiest to implement. There's barely anything to it. It's just, you know, a buffer, an array or a read-only span of bytes or whatever.
Well, a Span of Byte, rather, since it has to be readable or it has to be writable, because, you know, streams write too. Yeah. The other one that I need to implement for the 4.0 release is the file stream base, which seems to be working on Windows. There's some more testing I need to do, but I don't have a Unix implementation for it, and this is where Streams gets complicated. See, the stream APIs for different operating systems are different, because the way streams work on different operating systems is different. Now, I could go through everything with a POSIX API and implement everything one common way, and that would work. The libraries for POSIX are named the same thing, so I could just bind to libc universally and use the POSIX stream API. But there are advantages to using the Windows stream API in the Win32 API, and I'm doing that, largely because it buffers: the function utilizes a buffer, and so if your internal buffer is going to be the same size as that one, you can call the function once, fill the entire buffer, and be good. That's fewer function calls, and that's significant for performance. As for POSIX, conveniently, the way streams work on Unix systems is similar enough across all of them that POSIX works just fine. So it's not like I need a macOS API and then a Linux API and then a FreeBSD API; POSIX covers all of those. It just doesn't adequately cover Windows.

But there's another one, a sort of specialization of stream itself: the standard streams. Now, this is where you really want different APIs. POSIX standard streams are nice and simple; they're just certain files. Windows? No. For Windows, instead of the file part of WinAPI, you want the console part of WinAPI. Now, you can open them as files, actually, and I think for the purposes of Streams, I'm probably going to do that. I have a relatively unrelated side project called Consolator.
I'm thinking of implementing that using the console WinAPI instead, so that there's no stream as part of it at all, so that the console literally isn't using a stream, except on Unix systems, where it has to. It's an optimization, because you can bypass the entire stream thing and just use the console API. If you look at how the console is implemented in .NET, it's confusing as all fuck: in either the Windows or the Unix case, there's a stream, and it gets wrapped up; there's a lot of middle-man stuff going on. I'd like to avoid that. But there are other reasons for that side project existing, and it's going through some massive changes as well.

Streams has to use operating-system-specific code to do what it does, and this is going to become even more apparent when delving into pipes and sockets: pipes being used for inter-process communication, which is typically, but not necessarily, on the same machine; there are pipes that cross network boundaries, and in that instance you create a pipe to a socket. The sockets, yeah, those are being used for network transport. Those are coming later, but they also require operating-system-specific code. This also means sockets would probably wind up requiring specialized code for support on things like iOS and Android, because they tend to hide their internal operating system well enough that you probably, and I am not certain, I don't program for them very much, but you probably have to use higher-level code, the mobile-specific APIs, which may limit whether or not I can support them, or how they are supported. That also means packaging gets a little interesting: with operating-system-specific code, how do you go about packaging that? Luckily, there's a way. It's already been figured out and implemented and everything, and for the downstream consumer it's luckily as simple as just downloading the package; NuGet figures out everything for you.
It's awesome. Lastly, Literary. Literary is not changing a whole lot; it doesn't need to. You can expect some overloads that I hadn't implemented before, but I'll add those through the same mechanism I've been using in the encodings and Core. Beyond that, it really doesn't need to change much. So yeah, that's where I'm at, rushing to kind of get all of this done, but I seem to be on track. So until the next video, have a good one, guys.