Happy Halloween everybody! Kind of a gross day, but it's kind of par for the course where I'm at. It tends to actually snow on Halloween, and I'm surprised it's not that cold this year.

I have another video in the works that I plan on recording today, on using Stringier for doing text validation in a contrived console app; it's contrived, but it's enough to show the basics. It's just sort of an employee CRUD console app that hooks up to a database and does some basic CRUD operations. Obviously it makes more sense to do validation before sending anything off to the database. Especially if your database is in a remote location, which is becoming increasingly common, but even if it's at the same location, it's much quicker to do the validation on the client side than to send it off to the database, have the database say "I can't take that, that's bad" for whatever reason, and then send it back. That validation can obviously be done with regex and other tools, but I just wanted to show how it would be done through Stringier.

That being said, I do want to talk a little bit about what I'm working on. Some of you may have noticed that I've stripped out a lot of the projects I had been doing, and the reason for that is that I spread myself too thin. Unfortunately, that's something you almost have to do as an Ada programmer, because there's such a lack of libraries available, or those libraries are hard to install, or they contain way too much shit that you don't want to pull down. I don't want to talk too much about that, but in the .NET world there's obviously a huge number of packages available through NuGet, which means you really don't have to write everything yourself. There are already excellent math libraries. There are already excellent container libraries. In fact, I had done a little experiment benchmarking my container implementations against Microsoft's, and there really wasn't any noticeable performance difference.
The characteristics were a little bit different, but overall it wasn't discernible whether one was superior to the other, unlike in Ada, where my implementations of basic common operations were something like twice as fast. So that's good, and that allows me to really focus: instead of doing a bunch of stuff that just kind of has the basics, I can go all in on something and provide a really excellent implementation of it. What I want to do that with is Stringier, and another project that builds on top of Stringier that I'm not really ready to talk about, because it's still very much in the works and I don't have the design formalized yet.

Look at the kitty. Hi kitty. I'm going to clean that door, but it's kind of bug season.

We'll phrase it this way instead: I have some stuff that I want to do to Stringier for a version 2 release. I know that's pretty quick, but the development of it has also been going pretty quick. It all sort of started by recognizing that the cultural comparisons Stringier was doing aren't actually being done right. They are in certain instances and aren't in others. I was originally piggybacking on string equality in C#, where you can specify the string comparison type, which can be the standard ordinal comparison you'd expect to see, but also culturally sensitive comparisons. I'm not entirely sure why that wasn't working right. I think it had to do with how I was breaking down text, and I know in some cases it was because I was looking for a specific length, and not all equivalent words are of the same length. The easiest example is straight from the Microsoft docs: "encyclopedia" can be spelled with an "e", with "ae", or with an ash (æ), the AE ligature, and the "ae" spelling is obviously of a different length than the other two, because there's an additional code point in that string.
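To make that concrete, here's a small C# sketch using that same pair from the Microsoft docs. This is my own illustration; the culture-sensitive result can vary by platform (Windows NLS vs. ICU) and comparison options, so I only print it rather than promise it.

```csharp
using System;
using System.Globalization;

public static class CultureCompareDemo
{
    public static void Main()
    {
        string ligature = "encyclopædia";   // 12 code units, uses the æ ligature
        string digraph  = "encyclopaedia";  // 13 code units, spells out "ae"

        // Ordinal comparison works code unit by code unit, so different
        // lengths alone guarantee inequality.
        Console.WriteLine(string.Equals(ligature, digraph, StringComparison.Ordinal));

        // A culture-sensitive comparison can treat æ and "ae" as
        // linguistically equivalent despite the length difference; this is
        // exactly the case the Microsoft docs use. Result depends on the
        // underlying globalization library.
        var enUS = CultureInfo.GetCultureInfo("en-US");
        Console.WriteLine(string.Compare(ligature, digraph, enUS, CompareOptions.None) == 0);
    }
}
```

The length mismatch is the trap: any code that assumes "equal strings have equal lengths" silently breaks under culture-sensitive equality.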
Coming up with a way to resolve that is actually quite difficult, and it doesn't seem to be something people were using very often. So considering it was holding up numerous features, and considering it wasn't working right, and a bunch of other stuff, I decided to remove it. It's a little unfortunate. I can revisit it later, but I'm not going to be working on it for the time being. I really would like to add it back in later; I'll probably have to come up with my own implementation for it.

But while trying to track that down, I noticed that there was some significant hardening I could do. The whole project had originated basically as an extension library. The actual Stringier package is just a bunch of extensions for characters, enumerables of characters, strings, and enumerables of strings. This provides some useful common string operations, or things that were a little tricky to implement, as a single method call. Like chopping a string into chunks: it sounds sort of contrived, like a toy, but it's useful for situations where you're sending text through fixed-size packets, so you can chop it into 254-character chunks, or whatever your packet size happens to be, and then send all the packets with those chunks. And the actual patterns engine, Stringier.Patterns, had largely originated the same way, just building on extensions, and that turned out to be quite a good approach.

So there were a lot of missing things, like null guards and range checks, that I'm adding, and there were some minor problems with algorithms. Chop didn't quite work right with certain chunk sizes. And there were some others.
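A minimal sketch of that kind of chunking operation (my own illustration of the idea, not Stringier's actual Chop implementation), including the uneven final chunk, which is exactly the kind of edge case that's easy to get wrong:

```csharp
using System;
using System.Collections.Generic;

public static class StringChunks
{
    // Split a string into consecutive chunks of at most chunkSize characters.
    // The final chunk may be shorter; lengths that are an exact multiple of
    // the chunk size are the other edge case worth testing.
    public static IEnumerable<string> Chop(string source, int chunkSize)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        for (int i = 0; i < source.Length; i += chunkSize)
        {
            yield return source.Substring(i, Math.Min(chunkSize, source.Length - i));
        }
    }
}
```

For the packet scenario: chop the payload into 254-character chunks and send each chunk as one packet.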
Overwhelmingly it worked right, and that's why this went unnoticed. But because I'm needing to add more safety checks and more tests, and since I was doing a general audit anyway, it's like: all right, you're going to be going through the entire code base anyway, looking at every single implementation and writing numerous tests for each one to make sure there's good code coverage.

But also, I had done a little experiment that affects Stringier.Patterns, the parsing engine, more than the general string stuff. As it turns out, it's possible to do many types of string comparison using SIMD instructions, which also means it's possible to do string comparisons using graphics card shaders, which are really, really fast. In the .NET world you're working with 16-bit code units. At least on my machine, that means a vector holds 16 characters at once, so you can compare a run of 16 code units in a single SIMD instruction. Now, there's a little bit of overhead, so it's not a 16× speed-up, but in a lot of cases it's considerably faster.

It is still subject to a lot of the control logic, though. If you have an alternation, a choice between something four code units long and something three code units long, you can only check those chunks individually. Similarly, if you're checking for something that has a plural form, you check the non-plural form and then have to check the plural ending separately. So there are limitations, but considering that most textual patterns wind up being predominantly consecutive runs of code units, it's an opportunity for considerable speed-up. It does require rewriting a lot of the underlying stuff in the parsing engine, though. And it's considerably easier to get working without cultural comparisons, because it's strictly an optimization around ordinal comparisons.
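Roughly, the idea looks like this. This is a sketch using System.Numerics.Vector, not the engine's actual code; the lane width depends on hardware (16 ushort lanes on a 256-bit SIMD machine), and the tail that doesn't fill a vector falls back to a scalar loop.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

public static class SimdCompare
{
    // Ordinal equality of two equal-length char spans, comparing
    // Vector<ushort>.Count UTF-16 code units per SIMD instruction.
    public static bool SequenceEquals(ReadOnlySpan<char> a, ReadOnlySpan<char> b)
    {
        if (a.Length != b.Length) return false;

        // chars are 16-bit code units, so reinterpret them as ushorts.
        ReadOnlySpan<ushort> ua = MemoryMarshal.Cast<char, ushort>(a);
        ReadOnlySpan<ushort> ub = MemoryMarshal.Cast<char, ushort>(b);

        int width = Vector<ushort>.Count; // e.g. 16 on AVX2 hardware
        int i = 0;
        for (; i + width <= ua.Length; i += width)
        {
            var va = new Vector<ushort>(ua.Slice(i, width));
            var vb = new Vector<ushort>(ub.Slice(i, width));
            if (!Vector.EqualsAll(va, vb)) return false;
        }

        // Scalar fallback for the remaining tail.
        for (; i < ua.Length; i++)
            if (ua[i] != ub[i]) return false;
        return true;
    }
}
```

Note this is strictly ordinal: every lane is compared bit for bit, which is exactly why it doesn't coexist easily with culture-sensitive comparison.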
And as it turns out, because of how the engine works, you can also do an optimization specifically for case-insensitive comparisons by pre-converting the code points to either their lowercase or uppercase forms. It turns out to be better to do that with the uppercase forms. But again, that kind of optimization requires a bit of a rewrite in the engine itself. So because all of these things presented themselves around the same time, I've decided to go through, do a full audit of everything, and just sort of get it all in one sweep.

And despite the amount of work I've been pouring into it, this is a small project: the entire extensions library is less than 4,000 lines of code, and the pattern engine is barely over 3,000 lines. How I like to do audits on small projects like that is to gut the repo while keeping a reference copy, either by copying the files or by cloning another copy of the repo, and then go through and manually write everything again. That forces you to think about every single implementation and not get lazy about tests, because that's a thing I notice happens a lot. When you've gutted it and have to think about writing the tests all over again, you tend to be more diligent. You don't go "oh, well, this one was already covered" when it turns out you're missing nine important cases; maybe eight of those pass, but one of them fails, and you never even knew it. So, yeah, that's a thing.
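That uppercase pre-conversion can be sketched like this (again my own illustration, not Stringier's engine code; it assumes simple one-to-one case mappings, folding char by char so the length can't change):

```csharp
using System;

public static class CaseFolding
{
    // Case-insensitive ordinal equality: the pattern is uppercased ONCE,
    // up front, so each candidate character needs only a single
    // ToUpperInvariant before an ordinal compare. Uppercasing is generally
    // the safer direction for this kind of folding; char.ToUpperInvariant
    // is a one-to-one mapping, so lengths are preserved.
    public static bool EqualsIgnoreCase(string pattern, ReadOnlySpan<char> input)
    {
        if (pattern is null) throw new ArgumentNullException(nameof(pattern));
        if (pattern.Length != input.Length) return false;

        // In a real engine this folded form would be cached with the pattern,
        // not rebuilt per call; that caching is the whole point.
        char[] upperPattern = pattern.ToCharArray();
        for (int i = 0; i < upperPattern.Length; i++)
            upperPattern[i] = char.ToUpperInvariant(upperPattern[i]);

        for (int i = 0; i < input.Length; i++)
            if (char.ToUpperInvariant(input[i]) != upperPattern[i]) return false;
        return true;
    }
}
```

Because the pattern side is folded ahead of time, the per-character work at match time is halved, and the comparison stays ordinal, so it composes with the SIMD approach.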
I also sort of decided, while starting this audit process, that it would be useful for this project to have access to writing code in IL directly. I know that's not something the overwhelming majority of people ever do or need, but as it turns out, there are quite a few algorithms that are actually easier to implement in assembly, which IL basically is, than in higher-level languages. That sounds kind of bizarre, but there are some edge cases where it holds true, and there are quite a few cases where it's considerably faster. Optimizing compilers have come a long way, but certain language constructs still get in the way. Do note, though, that if performance is your top goal, you should exhaust basically everything else before going that route. You'd be surprised how many optimizations you can get in the native language; you really should not be dropping to assembler in most cases.

But there are also some expressive reasons for doing this. There are certain things I would like to implement that can't be adequately expressed, or can't be expressed at all, in C#, Visual Basic, or F#, and that's been a limiting factor for various reasons. That mostly has to do with the project I'm using Stringier for, but it does affect Stringier in certain cases.

There are ways to go about this, and the one I found that makes the most sense is a project called ILSupport, which piggybacks on the MSBuild build system, so it's compatible with Linux and macOS as well. The solution would still be built with a simple dotnet build, which is good: you want build systems to be really easy, even if you're supplying the packages through NuGet. So it's not going to change anything in that regard. But its author figured out a way to provide projects with mixed-in IL.
So you can have a project that's still predominantly C#, but you can mix in IL files and it'll compile both into a single assembly. Now, unfortunately, with the newer .NET releases, and I'm not talking specifically about the old .NET Framework, I'm talking about all of the runtime SDKs, Microsoft kind of broke his extension. So I've been working with him on getting it actually working again, and it's been a massive pain in the ass. Nobody really programs in IL deliberately, so there really hasn't been any kind of good support for it; there's not much consideration given to making these tools available. So when Microsoft made a breaking change to where ILASM and ILDASM are located and how you get them, they didn't think twice. They just did the breaking change, because people don't do that. So for the few people that actually do, yeah. But we're figuring it out. As of yesterday, I worked out how to get the tools reliably across platforms, and the stuff required to get them running reliably across platforms. Now it's just a matter of integrating what I found back into the build system. Obviously, if the tools are already installed on the system path, it's a non-issue, but they aren't normally exposed on the system path.

So yeah, that's where I'm at. Like I said, I don't know if you should expect the next video today. I will record it today, but expect it by the weekend; it should be done before then. I'll try to get it done today, though. Until then, have a good one.