So I jumped the gun on something, and I want to explain why. As much as I like to say I'm a "let's test everything extensively, make sure you know what the fuck you're talking about" kind of guy, we're all human, we all make mistakes. Ideally, you want to make as few as possible. I've done a lot of experiments trying to adapt at least parts of goal-directed programming to languages that aren't goal-directed, with some surprising success. Usually there are some syntactic oddities you've got to deal with, parenthesization in weird places where you wouldn't normally need it, that kind of thing, but you can definitely, definitely do this. So when it came to adapting the way errors are passed through a chain of commands, I got a little too hopeful about how performant the result objects that Stringer.Patterns and Stringer.Streams use would be. Those types are absolutely phenomenal at what they do, but they are library-specific. So when it comes to trying to adapt those universally to another language, using Stringer as another language's runtime, that is not some nice simple thing you can just do. There's specific behavior per library, and I'd have to code that in. That's less than ideal. So, weighing that against the fact that I managed to make this universal, I jumped the gun a little bit. I made an assumption I shouldn't have: that the performance of the universal version would be roughly the same as the performance I was getting out of those result objects. And it's not. It's definitely not. I've tried a lot of different options. As a big performance-optimization guy, I know a lot about how different ways of implementing the same thing have huge impacts on the performance characteristics of your program: how it responds in certain situations, how far it deviates in that response curve, how much memory gets allocated, a lot of things that matter quite a bit. None of the approaches I tried provided even remotely decent behavior.

One of the most obvious choices: you just have a thread-local field where exceptions can get stored. Stored, as in assigned to, not necessarily thrown, but you can obviously throw it when you need to. It's still an exception instance; it's just that your normal way of working with it becomes assigning it to the field rather than immediately throwing it. Within each library, you can take the exceptions you're going to throw and put them in a common place as static fields. This ensures they're only ever initialized once in the entire library, and assignments draw from that pool. So all the thread-local field is doing is changing which one it points to. Okay, that's actually conceptually not a bad option at all. The performance I was getting was just a little over a hundred times slower than what I had already worked out. A little over a hundred times slower. That is completely unacceptable. What if I told you that was one of the faster ones? Yeah. So, another one: static classes in C# can be generic. What happens is you get a new static class per type parameter. So what if we take these static classes, instantiate them per error-enumeration type, and start storing the state there? Yeah, it works. It's a little funky. If you're ever using multiple error enumerations in the same library, you wind up with this situation where there are multiple of these state objects: one can have an error, the other can't, and if you're checking the one that can't, you get back that there's no error and, well, fuck. If you're consuming code that uses this and you yourself want to use this, you have to start managing multiple of these, and it spirals out of control to where you could potentially have downstream libraries that have to manage, you know, seven of these different error-state objects. That is absolutely insane. But how's the performance?
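To make that concrete, here's a minimal sketch of the generic-static-class idea, under hypothetical names of my own (`ErrorState<TError>`, `ParseError`); this is not Stringer's actual code, just the shape of the approach:

```csharp
#nullable enable
using System;

// Hypothetical error enumeration; each library would define its own.
public enum ParseError { None, UnexpectedEof, BadToken }

// Static fields in a generic class exist once per closed type, so each
// error enumeration gets its own independent state slot.
public static class ErrorState<TError> where TError : struct, Enum
{
    // The thread-local field merely repoints at a pre-built exception
    // instead of throwing it.
    [ThreadStatic] private static Exception? current;

    public static bool Failed => current is not null;

    public static void Set(Exception error) => current = error;

    public static void Clear() => current = null;

    // Only an unhandleable error actually pays the cost of throwing.
    public static void ThrowIfFailed()
    {
        if (current is not null) throw current;
    }
}

public static class Demo
{
    // Static instance: initialized once for the entire library,
    // never re-created per failure.
    private static readonly Exception BadTokenError = new FormatException("bad token");

    public static int Parse(string text)
    {
        ErrorState<ParseError>.Clear();
        if (!int.TryParse(text, out int value))
        {
            ErrorState<ParseError>.Set(BadTokenError);
            return default;
        }
        return value;
    }
}
```

The funky part described above falls out naturally here: `ErrorState<ParseError>` and some other library's `ErrorState<IoError>` are entirely separate objects, so checking the wrong one happily reports no error.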
Maybe we could work out some kind of controlled registration thing, as long as the performance is decent. It was actually the best one I had worked out, with the exception of when it actually came time to throw the exception. That part was slow as all fuck. But you don't really care if that part is fast, because remember, with this execution model, the only time you would actually be throwing an exception is when it's an unhandleable error, in which case you don't really care if it takes extra time anyway. We're not talking about a fail-fast server; it doesn't matter if it terminates a little slower. As long as everything else is faster, that's the part you want being slow. So how does it look? Well, like I said, it was the fastest one I had worked out, but it was still only a little under 100 times slower. That's completely unacceptable. There were a lot of other options. One that I tried was to use a sort of combination dictionary, and I adapted it to all sorts of different actual collections to see how they ranked performance-wise. The general idea was to start off with two dictionaries: one which mapped an exception type to an error code, and another which mapped the error code to an exception instance. This was essentially where I started, because the whole idea was to be able to register exception types to it and use the HResult property to get back a constant error code. Well, not constant, but you'd have the same error code for every exception type. You've got an ArgumentNullException? It doesn't matter what the instance is; it's always going to have the same error code through HResult, because of how HResult works. We can work with this, right? Because of how HResult is implemented, and I think this has to do with historical reasons with the language, it was implemented as an instance property even though it could be a static property, because, again, it's unique to each type, not to each instance.
But it means that when you have a generic method that should be registering the exception type, there's no parameter, there's no exception instance. Because HResult is an instance property, you have to create an instance of the exception you're registering just to get the HResult value. Well, that's not going to work. And it didn't. I mean, the code worked, but the performance? 14,000 times slower. You heard that right: 14,000 times slower, because creating exceptions is a very expensive operation. See, the dictionary lookups weren't really that much of an issue, but creating all those exceptions was. That's the whole thing we're trying to avoid: creating those exceptions until they're actually needed. Like I said, I swapped out the actual collection types, trying all sorts of other things, with varying success. There were some collections I had to implement myself or get from NuGet packages because they were highly specialized things. I got all sorts of performance boosts, but there wasn't a single one that got me anything even remotely decent. But then I did try one thing: my own specific collection type. Let's do a very specialized collection exactly for this purpose. Write it up, get it implemented. It's essentially a skip-list kind of thing, with some trickery done on the HResult by doing unchecked conversions to unsigned integers so the values are always positive. That operation doesn't cost anything, because it's unchecked, and you don't care about the magnitude. This is literally a code, just an identifier. So as long as it's unique, and the HResult is unique (unlike the hash code), then we don't care what the magnitude is. It doesn't matter if it's in the positive or negative range; we want the code. So we can convert them all to positive and then sort them. That simplifies the math. We can sort these.
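A minimal sketch of that two-dictionary registration scheme, under my own hypothetical names (`ErrorRegistry`, `Register`), not Stringer's actual code. The point is the forced instantiation, which is exactly where the cost showed up:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: exception type -> error code, error code -> reusable instance.
public static class ErrorRegistry
{
    private static readonly Dictionary<Type, int> Codes = new();
    private static readonly Dictionary<int, Exception> Instances = new();

    public static void Register<TException>() where TException : Exception, new()
    {
        // HResult is an instance property, so a throwaway instance has to be
        // constructed just to read the code. Exception construction is
        // expensive, and it's exactly the work we wanted to defer.
        TException instance = new();
        int code = instance.HResult;
        Codes[typeof(TException)] = code;
        Instances[code] = instance;
    }

    public static int CodeOf(Type exceptionType) => Codes[exceptionType];

    public static Exception InstanceOf(int code) => Instances[code];
}
```

Were HResult static on the type, `Register` could read it without ever touching a constructor, and the whole scheme would be cheap.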
And then, because of the properties of skip lists, we can search through that much more quickly. Oftentimes skip lists can even be searched quicker than trees, because of those properties. So how does this hold up? Well, by an absolutely massive margin, it wound up being the fastest collection: about 800 times slower. But still, compared to the 14,000, and collections I couldn't get down to anything under 4,000, 800 times slower is actually really good as far as dynamic collections go. I was surprised to see that result. So then there was one last option to try: sparse arrays. I implemented up a sparse array and tried it, because the idea with sparse arrays is: if the error code hasn't been registered, it'll give you back null, because there's nothing in that bucket. If it has been registered, then that bucket's been allocated, the array indexer will give you that bucket, and that'll have the exception instance. Tried that. Compared to many other collection types it worked well, but it still wasn't enough. It was very clear the problem had to do with the fact that the HResult property is an instance property. Were it a static property of the type, this wouldn't be an issue. You wouldn't need the new instances, and that's clearly where the performance penalty was. But I'd already tried the fastest things I could think of. So let's try synthesizing this behavior by either returning the error state or outputting the error state. Going, you know, almost more idiomatic: a C#-style try method. Sure enough, performance spiked way the fuck up. So, all right, what the hell is going on? Because I've profiled code written in Unicon against similar-functioning code written in more standard languages, and the performance is largely competitive. So why is it so slow in the .NET world? After looking at the IL, at the assembly, it definitely seems like the problem is that I'm trying to adapt language semantics to a language that doesn't have those semantics.
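Here's a rough sketch of what such a sparse array could look like. The layout (a two-level array splitting the code into 16-bit halves) and the unchecked-to-unsigned trick carried over from the skip-list experiment are my own illustration, not necessarily how the real one was built:

```csharp
#nullable enable
using System;

// Hypothetical two-level sparse array keyed by the error code, reinterpreted
// as unsigned: the top 16 bits pick a bucket, the low 16 bits pick a slot.
// An unregistered code lands in a null bucket or a null slot, so a lookup is
// just two array indexes, no hashing, no exception construction.
public sealed class SparseExceptionArray
{
    private readonly Exception?[]?[] buckets = new Exception?[1 << 16][];

    public Exception? this[int hresult]
    {
        get
        {
            uint key = unchecked((uint)hresult);  // free reinterpretation, magnitude irrelevant
            Exception?[]? bucket = buckets[key >> 16];
            return bucket?[key & 0xFFFF];
        }
        set
        {
            uint key = unchecked((uint)hresult);
            // Allocate a bucket lazily, only when a code in its range
            // is first registered.
            Exception?[] bucket = buckets[key >> 16] ??= new Exception?[1 << 16];
            bucket[key & 0xFFFF] = value;
        }
    }
}
```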
And that's where the real problem is. Like I mentioned when describing how goal-directed execution works, typically speaking, there's either a register you store the error state in, or you return two values. Those operations are very lightweight. Okay, so I can't do goal-directed execution in this. It means I can't directly use Stringer as a runtime. But these languages that do this, surely they don't implement the entirety of their runtime themselves. Surely they bind to pre-existing stuff, reutilize pre-existing stuff. You would certainly hope so, considering one of their big things is text processing, and that means, you know, file APIs and other types of very complex code. You'd want to bind to stuff that already does it, right? So, sure enough, they do. Now, how? What is it that they have going on? Well, the idea, taken somewhat from how Goaldi does it: in the context of a goal-directed function, you can import non-goal-directed functions, utilize those, and wrap the behavior up as a correct goal-directed function, and then use that. That'll wrap all the semantics up while still allowing you to bind to underlying stuff. So you just need the appropriate convention. You want something that doesn't raise an exception, but still reports whether or not it succeeded. Well, the C#-style try-method idiom works phenomenally well for that, because the whole thing about try methods is they shouldn't throw an exception, except in very unusual circumstances. That happens sometimes, but the overwhelming idea is that the result of the function tells you whether or not it succeeded, and the actual intended result is an output parameter. So we go that route. This could work. This could work. You know, is it still justified to be working on a version 4.0 of Stringer? I feel like it is.
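As a concrete sketch of that convention, with hypothetical functions of my own (not Stringer's API): success or failure comes back as the return value, the intended result goes out through an `out` parameter, and a failed step simply fails the whole goal instead of throwing:

```csharp
using System;

public static class GoalDirected
{
    // The try-method convention: the bool says whether the goal succeeded,
    // the out parameter carries the intended result, and the expected
    // failure path throws nothing.
    public static bool TryParseDigit(char c, out int digit)
    {
        if (c >= '0' && c <= '9')
        {
            digit = c - '0';
            return true;
        }
        digit = default;
        return false;
    }

    // A goal-directed chain: each step only proceeds if the previous one
    // succeeded, mirroring how failure propagates in Icon/Unicon.
    public static bool TrySumDigits(string text, out int sum)
    {
        sum = 0;
        foreach (char c in text)
        {
            if (!TryParseDigit(c, out int d)) return false; // goal failed
            sum += d;
        }
        return true;
    }
}
```

The non-goal-directed function gets imported and wrapped once, and everything downstream composes through the same bool-plus-out shape.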
It wasn't just an opportunity to try to adapt a goal-directed execution model to the entire library set. It was also a big opportunity to audit the entire thing, to make sure all the APIs are there, that they're there the way they should be, and so on. But there's another thing: it's a tremendous opportunity to ensure the stuff is as optimized as it can be. So I've already been tinkering around a little bit and managed to optimize the absolute living fuck out of the Chop method. Chop is a bit of an interesting one, in that it doesn't seem like a super useful thing, but it actually is in more complex scenarios. Situations where you're sending large pieces of text across the network or whatever, and the protocol for doing so uses fixed-size packets. Now this becomes immensely useful. You don't have to do any of the work yourself: you just chop the entire thing up, and as part of your enumeration over the chopped pieces, you send a packet each time. That's fantastic. But you want it to be performant. So here's an example. Chopping the string "Hello World" into chunks of one character each took about 200 nanoseconds on my machine. That is, with the older API, where it would return an array of strings. Each chunk became its own new string, so that's an allocation, and there's an allocation for the array itself. It's less than ideal. Ideally, you'd like to be returning ReadOnlySpan<char> instead. Now, ReadOnlySpan is a ref struct, which means you can't use it as a generic type argument, and arrays are effectively generic with the element type as the type parameter. You cannot have an array of ReadOnlySpan<char>. Well, fuck. But you can do something rather clever. You can declare a new type. We'll call it the chopped string. The name's going to change slightly, but it's still going to be "Chopped" and then something else.
But the idea behind it is that it references the string it came from, the source text, as a ReadOnlySpan<char>, along with a little bit of extra information. The idea is what's significant. At the time of creation, when the method is called, all that happens is the source gets referenced as part of the chopped string, with these chopped-string semantics, and a tiny little calculation is done for how many chunks total there will be. Otherwise, that's the extent of the work. From there, you can index the various chunks just as you would index an array of chunks, and it only does the calculations for that chunk. Or you can enumerate over all the chunks, which is the most likely thing you'd do, in which case the code is essentially exactly the same. And, well, what do you know? The performance results really were significant. It turns out the creation of the entire thing only takes about 30 nanoseconds, versus the roughly 200. We're talking about 170 nanoseconds that were going into allocating. Getting the first element was the only case I could safely benchmark across all sorts of different lengths and chunk sizes, stuff like that, and it told the same story: immensely faster. In fact, there was almost no overhead for indexing into this, and that's because the math for computing that slice is actually very straightforward. There's a tiny amount of overhead, of course, but it's very insignificant. And enumerating over the entire thing? Well, even that was a tiny little bit faster, because we were now returning appropriate types and never allocating anything. The whole thing became faster. And that's where I realized: holy shit, the original API was actually pretty bad, because I was doing allocations all over the place, and you really don't need to be.
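A minimal sketch of what such a type could look like. The names here (`Chopped` and its members) are mine, not the final Stringer API; the point is that construction only captures the span and counts chunks, and every chunk is a `Slice` computed on demand, with no allocations anywhere:

```csharp
using System;

// Hypothetical lazy chopping type: a ref struct that references the source
// text as a ReadOnlySpan<char> plus the chunk size. Nothing is copied.
public readonly ref struct Chopped
{
    private readonly ReadOnlySpan<char> source;
    private readonly int chunkSize;

    public Chopped(ReadOnlySpan<char> source, int chunkSize)
    {
        this.source = source;
        this.chunkSize = chunkSize;
        // Ceiling division for the chunk count: the only up-front work.
        Count = (source.Length + chunkSize - 1) / chunkSize;
    }

    public int Count { get; }

    // Indexing a chunk is just a bounds-checked Slice: no copying.
    public ReadOnlySpan<char> this[int index]
    {
        get
        {
            int start = index * chunkSize;
            int length = Math.Min(chunkSize, source.Length - start);
            return source.Slice(start, length);
        }
    }

    public Enumerator GetEnumerator() => new(this);

    // Enumeration reuses the indexer, so it's the same slice math per chunk.
    public ref struct Enumerator
    {
        private readonly Chopped chopped;
        private int index;

        public Enumerator(Chopped chopped)
        {
            this.chopped = chopped;
            index = -1;
        }

        public ReadOnlySpan<char> Current => chopped[index];

        public bool MoveNext() => ++index < chopped.Count;
    }
}
```

So `foreach (ReadOnlySpan<char> chunk in new Chopped("Hello World", 4))` walks "Hell", "o Wo", "rld", and never allocates a single thing, because every chunk is a view into the original string's memory.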
See, something Stringer makes a lot of use of, to provide these APIs for as many textual types as possible with as much code reuse and as little code duplication as possible, is clever overloads and conversions. A function like Chop, for instance: its implementation is done on the ReadOnlySpan<char> type, but then you can easily provide overloads for anything that can become a ReadOnlySpan<char>. This includes String, character arrays, and character pointers. See, that's something I've had an issue open for a while, back from March, but now I can rapidly implement those. Regularly returning ReadOnlySpan<char> actually greatly simplifies a lot of other things, too. And it doesn't really change all that much. You would obviously need to recompile your code to use the newer libraries, but it's a major version bump; of course things are going to change. I'm not afraid to tweak the public APIs a little at major version bumps. See, what I consider important is the intent of the code, not the exact API. Within minor version bumps, you should not change the API: make something obsolete, add new ones, whatever. But at major version bumps it's acceptable, in my opinion, as long as the intent does not change. The only time the intent should ever change is if the intent itself was wrong, and in that case, you fucked up and shouldn't have even put it in there in the first place. You should have done a lot more testing. But I haven't really run into that issue. There's a problem that comes with making ReadOnlySpan<char>, or some special type providing the semantics we want, like the chopped string, the standard return type for everything, however: you can't use this from F# anymore. And honestly, I'm okay with that. I do like having the F# bindings there. They're useful, they're convenient. F# is great for a lot of reasons, but holy hell is it not good for performance.
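To illustrate that overload pattern for a second: one real implementation on ReadOnlySpan<char>, with thin forwarding overloads for everything convertible to it. The method here (`FirstChunk`) is a hypothetical stand-in for illustration, not Stringer's actual API:

```csharp
using System;

public static class ChopExtensions
{
    // The single real implementation, done on ReadOnlySpan<char>.
    // (Trivially returns the first chunk, just to keep the sketch small.)
    public static ReadOnlySpan<char> FirstChunk(this ReadOnlySpan<char> source, int size)
        => source.Slice(0, Math.Min(size, source.Length));

    // String converts to a span with AsSpan, so the overload is one line.
    public static ReadOnlySpan<char> FirstChunk(this string source, int size)
        => FirstChunk(source.AsSpan(), size);

    // Character arrays likewise.
    public static ReadOnlySpan<char> FirstChunk(this char[] source, int size)
        => FirstChunk(source.AsSpan(), size);

    // A char* overload would forward through new ReadOnlySpan<char>(ptr, length)
    // the same way; omitted here to keep the sketch compilable without /unsafe.
}
```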
F# is a god-awful language to use, any functional programming language is, if you're concerned about performance, because many of the functional paradigms are antithetical to performance. There are reasonable ways of getting around a lot of the problems, like making something publicly immutable but privately mutating it inside your own library instead of copying it repeatedly. Some of the goals of functional programming are reasonable, but look at some benchmark comparisons. It's fucking horrifying. But see, a major reason why I don't really mind losing out on that is because of what's gained. Or rather, what is lost, but lost in a good way. You see, if the entirety of these libraries comes to essentially just be working with slices, then what could eventually happen is a large source text gets loaded up, gets passed through the parser, gets manipulated through all sorts of text-processing methods, and has all these things done to it without ever allocating a single thing, because they all reference the initial text that was loaded into memory. Everything is simply a reference to part of that text. Anybody who's worked at a business that has its own servers doing large amounts of text processing understands the implications of that, understands how much memory that saves. To an extent, I'm also going to lose out on some Visual Basic support. It's unfortunate, but it is what it is. I can look into specialized bindings that adjust everything to work with String instead, because inside the F# functions you can return String for everything: call the C# method, then take the ReadOnlySpan<char> and convert it to a string. From there, all the idiomatic F# stuff like pipelines will still work, but they simply won't be thin bindings to this API anymore.
Whether people would be interested in that at all, I don't know, but my priority at this point is getting this stuff out at ridiculously high performance first and foremost, because I'm basically doing all my development in C#. I haven't touched F# in a good while, and the F# bindings, well, they've gotten a decent amount of support on NuGet. Not really sure it's justified, especially considering, well, you could still use the stuff as is. See, these libraries are almost entirely CLS-compliant. In fact, the overwhelming majority of the APIs are already CLS-compliant, so you would lose a little of the idiomatic F# syntax, but it's not like you can't use them from F#. These bindings were really just sort of a pretty-convenience kind of thing; they weren't necessary for being able to use it. So, between minor syntactic sugar and massive performance implications, obviously I'm going to go with the performance implications. So, I'm sorry about that one, but I hope any F# users of Stringer understand. All in all, though, the naming conventions and everything are pretty similar. You're going to have an easy time adjusting your code, and it's going to help performance a lot. A lot, a lot. There's more I could say about it, but you've largely got the point by now. Have a good one, guys.