Alright, so some of you have probably already seen this up on GitHub. I've uploaded a project. I mean, it's not really an upload, because basically what I did is take all my separate Ada repos and merge them into one that I'm calling lib-an. I'm not super happy with parts of the Ada standard library, for various reasons. Some of it is old design decisions that are just hard to optimize for, or that aren't commonly optimized for. An example of this is returning a value through an out parameter instead of as a function result. Returning it as a function result is the more common thing, and so there are optimizations built around that, especially if you're then taking that result, passing it into another function, returning it again, and just chaining through like that. It's called return value optimization. It's a useful thing to be doing, and optimizers understand it. You want optimal code, especially when it's such a minor, relatively insignificant thing. It's also a bit weird using the out parameter, because you have to declare the variable, then call the function and pass it what looks like into the function, but it's not really being passed in, it's being passed out. I mean, it's a procedure, not a function, but you get what I mean. It just looks weird, but it's not just the looks, it's hard to optimize for.
But there are a number of other gripes, and some of them the Ada standards people have actually fully recognized. There are some things that just weren't really thought through all that well, or things that would make a lot more sense to define with the newer Ada types and just weren't. An example of this is that Controlled was defined as a tagged type that you need to inherit from, rather than an interface that you would implement. And so that's sort of bizarre. It creates some interesting little quirks with the language. There are a few things that probably would make more sense to define as protected objects instead of records, and they aren't.
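To make the out parameter versus function result point concrete, here is a rough sketch in C rather than Ada. Everything in it is made up for illustration: `point`, `make_point_out`, `make_point`, `scale`, and `scaled_point` are hypothetical names, and a C pointer argument is only an approximation of an Ada out parameter. Whether a given compiler actually elides the copies in the chained version depends on the compiler and the optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, used only for this illustration. */
typedef struct {
    double x;
    double y;
} point;

/* Out-parameter style: the caller has to declare a variable first and
   hand over its address; the value comes back "out" through it. */
void make_point_out(double x, double y, point *result) {
    assert(result != NULL);
    result->x = x;
    result->y = y;
}

/* Function-result style: the value is simply returned. */
point make_point(double x, double y) {
    point p = { x, y };
    return p;
}

/* Takes a point by value and returns a new one, so calls can be chained. */
point scale(point p, double k) {
    point q = { p.x * k, p.y * k };
    return q;
}

/* Chaining results straight through: no named temporaries are needed,
   which is the shape that return value optimization is aimed at. */
point scaled_point(double x, double y, double k) {
    return scale(make_point(x, y), k);
}
```

With the out-parameter version the caller writes `point a; make_point_out(1.0, 2.0, &a);` before it can do anything else with the value, while the function-result version composes in a single expression.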
So now this is sort of a chance to do some of that, but there are also some implementation details that I don't really agree with. Some of them have to do with the standard itself, like standard-mandated implementation details. Some of them just have to do with how GNAT does things, or similarly even Drake, which I think is excellent. It does some weird things at times, largely to remain compatible to a reasonable extent. It has design goals that it focuses on more than compatibility, but it still tries to remain largely compatible. This happened with D as well, Digital Mars D, where people recreated the standard library and changed a lot of it, including the layout, just because it was crufty. And I think it's sort of time for that here.
So one of these disagreements happens to be with GNAT, no surprise there. And it has to do with how text IO is done. If you've watched the other videos, there should be no surprise there either. Namely, it's sort of broken on Windows, but it's not optimal on other systems either. I'll explain why and it'll be really obvious, but it's not doing the optimal thing on Linux or any of the Unixes either. In fact, it's quite a bit more inefficient than it should be. And that's not to say that the approach I'm taking is somehow super optimal and is using all sorts of tricks and sorcery to work faster. It's that the approach GNAT's taking is just so much more work than actually needs to be done.
Ada strings are an unconstrained but fixed-size array. So if you have a string, "Hello World", which is what, 11 characters? "Hello" is five, "World" is five, and then the space. So yeah, 11 characters. So that's 11 characters long in memory. The way String is defined in the standard, and I do agree with this part, is that its components are unaliased. That is a good thing for that. You really don't need to get the exact memory address of each character. You can just get the character and it's fine. C strings on the other hand.
And remember, the Ada standard library, for any of the implementations that I've seen, is implemented on top of a C standard library. The C string is a pointer to a character.
I have to correct this too. In C, it is an array of char, and char is an integer type of its own, not just another name for int. It's loosely like a new type definition in Ada rather than a subtype definition, although C is far more willing to convert between them implicitly. So in C, it's still an array that you cannot resize. You can put another character array of the same length, a new string of the same length, into it, just like in Ada, but you cannot put a different-size one in. In C++, however, there is some special logic that sort of hides this. When you reassign a C++ string variable, it can point to a new memory location instead of changing the actual memory that's currently being pointed to. This is actually sort of important. It's a pretty big difference. So really I should say that C++ strings are more flexible. C strings are really identical to Ada's. Part of the confusion there is that normally when I do C development, I'm actually passing it through a C++ compiler. Whether that's G++ or MSVC, that's normally what I do, just for that string convenience, but I keep myself limited to strictly C language stuff. So, to be correct: in C, C proper, it is a fixed array just like in Ada.
That's it. It's hopefully null-terminated somewhere, because otherwise you're going to read a lot of memory. Or write a lot of memory. But it's just a pointer to a character. This is why C strings seem more flexible: when you define a new string, you're just changing where it points to in memory. And that approach actually works better than some people give it credit for. There are some tools to help catch the bugs that people used to deal with, and it's really not that much of an issue anymore.
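The array versus pointer distinction in that correction can be sketched in a few lines of C. This is only an illustration under my own assumptions: `overwrite_fixed` and `repoint` are hypothetical helper names, not part of any library. A fixed array can take new contents that fit but can never grow, while a pointer can simply be aimed at a different string.

```c
#include <assert.h>
#include <string.h>

/* Overwrites a fixed-size buffer with src, but only if src fits,
   terminator included. Returns 1 on success, 0 if src is too long,
   in which case the buffer is left untouched. */
int overwrite_fixed(char *buf, size_t cap, const char *src) {
    assert(buf != NULL && src != NULL);
    size_t n = strlen(src);
    if (n + 1 > cap) {
        return 0;              /* the array cannot grow to hold it */
    }
    memcpy(buf, src, n + 1);   /* copy the characters and the '\0' */
    return 1;
}

/* "Redefining" a string held through a pointer: nothing is copied,
   the pointer is just aimed at different memory. */
void repoint(const char **s, const char *replacement) {
    assert(s != NULL);
    *s = replacement;
}
```

For example, a `char fixed[12] = "Hello World";` buffer accepts "Howdy World" (also 11 characters) but rejects anything longer, whereas a `const char *` variable can be repointed at a string of any length.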
That being said, I do think the Ada approach is a little bit better, but I do give the C approach a little more shit than it probably deserves, given the checkers nowadays. I think they call them sanitizers, but yeah. So the thing is, these approaches are just fundamentally incompatible. So while it might not seem that weird at first that GNAT uses fread and fwrite, given the drastically different representations, if you want to use fread and fwrite you have to convert the Ada representation into a C representation, which is not as easy as you might think at first, because C is dealing with memory-addressable characters. That is equivalent to an Ada array of aliased characters. Because of this, you cannot do slice copies. So in order to convert between an Ada string and a C string, you have to iterate through, converting character by character. Then the write operation, of course, goes character by character. So you're iterating over it twice.
Now the thing is, we're just trying to print out the text. Why not just iterate through once and print out each character, or read each character? Why would you iterate through for the conversion and then iterate through again for the read or the write? Why would you not just iterate through once and perform the necessary operation on each character? It works. It works as you would expect. You've pretty much halved the amount of work and removed an entire allocation. So what I'm doing is, instead of fread and fwrite, using fputc and fgetc for Character, and then for Wide_Character and Wide_Wide_Character using fputwc and fgetwc, like you are supposed to do on certain other systems like Windows.
I have to correct part of what I just said. As it turns out, GCC's code optimizer is really brilliant.
It recognizes the situation of converting between string types, between different string representations, and can actually optimize out a huge portion of the work that's done, essentially just doing the memory copy. You could avoid that memory copy if everybody used the same string representation, but there are valid reasons for those different string representations. But what it means is that, and I tested this just on Linux, but it should apply to other systems as well, there's no performance difference. There conceptually should be, and if you don't turn the optimizers on, there is a noticeable performance difference. But with the optimizers on, it goes away. It basically does the same thing, and that's pretty awesome. So, while conceptually what I just described is better, or at least easier to optimize for, the truth in practice is that optimizers sort of make it irrelevant and you get the same performance either way. So, I'm just putting this in to correct that.
So, one of the new introductions to the Ada packages that I've been doing is a set of bindings. And it's pretty simple, just because, like I had just described, I don't need the bindings to fwrite and fread. I just need the bindings to the character get and put. So, it's a largely incomplete set of bindings, but it's there. If somebody wants to extend it, that's fine. Go ahead and extend it. I have actually considered writing my own fread and fwrite implementations that look like they're supposed to in C, but do the right thing with the Ada strings. Essentially a thick binding, but instead of marshalling, just compose the right behavior. It's more efficient that way. So, then I can get everything within the standard library, or rather within this library, doing correct and optimal string reading and writing.
While I didn't intend it at all, this actually enabled a quite useful thing. Ada unbounded strings are an interesting thing. I don't like their level of abstraction, because it's very vague. It's not really clear what they are.
And because of this, they can be implemented a lot of different ways, and there are some incredible differences between those. GNAT implements them in an approach quite similar to how C strings work, where it's a pointer, an access type, to an Ada string. This is what enables them to be more flexible, because you can change what it's pointing to, and it deallocates the previous one if it's no longer used. The issue here is you can't assume that's the behavior, because how it actually works is sort of implementation defined. As long as it exposes the correct interface, it's fine, which means unbounded strings could be implemented much more like, say, the StringBuilder from C#. Unbounded_String could wind up being something like a rope or a gap buffer, which is a remarkably complicated data structure. And while that's very efficient for the kinds of things that Unbounded_String is commonly used for, it isn't something you want to be using in most cases, and it makes converting to a String a very expensive operation. Now, the expense of that is justified when you're doing huge amounts of appends or other operations, inserts especially. But it's a potentially very hefty data structure, and even with the approach GNAT takes, you're potentially dealing with huge amounts of allocations. There are some optimizations it takes to save that at times, but it's a potentially very expensive thing. And allocations are bad. You want to avoid them whenever possible.
The convenient thing about the approach I'm taking is, remember, there was no allocation when printing out the Ada string using the C functions, because instead of converting it and marshalling through the conversion, we're just printing out each character or reading each character. So you can provide efficient IO operations on unbounded strings and not care about what the underlying implementation is, because you're not converting. You just iterate through and do the operation on each character.
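The idea of iterating once and doing the operation on each character, without caring how the string is stored, can be sketched on the C side like this. It is a minimal illustration under my own assumptions, not the library's actual code: `text_view`, `put_text`, `two_chunks`, and `view_of_chunks` are hypothetical names, and the two-chunk type is just a toy stand-in for something rope-like. The only real library call used for output is `fputc`.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for an opaque string type: all the writer may
   use is "how long is it" and "what is the character at index i". */
typedef struct {
    const void *impl;
    size_t (*length)(const void *impl);
    char (*char_at)(const void *impl, size_t index);
} text_view;

/* Writes the text one character at a time with fputc. Nothing is
   converted to a contiguous C string and nothing is allocated.
   Returns the number of characters written, stopping at the first error. */
size_t put_text(FILE *out, text_view t) {
    assert(out != NULL);
    size_t n = t.length(t.impl);
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        if (fputc((unsigned char)t.char_at(t.impl, i), out) == EOF) {
            break;
        }
        written++;
    }
    return written;
}

/* One possible backing representation: text held as two separate
   chunks, a toy version of a rope. */
typedef struct {
    const char *left;
    size_t left_len;
    const char *right;
    size_t right_len;
} two_chunks;

static size_t chunks_length(const void *impl) {
    const two_chunks *c = impl;
    return c->left_len + c->right_len;
}

static char chunks_char_at(const void *impl, size_t i) {
    const two_chunks *c = impl;
    return i < c->left_len ? c->left[i] : c->right[i - c->left_len];
}

/* Wraps a two_chunks value in the opaque view that put_text expects. */
text_view view_of_chunks(const two_chunks *c) {
    text_view t = { c, chunks_length, chunks_char_at };
    return t;
}
```

Swapping `two_chunks` for a flat array, a gap buffer, or anything else only means supplying a different pair of `length` and `char_at` functions; `put_text` itself does not change.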
That makes unbounded strings a lot better performing. So for that, that's good. I'm an advocate for not using unbounded strings as much as a lot of the Ada community is using them. I don't think they're an okay thing to be using for general text IO purposes. There are better ways in most cases, but they are very useful, and the simple reality is a lot of people are using them. So instead of requiring the conversions, just provide efficient IO operations that don't create allocations and don't require conversions. So that'll be good. Once that's in place, of course, that's a huge boost to basically the entire text library. So that'll be nice.
And the patterns package that I've been implementing has been stagnant for quite a while, but is actually being actively developed again. That has to do with a lot of work I was doing with a C# library called Stringier and its extension Stringier.Patterns, because I had found a pretty great paper from a guy who adapted the SNOBOL4-style patterns to Unicon. I'm not copying the approach exactly. There are some tweaks to how SNOBOL4 did it. But that paper was hugely helpful in me figuring out how to implement this well. And so I have a mostly working patterns implementation that I want to use for implementing added tools in Ada. Because between the efficient text IO stuff that I described and the patterns library, the major reason for using C# is gone. Between those two, I'd have access to really, really good text processing facilities in Ada.
That's where I'm at, and that's where this quite large repo on GitHub just came from. Not really anything else to say, so have a good one.