All right guys, it's been a while, obviously enough. I am back upstate. I'm no longer down there helping out, which is a nice load off my back, because that was getting stressful. Probably not for the reason you guys are thinking, either, but I'm definitely glad to be back. It does mean that for the time being I'm looking for another job, so that means a lot of free time to do development, and the pace of that is going to pick up quite a bit.

That being said, there is some stuff I want to talk about that I have done but just haven't posted any updates on yet. We'll cover the most interesting first. There's a prototype streams API that I've been working on, and for the longest time I wasn't making any progress on it. I needed it for one feature, literally just one method that you can't add as an extension method. It's so annoying that I had to do this, but it wound up being useful for other things as well. That method is to peek at the next character, or rune, in the stream, regardless of what the underlying type of stream is. That's actually incredibly important for many text-processing algorithms. But yeah, a whole project because of that.

That being said, after looking through the code in the .NET runtime, I was pretty sure that very large portions of it weren't written all that well, and yet I had trouble getting my own implementation to be any better. So maybe I was wrong. Maybe that is written as well as it could possibly be, right? Well, there's one thing to know about me.
It's that I am incredibly determined and will keep chipping away at something until I finally have a breakthrough or am completely certain that I'm just wrong. And I got these results: such a massive performance boost that I genuinely thought I had the benchmarks wrong somehow, that I was, I don't know, only reading part of the stream in my case, and that's why the numbers were so small. But nope. I spent about four hours testing the shit out of that to make sure I was actually benchmarking what I thought I was. Now, that is specifically for an in-memory buffer, which is actually surprising, because that's the one where I would expect the least performance gain, if any at all. But it's possible.

Now, one thing to keep in mind about what I am doing: because I have the opportunity to start completely fresh on a new API, it's getting a rune, not a character, each time. That helps avoid quite a lot of the problems that come from people making assumptions about what a character actually is. It depends on your language, but in the .NET world a character is a UTF-16 code unit, not a Unicode scalar value. In C or C++ it gets even more confusing, and then various languages have their own thing that they do. In a few cases it actually is a Unicode scalar value, but more often than not it's a code unit associated with a specific encoding. That causes a lot of confusion.

So by fetching a rune each time instead of a character, you greatly simplify things. You can ensure that the exact same code can be written regardless of whether the text is in a Basic Multilingual Plane language like English or Russian, or in something outside it. Russian is in the Basic Multilingual Plane, by the way. Either way, you've got to get way, way, way out there, beyond U+FFFF, for UTF-16 surrogate pairs to be required, and we're actually seeing more and more cases where that has to be done. So avoiding that greatly simplifies things: same code no matter where the text falls. There are a number of architectural reasons behind this, and
I'm not gonna actually delve into why that's the case.

You may have noticed that a few of my libraries are now closed-source, at least partially. Streams is an example of this, and Streams will in its entirety be a closed-source library. I'm not gonna open up any part of it. You can probably at least somewhat tell why, given the performance implications, and I'll explain some more about the general stuff.

Part of the Glyphs library is closed-source, but only part of it. The data structure and the algorithms behind it are closed-source, whereas the tables that drive it are open-source. So if you want to add support for new glyph equivalencies, say, I haven't gotten to Russian or the Cyrillic characters yet, so if somebody wanted to do that, they wouldn't have to put in an issue. They could actually go in and add that stuff to those tables, and the library will just work, which is pretty awesome.

There's not a whole lot to say about that library, however, because there's just not a whole lot there. It's a sort of micro-library just for glyph equivalencies and working with glyphs. The most useful thing to most people would be the Glyph.Equals method, which does a much better job at what we really mean by two strings being equal. That being said, if it's a hot-path kind of thing, especially if it's, say, code for a compiler or some other type of language analyzer, where you've got a computer language, even if it's not a compiled one, or even if it's not a programming language, say it's XML or something, you wouldn't be interested in that. That's not the kind of thing you want to be doing there, because it's slower. It's what people mean, but it's slower. Which makes sense: you've got to check all sorts of equivalencies.
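To give a rough idea of why equivalence-aware equality is both closer to what people mean and slower than a plain comparison, here is a minimal Python sketch. It is not the Glyphs library's algorithm, which is closed-source and targets .NET. It assumes Unicode canonical equivalence as one kind of equivalence, plus a tiny hypothetical table (`EQUIVALENTS`) standing in for the open-source tables mentioned above. The names `fold` and `glyph_equal` are made up for this illustration.

```python
import unicodedata

# Hypothetical equivalence table, for illustration only: each key is treated
# as equal to the sequence it maps to. The real tables are not shown in the talk.
EQUIVALENTS = {
    "\u00e6": "ae",  # LATIN SMALL LETTER AE
    "\ufb01": "fi",  # LATIN SMALL LIGATURE FI
}


def fold(text: str) -> str:
    """Reduce text to a comparable form: decompose, apply the table, recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    replaced = "".join(EQUIVALENTS.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFC", replaced)


def glyph_equal(a: str, b: str) -> bool:
    """Equality after folding; does extra work on every call, so avoid in hot paths."""
    return fold(a) == fold(b)


precomposed = "caf\u00e9"   # 'e with acute' as a single code point
combining = "cafe\u0301"    # 'e' followed by a combining acute accent

print(precomposed == combining)             # plain comparison
print(glyph_equal(precomposed, combining))  # equivalence-aware comparison
print(glyph_equal("\ufb01sh", "fish"))      # table-driven equivalence
```

Every call normalizes both inputs twice and walks the table, which is the kind of extra work that makes this a poor fit for a compiler or parser hot path.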
You've also got to build up the glyphs, which takes additional time to parse. So that's the other closed-source one.

But to cover more on Streams: despite being closed-source, it is an extensible library. There's the stream base class, which implements the absolute bare minimum for a stream to work at all, and when I say bare minimum, I mean incredibly bare minimum: read a byte, write a byte, and a few properties, and that's it. If you want to extend Streams to support a new data stream type, say, right now it doesn't support streaming over a pipe, so no network streams, you would go and extend the stream base to support working with pipes, and boom, you have an entire network stream.

There's a read buffer and a write buffer for implementing your own buffers. That's another rather useful thing: the fact that they're separate means they're obviously not tightly coupled together, which is generally good programming practice. Coupling them is something Microsoft did that is just dumb and actually causes some problems.

And if you want it to support a new encoding type, you'll actually have to put in an issue for that one. That one's not publicly exposed at all, because for things like BOM detection you actually have to do that inside of the constructor. So there's no... I shouldn't say that. There's no feasible way to extend that. You could, but it's a massive pain to do it in a plug-in kind of way, and I don't want to mess around with that.

One thing I will say is that my constructor takes quite a bit longer than Microsoft's. It's not horrible, but it's quite a bit longer. Luckily, you're typically not creating a bunch of very tiny streams, so that's kind of why I optimized it that way. Say you're parsing an entire file, which is actually a rather common way to use streams: you're gonna instantiate the stream, and then you're going to be spending literally 99% of the time doing a combination of reading and parsing it; the
instantiation really just isn't that much time. So of course I'm not gonna care if the approach I have is really fast for the common case and slow for the part where you barely spend any time. That said, it's not atrociously slow. To put figures on it: as far as parsing speed goes, mine is about six to eight times faster, whereas for construction time Microsoft's is about four to five times faster. So even if you had a completely balanced mix of both of those, you're still gonna get an overall performance boost out of mine anyway. But it is something to keep in mind. I don't know why anybody would ever create thousands of very tiny streams, but if you're going to do that, don't use mine.

Lastly, the stream type itself is inheritable, but barely. Partially as an optimization, and partially because I don't want people going in and breaking things, the majority of the methods are actually sealed. So the major point in inheriting from the type is to add support for entirely new things, or to utilize the protected constructors. I have my own use case for this, where I want to sort of expose this as a file API as well. That's just experimental, but regardless, I'm gonna keep the stream minimally inheritable. There may be other uses for that; I could definitely see some situations where it could be useful.

Note that that is not the way to support new data streams. So if you were going to, say, stream, I don't know, controller input, which is something I would never provide in the library itself, you would not inherit from that. You would inherit from the stream base, which handles the actual data stream. The stream class itself really is just orchestration between all the different components and the high-level API.

Lastly, I've done some work on something that actually is released. The streams API has not been released yet.
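To make the shape of that design a bit more concrete, here is a minimal Python sketch of a bare-minimum byte source with a rune-level reader layered on top that can peek. It is only an illustration under stated assumptions: the real library is closed-source and written for .NET, and every name here (`ByteSource`, `MemorySource`, `FileLikeSource`, `RuneReader`, `utf16_code_units`) is hypothetical. The sketch assumes UTF-8 input and leaves out writing, buffering, and BOM detection.

```python
import io
from abc import ABC, abstractmethod
from typing import BinaryIO, Optional


class ByteSource(ABC):
    """Hypothetical bare-minimum base: hands out one byte at a time."""

    @abstractmethod
    def read_byte(self) -> Optional[int]:
        """Return the next byte (0-255), or None once the data is exhausted."""


class MemorySource(ByteSource):
    """Byte source over an in-memory buffer."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def read_byte(self) -> Optional[int]:
        if self._pos >= len(self._data):
            return None
        value = self._data[self._pos]
        self._pos += 1
        return value


class FileLikeSource(ByteSource):
    """Byte source over any binary file-like object."""

    def __init__(self, raw: BinaryIO) -> None:
        self._raw = raw

    def read_byte(self) -> Optional[int]:
        chunk = self._raw.read(1)
        return chunk[0] if chunk else None


class RuneReader:
    """Decodes UTF-8 from any ByteSource; peek and read return whole runes."""

    def __init__(self, source: ByteSource) -> None:
        self._source = source
        self._peeked: Optional[str] = None

    def _decode_next(self) -> Optional[str]:
        first = self._source.read_byte()
        if first is None:
            return None
        # The lead byte says how many bytes the encoded rune occupies.
        if first < 0x80:
            length = 1
        elif 0xC0 <= first <= 0xDF:
            length = 2
        elif 0xE0 <= first <= 0xEF:
            length = 3
        elif 0xF0 <= first <= 0xF7:
            length = 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        raw = bytearray([first])
        for _ in range(length - 1):
            nxt = self._source.read_byte()
            if nxt is None:
                raise ValueError("truncated UTF-8 sequence")
            raw.append(nxt)
        return raw.decode("utf-8")  # also validates the continuation bytes

    def peek(self) -> Optional[str]:
        """Return the next rune without consuming it, or None at the end."""
        if self._peeked is None:
            self._peeked = self._decode_next()
        return self._peeked

    def read(self) -> Optional[str]:
        """Consume and return the next rune, or None at the end."""
        rune = self.peek()
        self._peeked = None
        return rune


def utf16_code_units(rune: str) -> int:
    """How many UTF-16 code units (.NET chars) this rune would occupy."""
    return len(rune.encode("utf-16-le")) // 2


reader = RuneReader(FileLikeSource(io.BytesIO("a\U0001F600".encode("utf-8"))))
print(reader.peek(), reader.read(), reader.peek())
```

The reader never needs to know which kind of source it sits on, so supporting a new data stream means writing one more `ByteSource` subclass and nothing else. The emoji in the usage line comes back as a single rune even though `utf16_code_units` reports that it needs a surrogate pair in UTF-16.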
Streams is just in a beta, but the Literary subproject has actually been released, and what that is is a collection of methods for literary kinds of applications. It's quite small right now, but I have quite a few things that I intend to add to it. Some examples of what's in it would be methods to determine whether something is a palindrome or not, correctly, because it's actually quite a bit more complicated than just reversing a string and checking whether the two are equal. There are things you've got to remove, and the comparisons can't just be simple index-in-and-compare kinds of comparisons. Palindromes actually get sort of complicated. There are also currently methods for things like whether or not something is a heterogram, a pangram, a lipogram, and others. I think I've also implemented isogram, but there are quite a few others that I need to do, and quite a few other literary things that I need to be able to identify.

This library is entirely public. Part of that is because I didn't want the tables separate, but also I wanted to show off a little bit of the power of table-driven algorithms. Typically, when you come across methods like that, they work for a specific language. They're written assuming, say, English, and they'll handle any English palindrome and any English pangram. Well, that's great and all, but what about all the other languages? Do you write one of those for every single language?
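On the palindrome point above, here is a small Python sketch of why reversing and comparing is not enough. It is not the Literary library's implementation; the function name `is_palindrome` is hypothetical. It bakes in assumptions that a table-driven version would take from the orthography instead: that case, spacing, punctuation, and accents are all ignored.

```python
import unicodedata


def is_palindrome(text: str) -> bool:
    """Palindrome check that ignores case, spacing, punctuation, and accents."""
    # Decompose so accents become separate combining marks, then fold case.
    folded = unicodedata.normalize("NFD", text).casefold()
    # Keep letters and digits only; this drops spaces, punctuation, and marks.
    kept = [ch for ch in folded if ch.isalnum()]
    return len(kept) > 0 and kept == kept[::-1]


phrase = "A man, a plan, a canal: Panama"
print(phrase == phrase[::-1])  # naive reverse-and-compare
print(is_palindrome(phrase))   # with the removals applied first
```

Whether accents should be ignored, and which symbols count as letters, differs from one language to the next, which is exactly the part that is hard-coded here.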
Writing one for every single language gets super tedious. What you can do instead is take tables, code the algorithms against those tables, and then swap out the table for whatever you're testing against. Now, it just so happens that these tables are language orthographies, and currently I have it written for three different English orthographies. You've got English with Latin script, which is the one everybody knows. You're probably getting the hint that it's a little more complicated than just that: there are two other scripts that English is written in. You don't see either of them particularly often, but I defined them because it was easy to do, since you're just transcribing, not translating, and it would show off what this approach is actually capable of. So it's also got it for Shavian, the script named after the author George Bernard Shaw, I think that's the full name, as well as Deseret, from the Mormons; I believe they're the ones responsible for that. There are some interesting reasons behind that, but I'm not gonna get into it right now. This isn't some sort of channel on cults. This is a channel on programming, and a few other random things, but not cults.

Because of the table-driven approach, you could specify that, hey, I want to see if this is, you know, a lipogram in English written in Deseret script that excludes these characters, and it'll do that. Which is pretty damn awesome.

If you're interested, it's actually super easy to add in any new tables. All you have to do first is make a quick language definition, if your language is not already defined. English already is, but say you want to do, I don't know, Ukrainian: you would need to add in the language for Ukrainian. The language itself is a rather simple definition. There are a few properties that need to be filled out, but otherwise it's a super easy kind of thing you could code in about a minute. Then there's the script.
That's the writing system used, and in Ukrainian's case it would be Cyrillic. Now, assuming, say, Russian was already implemented in there, then Cyrillic probably would be as well, and you would not need to duplicate it. Currently it's not, however, so you would also need to implement Cyrillic. That's just some information about how the Cyrillic script works, and again it's a very small thing that could take you about a minute to code. So the two of them together are something you can easily just jot down in the middle of a break.

Then there's the orthography. Orthographies are written for every combination of language and script that's valid. Those are a little bit longer, because that's where you're actually writing out every single character in the orthography: every single symbol your language uses as part of its writing gets written down there. There are some holes. Currently it really only deals with the consonant, vowel, semivowel kind of thing. Other symbols like punctuation and numbers are not currently part of that. They'll need to be at some point, but I'm still figuring out how best to do that, so currently it's just letters, or whatever your system uses. In this case, Ukrainian with Cyrillic script would need to be added. That takes a little bit longer, but still shouldn't take any longer than five minutes, so overall this is still a pretty short thing. And then, you know, write a test case, but considering it's data-driven tests, you just put in your inline data with the stuff, and that again takes less than a minute.
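The language, script, and orthography steps above can be sketched in Python roughly as follows. This is an illustration of the table-driven idea only, not the Literary library's API: the class names, fields, and functions are all hypothetical, and the consonant, vowel, and semivowel classification the talk mentions is left out to keep it short. The letter tables themselves are the standard 26-letter English alphabet and the 33-letter Ukrainian alphabet.

```python
import unicodedata
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Language:
    name: str


@dataclass(frozen=True)
class Script:
    name: str


@dataclass(frozen=True)
class Orthography:
    """One valid language and script pairing, with every letter it uses."""
    language: Language
    script: Script
    letters: FrozenSet[str]

    def letters_in(self, text: str) -> Set[str]:
        """Distinct letters of this orthography that occur in the text."""
        normalized = unicodedata.normalize("NFC", text).lower()
        return {ch for ch in normalized if ch in self.letters}


def letter_table(letters: str) -> FrozenSet[str]:
    """Build a table from a string of lowercase letters."""
    return frozenset(unicodedata.normalize("NFC", letters))


# The tables: adding a language means adding data here, not new algorithms.
ENGLISH_LATIN = Orthography(
    Language("English"), Script("Latin"),
    letter_table("abcdefghijklmnopqrstuvwxyz"),
)
UKRAINIAN_CYRILLIC = Orthography(
    Language("Ukrainian"), Script("Cyrillic"),
    letter_table("абвгґдеєжзиіїйклмнопрстуфхцчшщьюя"),
)


# The algorithms: written once against the table, never against a language.
def is_pangram(text: str, orthography: Orthography) -> bool:
    """True if the text uses every letter of the orthography at least once."""
    return orthography.letters_in(text) == orthography.letters


def is_lipogram(text: str, orthography: Orthography, excluded: str) -> bool:
    """True if the text uses none of the excluded letters."""
    banned = {ch.lower() for ch in excluded}
    return orthography.letters_in(text).isdisjoint(banned)


sentence = "Чуєш їх, доцю, га? Кумедна ж ти, прощайся без ґольфів!"
print(is_pangram(sentence, UKRAINIAN_CYRILLIC))
print(is_pangram(sentence, ENGLISH_LATIN))
```

The two functions at the bottom never mention a language. A new check such as alliteration would be one more function of the same shape, and it would work with every orthography already in the tables.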
So that's about eight minutes in total. Well, let's say it takes you eight minutes to add a single orthography, and that's if you're not experienced and can't do these really quickly yet, because you're still learning what goes where. And then, boom, everything should just work with Ukrainian at that point. That kind of thing is actually pretty awesome, and it's why I took that approach: it helps ensure that it's very easy to add things in, and every algorithm defined in that library will just work with the new tables. So if I go in to add something else later on, say, determining alliteration, every language that is supported will just work with that new alliteration algorithm. That's pretty awesome.

So yeah, that's where I'm at right now. You can expect more videos soon, especially since there's gonna be more to talk about, and it's stuff I can actually talk about. Can't talk about medical stuff. That's bad. Don't do that. So until then, have a good one, guys.