Welcome, everybody, to my talk on Nuspell version 3. I also talked about Nuspell last year, so this should give you an update, and there will be some space for your input, because we're here at a meeting with all the experts. Let's see how this works.

These are the things I'd like to talk about: an introduction to Nuspell, how it works, the technologies we use, the dependencies on spell checking in general in operating systems, and what's upcoming.

In short, Nuspell is a spell checker. It is free and open source software. It consists of a library and a command-line tool, and it is written in C++17. Our team consists of Dimitri from Macedonia and myself from the Netherlands. I'm going to go through this quite rapidly.

Spell checking is not trivial. Some people think it is just a long list of words: you check whether the word is in the list, and then you know whether it's spelled correctly or not. For some languages those lists would be endless, so all sorts of mechanisms have been devised to support conjugations, complex morphologies, compounds and so on. (Please, have a seat.) For more on the history of spell checking, you can check some talks I gave at FOSDEM earlier.

The specific goals of Nuspell are to be a drop-in replacement for browsers, office suites and all kinds of applications that use spell checking. It supports the MySpell and Hunspell dictionary format. The big differences with the other existing spell checkers are that it is much faster, more maintainable, has fewer dependencies and is more portable. As you will see later on, we have compiled it on many different platforms. It also opens the door to more optimizations and functionality in spell checking in the future, where other spell checkers have sort of walked into a dead end at the moment. It supports many character encodings, complex word compounding, affixing, rich morphology, et cetera.
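To make the point about word lists versus affixing concrete, here is a minimal sketch in Python. It is a toy and not the Hunspell dictionary format or Nuspell's implementation: the `SuffixRule` type, the `accepts` function, the flags and the data are all hypothetical names chosen for illustration. The idea is only that a few stems plus a few suffix rules can stand in for a much longer list of surface forms.

```python
from typing import Dict, List, NamedTuple, Set


class SuffixRule(NamedTuple):
    flag: str   # label that a stem must carry for the rule to apply
    strip: str  # characters removed from the end of the stem
    add: str    # characters appended after stripping


def accepts(word: str, stems: Dict[str, Set[str]], rules: List[SuffixRule]) -> bool:
    """Return True if `word` is a listed stem or a stem plus an allowed suffix."""
    if word in stems:
        return True
    for rule in rules:
        if not word.endswith(rule.add):
            continue
        # Undo the rule: remove the suffix, then restore the stripped characters.
        stem = word[: len(word) - len(rule.add)] + rule.strip
        if rule.flag in stems.get(stem, set()):
            return True
    return False


# Hypothetical toy data: two stems cover six surface forms.
STEMS = {"walk": {"S", "D", "G"}, "try": {"Y"}}
RULES = [
    SuffixRule("S", "", "s"),
    SuffixRule("D", "", "ed"),
    SuffixRule("G", "", "ing"),
    SuffixRule("Y", "y", "ies"),
]

if __name__ == "__main__":
    for candidate in ["walked", "tries", "trying", "walkies"]:
        print(candidate, accepts(candidate, STEMS, RULES))
```

The flags keep the rules from over-generating: "trying" is rejected in this toy data because the stem "try" does not carry the flag of the "ing" rule. Real dictionaries add conditions, prefixes and compounding on top of this.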
Much of this was also discussed last time. Like before, we now had another grant from Mozilla, for which we are very thankful; otherwise we couldn't have had the resources to develop this spell checking library to where it is now. We had two projects, and we are about two thirds of the way through the second project. Version 3 came out recently, and since last Friday, yesterday, we have packages for Debian and Ubuntu. So now it really starts to count and it's usable, because people don't have to compile it on their system; they can just install the package and start using it.

We did a lot of testing compared to Hunspell. The differences are negligible. As for speed, at the moment it's three times faster, but that highly depends on which language you're using. Soon there will be a complete, updated overview of those speed-ups and performance figures on our website.

So, a little dive into spell checking. I won't spend too much time on that unless you have specific questions. This is sort of what it entails. To give some examples of things you find along the way which you don't normally think about: for certain languages, something simple like going to uppercase or lowercase is not trivial at all. You can see, for example, for Greek that you can go to uppercase that way, but if you go back from the capital to this one... well, it can be only one, but it depends where it is in the word. Another example is the capital I with a dot above for Turkish, or the ligature IJ in Dutch, or the sharp s in German. If you go back and forth between uppercase and lowercase, which by the way you do a lot in spell checking, you might end up at a different word than the one you assumed you would end up at. With previous spell checkers this was usually implemented with specific exceptions, and this information is contained in the Unicode library, the ICU library.
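The case conversion pitfalls mentioned above can be reproduced with a few lines of Python. This is only an illustration: it uses Python's built-in, locale-independent Unicode case mapping rather than ICU, and the function names `round_trip` and `case_examples` are made up for this sketch.

```python
def round_trip(word: str) -> str:
    """Uppercase and lowercase again, the way a naive checker might normalise."""
    return word.upper().lower()


def case_examples() -> dict:
    """Collect case conversions that do not behave like plain ASCII."""
    return {
        # German: sharp s uppercases to "SS", so the round trip changes the word.
        "german_round_trip": round_trip("stra\u00dfe"),
        # Greek: medial sigma (U+03C3) and final sigma (U+03C2) share one capital.
        "greek_upper_medial": "\u03c3".upper(),
        "greek_upper_final": "\u03c2".upper(),
        # Lowercasing a capital sigma yields the final form only at the end of a word.
        "greek_lower_word": "\u039f\u0394\u039f\u03a3".lower(),
        # Turkish: capital I with dot above lowercases to "i" plus a combining dot.
        "turkish_dotted_lower": "\u0130".lower(),
        # Dutch: "ijs" should be capitalised as "IJs", but only the "i" is raised.
        "dutch_capitalize": "ijs".capitalize(),
    }


if __name__ == "__main__":
    for name, value in case_examples().items():
        print(name, repr(value), len(value))
```

Because these built-in mappings ignore the language of the text, they cannot produce the Turkish dotless i or the Dutch "IJ" on their own; language-aware case mapping is the kind of knowledge a library such as ICU provides.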
So we use ICU instead of all kinds of specific exceptions for supporting all the different languages. The libicu library is pretty big and has pretty wide support for these specifics, and it saves a lot. It makes the spell checker much more efficient: we don't have to make all these separate exceptions inside, we just rely on this. So the actual dependencies the library has are libicu and libboost, and nothing else. It's pretty independent.

After spell checking you want suggestions, and suggestions are generated in many, many different ways. The ones with a star are new in version 3; the other ones we already had. Some people ask: how do these suggestions get generated? Well, you can see there are all these different ways of finding suggestions. They all get stacked in a pile and returned to the user as suggestions with which you can fix your error. What is not in there yet is ranking, though for certain languages you can force a suggestion all the way to the top to have it as the first suggestion, because you know that's probably what the user wants. In the future we can generate all the suggestions and sort them, for example by word frequency. Probably all of you have at one time had suggestions from a spell checker and thought: what is that?
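The idea of pooling candidates from several generators and then sorting them by word frequency can be sketched as follows. This is not Nuspell's suggestion algorithm: the four single-edit generators, the `suggest` function and the frequency table are assumptions made up for illustration, and the frequency sort is the possible future step mentioned above rather than existing behaviour.

```python
from typing import Dict, List, Set

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word: str) -> Set[str]:
    """Pool the candidates from four simple generators, one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return deletes | swaps | replaces | inserts


def suggest(word: str, frequencies: Dict[str, int], limit: int = 5) -> List[str]:
    """Keep candidates that are known words and put the most frequent ones first."""
    candidates = {c for c in edits1(word) if c in frequencies}
    candidates.discard(word)
    # Sort by descending frequency; ties are broken alphabetically for stability.
    return sorted(candidates, key=lambda w: (-frequencies[w], w))[:limit]


if __name__ == "__main__":
    # Hypothetical frequency counts, not taken from any real corpus.
    toy_frequencies = {"the": 1000, "then": 200, "tee": 3}
    print(suggest("teh", toy_frequencies))
```

With the toy table, both "the" and "tee" are one edit away from "teh", and the frequency sort places the far more common word first.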
So we try to make the world a little bit more boring by putting the most likely spelling suggestions at the top.

If you want to use this library, it's very straightforward. The API is compatible with how Hunspell does that. These are some helper methods to see what kind of dictionaries you have installed on your system. You can get the path of a specific dictionary, you can load one, and then you're all set to do some spell checking, which is even more boring: you just pass the word in and you get a boolean out. And if you have a vector of strings, you can have it filled with suggestions. So if you're afraid of implementing spell checking in your software, don't be. Just have a go at it; you can see it's not rocket science. The rocket science is inside the library itself.

As I said, it's pretty widely ported, because we don't have that many dependencies and the dependencies are very standard libraries which are available for most platforms. At the moment these are the compilers it works with, and that covers maybe 99% of the world's computers.

Very briefly, these are the tools we use. They're not very important if you want to use our library, unless you want to become part of the team, which you're welcome to do.

The things we changed in version 3: we upgraded to C++17. We use CMake to build the library and the command-line tool. The API is easier; as you saw before in how to set up spell checking, you don't have to tell it which character encoding or locale you're using, it defaults to UTF-8. Compounding and suggestions have been improved. It's now three times as fast compared to Hunspell. We made an Enchant integration and Debian and Ubuntu packages, and the Firefox integration is coming up.

These things all look really nice, but we have to be realistic, and maybe you can help with this. If you have Debian packages, how long do you think you're away from getting included in a distribution release?
Years. If you have a Firefox build, how long do you think it will take before it's out there? Months. And the Enchant integration: it's already a package in Ubuntu, so it just needs to be updated. That's maybe only one release further down the road. But if you happen to be in a position where you can speed things up in Debian and Ubuntu packaging, please help us out.

Then some stuff which is new, and which I would really like to have your input on. I made a short overview of packages which depend on spell checking. You might know that spell checking started with Spell, then Ispell, then we had Aspell, MySpell, Hunspell and Nuspell. There are also some specific spell checkers for Hebrew, Turkish and Finnish, but I left those out of scope here.

So, how many of you work on software packages, like packaging stuff, or developing software which is packaged in Ubuntu, Fedora, Red Hat, Debian, it doesn't matter? Okay. So how many packages do you think depend on spell checking libraries? A long list. With these criteria I made an overview.

On gspell you have these things depending on it for their spell checking, and gspell itself might get it from another place. So this is a fairly reasonable overview. Enchant is a well-known spell checking abstraction: you ask it for spell checking, it sees whether you've got Aspell or Hunspell or whatever installed, does the spell checking there and gets the result back. But you see that other spell checking modules or libraries are also using Enchant. These are all spell checking software packages, and these are all actual direct end users of Enchant. So it's a concatenation of spell checking abstractions and wrappers.

Then we've got Ispell, which is a fairly old one, and you would be amazed how many packages depend on Ispell. Like the IPv6 toolkit, why not? Then we've got the Python 3 support for Enchant, which caters to many more packages than Enchant itself caters to. I'll just go through it and
then we'll have a small discussion on this.

Hunspell is at the moment the best stable released spell checker, and as you can see, Enchant uses it, and Chromium. This is a Hebrew spell checker. Ispell depends on it, I don't know why. [unclear], Thunderbird. And we saw GtkSpell. This overview took me half an hour to make; that was one beer less for me yesterday. You can see really old spell checker packages using this stuff, and you find everything in here. It's almost enough to make me pull out my hair.

And it gets worse. We also have Aspell. Okay... oh, that's an empty one. Yeah, here it is. For example XML Copy Editor; does anybody know this application? You can use XPath, create XML and so on, and it also supports spell checking, with Aspell. There is a change to make it use Hunspell. I've been chasing that for three or four years to get it into the final release, and they say it should be there. It's still not there.

So this is not only a story about the Nuspell release and its integration and usage in software packages, but actually about a lot of stuff. It led me to make these kinds of overviews for fonts as well. There can be so many packages depending on just one or two other packages, and if you pull those in, you pull in the whole Christmas tree of dependencies, and you're stuck with it for years and years.

So on one side I'm curious what kind of suggestions you have for the Nuspell spell checking library, of course: what's going to make your writing much faster and better, and how we can improve that. But also maybe a short discussion or questions in general on how to deal with this jungle of dependencies. That's my reaction when I see this. Has anybody tried to get packages released and run into this problem as well? Yes, there, please.

[Audience remark, unclear.]
Okay, yeah. So the reply was that you are better off rolling your own than waiting for...

So the suggestion was to become friends with Debian package maintainers. I spent the last two days in the Debian mini camp here at FOSDEM, and I handed out lots of stickers and so on. Maybe I should have bought more beers. But the people were very helpful in getting the packages done properly. So we have Ubuntu and Debian packages. They build, they install, they run, and everything is correct down to the detail in the latest versions. I'm very happy with their support. So then I mailed some of them: can you get that into unstable, because I'm giving a talk about it? No. But okay, I understand that there have to be checks and reviews and so on, and that takes time. That's just the way it is, and I respect that, because it also ensures that there's quality software in the releases.

No, because it's a new package it should go to NEW, and then it should go to unstable. So as for getting it into releases, it will land there eventually. That's not so much my problem. My worry is more this: if you work with software, please use the latest version of whatever libraries you're using, and try to relax some of the dependencies, because this is so much. Anyway.

Yeah, I had to upload the slides, and then you have to type in how many pages as well.

At the moment we support Ubuntu 19.10 and Debian 10, and there is a build for FreeBSD. We would like to have builds for these other platforms. If you are interested, please contact us and we will try to help you, or the other way around.

At the moment there are C++ and C bindings via the API, but we would like to have language bindings for Python, Java, Ruby, whatever. Some people told me that there are excellent ways of generating language bindings.
I would like to look into that. There are many different ways of doing it, and you can also give us a hint, like: try this one to generate them. We will do the work; just point us in the right direction, please.

So that's, in short, what Nuspell does, what we were up to last year, and the shock and awe of the dependency tree we came across. Yes?

No, C and C++. The question was whether we offer plain C bindings or C++ bindings. We offer both C++ and C. And if you run into a problem trying to link or use our library, let us know via an issue.

Thanks for your attention here this afternoon, unless there are more questions.

So there are 90. The question was which languages we support and what the process of adding new languages is. The languages we support are the ones which are also supported by Hunspell. These already exist; we just use them. I'm myself involved in updating the Dutch language support. You can read the documentation and start building language support for a language which doesn't exist yet. If you have questions, contact us as well. As an exercise, and I don't understand why, I implemented a Klingon spell checker, just to see how that would go starting from zero. Klingon, for those who don't know, can be written in Latin characters but also in Klingon Unicode characters. So it was a good test of spell checking on completely weird Unicode characters, which are way down there, and it still works. These 90 languages reach about 200 countries or regions around the world, because many languages are spoken in more than one country. Do you have any specific language in mind, maybe? Like Elvish?

And also how much time it would need, if you now come up with one. It depends on how much information you have on that language. For Klingon I found a database in which conjugations were also available, and that helped in making the affix file. But start with the word lists and then see if you
can add language rules in order to shorten the word lists or extend the coverage of compounds and so on.

The question is how the integration goes with Firefox and LibreOffice. That's just an API, as I showed. It's a shared library, or you can link it statically. It's just a few calls you make, and they are the same calls you make with Hunspell. It's meant as a drop-in replacement, so if you're going to migrate you don't have to restructure your program in order to use it.

The second question is how to enrich the dictionaries with word lists. That's a good question. All spell checkers support personal dictionaries, for maybe your name or the street you live in, which may not be in the dictionary for your country. That's actually how I got involved. I moved from one installation to another and I was moving my personal dictionary along all the time, and it grew and it grew. And I thought: yeah, but these words should be in the dictionary. So you have to look up who is responsible for it, contact that person, and just send the list of words: please include these. For English it's a website called SCOWL, which maybe you're familiar with. Have a look at our wiki on GitHub.
There's a whole list of all the languages which are supported, including the contact information. You can go there and send an email or go to the website. They have their rules about whether to include a word or not, and otherwise you have to keep it in your personal dictionary.

And as you saw with the whole dependency challenge, there is another challenge with personal dictionaries, because you have a personal dictionary for every language you use, in Firefox, in LibreOffice, in Aspell, in Enchant, in Hunspell. So have a look around on your system at how many personal dictionaries you have and how much they differ. They also use different formats. See if you can contribute words to the upstream dictionaries and make your personal dictionaries smaller.

Lastly, I also want to thank my colleague Dimitri, who couldn't be here and who does a lot of the hard and complex work, and the Dutch foundation OpenTaal, which sponsored the stickers that are available here at our FOSDEM stand. If you don't have any stickers yet, come by our stand tomorrow and get some. Thanks for your attention.