Hello everybody here and on the interwebs, my name is Sandoval Glover and I'm going to talk to you about Nuspell. Nuspell is a spell checker, and it's FOSS of course. It consists of a library and a command-line tool and it's written solely in C++. The team currently consists of Dimitri and me, and we worked the main part of 2018 on creating this new spell checker.
Spell checking is not trivial. Most people think it's just a word list and you look up a word. It's much more than that, and it depends in large part on the language you're using. It involves a lot of case conversion, affixing and compounding, and suggestions are a whole different ballgame altogether. So in the last decades we had spell, Ispell, Aspell, MySpell, Hunspell and now finally we have Nuspell. If you want to know more about spell checking in detail you can rewatch my presentation from a few years back.
The goals of our project were to be a drop-in replacement for web browsers, office suites and all other applications that use spell checking. It's backwards compatible with MySpell and Hunspell dictionaries. The main new features would be that it would be much easier to maintain and extend, have fewer dependencies, have maximum portability and be a lot quicker, especially on the suggestions part.
So what sort of features does such a spell checker have? A lot of character encodings have to be supported. Most software works with UTF-8 Unicode but there's a lot of ancient stuff around which also needs to be supported. Then there's compounding and affixing; it depends a lot on the language you use whether that's important or not. For example, German and Dutch use this a lot. And next to that there are suggestions and personal dictionaries, sort of the usual stuff. It should also support the 90 dictionaries which are currently available and used all over the world. And if you think you can support one dictionary, it doesn't mean you support the next one.
So in order to do this project we applied for a grant at Mozilla and they gladly gave us one. I also want to thank Mozilla for doing that, especially Gurf, who's no longer with us, and Mian.
At the end of this project we did a verification against Hunspell to see how well our library is performing. And it's almost exactly the same, as you can see by the numbers. And it's already a lot quicker, even though we didn't do any new kinds of optimization other than porting it to a pure C++ implementation. And on the suggestions part it's already a lot quicker. And this was only the first step in this project of ours and we want to make these numbers go up a lot more.
So spell checking, as I said, is not so easy. We'll not go into the details here, but these are sort of the things you have to consider when you're implementing a spell checker; to explain them would take a few hours. Just to give a short example of one of the troubles you can get yourself into: simple stuff like to-upper or to-lower when you do case conversion. And you can see, for example for Turkish, Greek, German and Dutch, that it's not as trivial as you might expect.
So to explain a bit more on the suggestions part, which is a bit easier than the spell checking part. It used to be a black box for most people, so here's the secret of how we do it. We start with a replacement table with a lot of known typos in, for example, diacritics. Then we use a mapping table for sort of groups of characters which might be used incorrectly. We look at characters that have been typed extra, keyboard mishaps because of fat fingers, and characters which are really not in the language at all. Then we also check for forgotten characters and we do some phonetic mapping. Suppose your mother tongue is another language and you think you write it like that; we can also detect these kinds of errors. And this is how the spelling suggestions are being cooked up and then presented to you. If you want to do that yourself, use our library, it's fairly easy.
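To make the suggestion steps described above a bit more concrete, here is a minimal sketch in C++. It is not Nuspell's code or API: the names `WordList`, `ReplacementTable`, `add_if_word` and `suggest_sketch` are hypothetical, the dictionary is assumed to be a plain set of words, and characters are assumed to be single bytes, so real diacritics would need proper Unicode handling. The phonetic step is left out.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical types for this sketch only; they are not Nuspell's API.
using WordList = std::unordered_set<std::string>;
using ReplacementTable = std::vector<std::pair<std::string, std::string>>;

// Keep a candidate only if it is a dictionary word and not suggested yet.
void add_if_word(const std::string& candidate, const WordList& dict,
                 std::vector<std::string>& out)
{
    if (dict.count(candidate) == 0)
        return;
    for (const auto& s : out)
        if (s == candidate)
            return;
    out.push_back(candidate);
}

std::vector<std::string> suggest_sketch(
    const std::string& word, const WordList& dict,
    const ReplacementTable& replacements,
    const std::vector<std::string>& map_groups,
    const std::string& alphabet)
{
    std::vector<std::string> out;

    // 1. Replacement table: undo known typo patterns.
    for (const auto& rep : replacements) {
        if (rep.first.empty())
            continue;
        for (auto pos = word.find(rep.first); pos != std::string::npos;
             pos = word.find(rep.first, pos + 1)) {
            auto candidate = word;
            candidate.replace(pos, rep.first.size(), rep.second);
            add_if_word(candidate, dict, out);
        }
    }

    // 2. Mapping table: swap a character for another one of its group.
    for (std::size_t i = 0; i < word.size(); ++i) {
        for (const auto& group : map_groups) {
            if (group.find(word[i]) == std::string::npos)
                continue;
            for (char c : group) {
                if (c == word[i])
                    continue;
                auto candidate = word;
                candidate[i] = c;
                add_if_word(candidate, dict, out);
            }
        }
    }

    // 3. Extra character typed: delete each character in turn.
    for (std::size_t i = 0; i < word.size(); ++i) {
        auto candidate = word;
        candidate.erase(i, 1);
        add_if_word(candidate, dict, out);
    }

    // 4. Forgotten character: insert each letter at each position.
    for (std::size_t i = 0; i <= word.size(); ++i) {
        for (char c : alphabet) {
            auto candidate = word;
            candidate.insert(i, 1, c);
            add_if_word(candidate, dict, out);
        }
    }
    return out;
}
```

With a toy dictionary containing "their", "spell", "seat" and "word", a replacement entry from "ie" to "ei" and a mapping group "sz", this sketch turns "thier" into "their" and "zeat" into "seat". The order of the steps decides the order of the suggestions, which is one simple way to put the likelier corrections first.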
This is what you need to do in C++ in order to initialize Nuspell. The first one searches your system, any system, Linux, macOS, Windows, for where your dictionaries are installed. And there are a lot of places where these can be found: user dictionaries, ones installed by some packages, ones installed at the system level and so on. But using it is a bit easier. So if you want to use the spell checker, this is all you have to do. Just feed the word into the spell function and you will get true or false. And if you want to get suggestions you use the suggest method.
So where can you use that stuff? Currently we use the G++ and Clang compilers. And we're also going to support MinGW and do some backporting to older C++ compilers. Go back to this one. The dependencies needed for compiling the project are very minimal. We use only these three and eventually we might even throw out Boost.Locale to have a really, really minimal amount of dependencies. This is already much less than what, for example, other spell checkers are using, which makes it very portable of course.
In order to develop Nuspell we use these tools; I'm not going to go into all of them. It's a fairly common bunch of tooling. What is interesting in our project is that we used profiling in order to find where the bottlenecks were in the processing. And it turned out that we found some stuff which we did not expect, and we gained a lot of performance there. Especially in the regular expressions: we used Boost.Regex methods. The regexes we use are fairly simple, but because the regex support is so wide, even if you use a simple one you can still run into a lot of CPU consumption. So what we did is we implemented our own regex method for only the functionality we needed, and that gained a lot of performance. Another one I can recommend is code coverage reporting. It will show what kind of code you did not test yet and it will give you some surprises sometimes, which only makes stuff better.
For 2019-2020 we plan to make a next version. Of course we want to do much more performance improvement. We can do that with a different underlying data structure, but also with concurrency, and we also want to increase the quality of the suggestions. Just a quick show of hands: who has seen suggestions in spell checking of which they thought, hmm, is this what I want? Did people get suggestions which are like, yes, these are the ones I do want? Now in order to move up the ones you do want and move down the other ones you maybe don't want, we're going to add some functionality to make you happier with the suggestions you're going to get.
Some other stuff we're going to do is migrate to the CMake build system and make it more available, because now it's in our development environment. We can build it, everybody can build it, but in order to really install it as a package on your distribution some more work has to be done, and therefore at FOSDEM I'm going to see if people are interested in helping us out, because we would like to port to these platforms and packages and also add these language bindings in order to enable everybody to use Nuspell.
Some other stuff we'd like to mention about Nuspell is that on the way we encountered a lot of bugs in the dictionaries and word lists we used, and we needed to either catch them in an exception or fix them, and it took a lot of time to fix that. We also found that it's very difficult to test for incorrect words, because there are a lot of word lists with correct words but not with incorrect words and good suggestions. So if you have a list of those for your language, please submit it to our project and we can improve our testing processes.
If you're in charge of an IDE, text editor, office suite or web browser and would like to support using Nuspell, also drop us a line, or even join our team if you're interested in spell checking. At the moment we're only two people. It's complex work on one part. It's very deep in the system. Most people have no idea where spell checking resides and how it's done, but it's very interesting to work on this and to improve people's writing. Especially in an IT world you might think, oh, is this a correct word, or is this spelled correctly? Now with better spell checking we can all help write better texts.
So this is a lightning talk, so it's going to be very fast. I'd like to thank my teammate Dimitri for all the good work he did. Also, again, I'd like to thank Mozilla for the support they gave. Thank you for listening, and if you have any questions I will gladly answer them.
So, when do we plan to release the next version with the new features? This year, 2019. Yes?
The question is whether Nuspell supports XML and HTML checking. Well, it supports plain-text spell checking, so it can tokenize a sentence, get all the words out and do spell checking on the different words, but only on plain text. So if you're using HTML, LaTeX, whatever, you, as sort of the editor from which you're going to start the spell checking, have to break down this data structure into separate words and feed the words individually, or the sentences individually, to the spell checker. We're not going to support all sorts of data formats in order to get the text out; that's sort of beyond the scope of the spell checking. But good question. Thank you very much and enjoy FOSDEM.
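As an illustration of the answer to the last question, here is a rough sketch of what the calling application could do for HTML-like input: skip the markup, pull out the words and hand each word to the spell checker. The functions `extract_words` and `misspelled_words` are hypothetical names for this sketch, and `is_correct` is a stand-in for whatever spell checker is used; nothing here is Nuspell's API. It is deliberately naive, assuming ASCII letters only and not handling entities such as `&amp;`, comments or script blocks.

```cpp
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// Collect runs of ASCII letters that appear outside <...> tags.
std::vector<std::string> extract_words(const std::string& markup)
{
    std::vector<std::string> words;
    std::string current;
    bool in_tag = false;
    for (unsigned char ch : markup) {
        if (ch == '<')
            in_tag = true;
        if (!in_tag && std::isalpha(ch)) {
            current.push_back(static_cast<char>(ch));
        } else if (!current.empty()) {
            // Any non-letter or the start of a tag ends the current word.
            words.push_back(current);
            current.clear();
        }
        if (ch == '>')
            in_tag = false;
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}

// Feed every extracted word to a caller-supplied checker (a placeholder
// for a real spell checker) and return the words it rejects.
std::vector<std::string> misspelled_words(
    const std::string& markup,
    const std::function<bool(const std::string&)>& is_correct)
{
    std::vector<std::string> wrong;
    for (const auto& word : extract_words(markup))
        if (!is_correct(word))
            wrong.push_back(word);
    return wrong;
}
```

An editor or browser would replace the naive tag skipping with its own parser for the format at hand, which is the reason given in the answer for keeping this outside the spell checker.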