So, hi everyone, this is Gettin' Onki and Ivan. I think that's my current technical job title, Consultant Productivity Engineer, I didn't call it our productivity. I prefer to think of myself as the JavaScript person who works from home. And this is not a talk, this is a love letter.
Okay, I need to explain this thing. I was doing indoor maps for a Norwegian company a few years ago, and I was debugging touch interactions with Leaflet, the JavaScript library. This kind of thing: you pinch-zoom to make the map larger, but if you first do a pan, and then, without letting go, you add the thumb, and then you change fingers, things go fuzzy. So I saw this presentation, "Getting touchy" by Patrick H. Lauke, and I learned everything I needed to learn about touch screens. So this thing is awesome. If you ever have to deal with touch screens in JavaScript, you need to watch this presentation, period. This is my love letter to "Getting touchy". I just hope that this will break brains in the same way that "Getting touchy" destroys brains for touch screens.
So: everything you never wanted to know about keyboard and input events, focused on JavaScript implementations.
So we press a key, the application gets notified. Right? No, it's not right, because you press a key down, then you release the key, and that sends two different electrical signals to the CPU. Right? And that's how you get one key press. No, it's not like that. You hold the key down, then the OS does this repeat thing and sends several key presses, then you release the key, and then your application gets notified of everything. So it kind of looks like this. You get a keydown event in JavaScript, then you get a keypress, then you get the keyup. If you keep on pressing the thing, you will get several keypresses. If you hold several keys at once, you will have several keydowns, then one keypress for the actual character you get, and several keyups.
The order of the keyup events can be switched depending on how fast your fingers are, right? It usually has no effect. But if you are doing some key combination that does not generate a writable character, you should be able to tell that apart, because it's still getting a keypress and it's doing something, but it's not a printable thing. So browser implementations decided to create this new event called textInput, which deprecates keypress, by the way, so you should never trust keypress events.
But then you can ask yourself: okay, what's the state of the system while doing this event handling? I'm running a function to handle this event, and what happens if I look at the state of the text area while doing these things? I don't know, no one knew, because the implementations were so different that some of them did have the changed value with the new key press and some of them did not. So they created this new specification of beforeinput and input events, which deprecates, once again, textInput and keypress.
So it looks like this now, okay? You press the key down, you get a beforeinput before the text has been added to the HTML document. Then you get the keypress still, because browsers still do all the obsolete stuff. Then you get the input event after the text has been added, and you get the keyup. Same thing with several key presses. Please note that the physical key and the actual character that you type are different. Even though the S in this example is the same key on the keyboard, it can create different inputs depending on the modifier keys or other circumstances.
And then we discovered that people use weird keyboard layouts, because we're nerds or we don't use English. So if you have done some programming with the old way of doing keyboards, you have scan codes, which are the physical location of the key given the rows and the layouts.
So you have that information in JavaScript when you press a key. Then you have the location, which tells you whether it's the left Control or the right Control, and, oh, I copy-pasted things wrong. You have key codes that differ between platforms, because Windows decided to do key codes in a different way than Linux. And then you have which, which is only implemented in Explorer, and then some transitional things from some newer specification. And then you only need to remember about using the key property of an event in JavaScript. Forget about everything else. It's not useful, it will confuse you, it will make your code buggy. Forget about it, use key, that's it.
And then we have dead keys. And I care about them because my name has an acute diacritic in it. And I care about this working, I want my name to work. And this is problematic for some applications. So when you do this, this is called a dead key because, in some systems, you don't get that key press at all. You just get the key press for the vowel, in my case, but the character that it generates has the actual acute symbol on it. And this is assuming I am not doing Unicode combining characters, which is a whole different hell. In some systems, however, you will get the physical key press, keydown and keyup, for dead keys. But that, once again, depends on how your X or Wayland system is configured, et cetera.
Then all of us in Europe forgot about text composition, right? Until autocorrect on smartphones happened. And then none of us noticed that we are all using text composition, because whenever a word is underlined on a smartphone, we are composing characters. And all of a sudden, everything has to work with this. So if I'm pressing the H and then I'm pressing an autocompletion to "hello", this is roughly what happens. You are pressing a physical key, which is unidentified because it's a touchscreen. You're getting an H, then you're pressing a different unidentified key.
A composition update happens and then the whole thing gets added to the document, which kind of breaks all the conceptions you have about keyboards, and you are actually using it. It's not a weird case that you never saw and that you cannot reproduce. All of us can reproduce these things right now with our phones.
There are a few things that are not really intuitive about text composition in JavaScript. The word being composed is part of the document itself. This is kind of weird, because it's not a finalized thing if you're thinking only in abstract terms. It still gets added to the thing. If you're composing something in a text editor and save that, it will get saved, even if it's half-written. selectionStart and selectionEnd are some properties of the text area, and those have different meanings depending on which event you're handling. If you're handling the beforeinput event, they will point to the word being composed. If you are handling the input event, it will be the position of the cursor caret only, with zero width. We can see some examples of this. It's complicated, because nowhere in the specs does it say: hey, selectionStart and selectionEnd might point to an ongoing IME composition.
And the most dramatic thing is that behavior across platforms is, oh boy, inconsistent as heck. I mean, this is the life of JavaScript developers for the last 10 to 15 years. We care about four different browsers. We code our stuff in Firefox or Chrome. Then we try and see if there's some Mac person in the office, who is usually the designer, and: hey, can you try this thing and see if it breaks or not? And then you run a virtual machine with Internet Explorer and start cursing whoever made that thing. That's how it usually works, and it takes time, and you have to make sure that your code is running in the same way on four different platforms without any code modifications. This is the life of a JavaScript developer who has to deal with keyboards.
You have to test all the browsers with all the different possible keyboards that you have access to, all the possibilities. It's insane, absolutely insane. So I tried to do this manually for a time, it didn't work, and the solution to this is that you record all the events and then replay them. So I made a thing. I made something like 60 or 70 different use cases. It's a web page: you go there with a browser and perform the thing that you're told. It will record all the events and save them as a file, and you can just inspect it and see what the hell is going on. I managed to generate like 1,000 event traces three or four times: every time I noticed that I was not recording some important detail, I had to rerun the whole thing again, which takes like four, five, six hours. In the current set, I'm missing Safari and I'm missing Firefox on Android and iOS, but still, I got a lot of information from this. So I was going crazy, going through 1,000 event traces, so you don't have to go crazy.
Things to remember that are very important and that are really useful to keep in mind. The same sequence of user actions gives different results depending on the keyboard. If you're swiping a word and then swiping another word, the decision of whether there's a space in between those words is the keyboard's, not the web browser's. This is something that is not intuitive when you first approach the problem. When you have to look at a few hundred traces, it becomes obvious and apparent, and how did we not think about this before? But that's the way it works. Same thing with: do I press space twice and it gets converted to a period? What if I spell check, then press a period? Does the space before it collapse, et cetera? These kinds of interactions depend only on the keyboard and not on the browser. However, how the text input gets converted into JavaScript events, and the order of those, is browser dependent.
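The record-then-inspect approach described above can be sketched in a few lines. This is only an illustration, not the actual tool from the talk: the helper name `recordTrace` and the list of copied fields are assumptions, and a real recorder would also want selection state and the text area value.

```javascript
// Hypothetical recorder: names and field list are illustrative only.
const TRACED_TYPES = [
  "keydown", "keypress", "keyup",
  "beforeinput", "input",
  "compositionstart", "compositionupdate", "compositionend",
];

// Fields copied out of each event; fields the event lacks are stored as null.
const TRACED_FIELDS = ["key", "code", "data", "inputType", "isComposing"];

function recordTrace(target, types = TRACED_TYPES) {
  const trace = [];
  const listener = (ev) => {
    const entry = { type: ev.type };
    for (const field of TRACED_FIELDS) {
      entry[field] = field in ev ? ev[field] : null;
    }
    trace.push(entry);
  };
  for (const type of types) target.addEventListener(type, listener);
  return {
    trace,
    // Detach the listeners once the use case has been performed.
    stop() {
      for (const type of types) target.removeEventListener(type, listener);
    },
    // Serialize the trace so it can be saved as a file and inspected later.
    serialize() {
      return JSON.stringify(trace, null, 2);
    },
  };
}
```

In a browser, `target` would be the text area or contenteditable element under test. Comparing serialized traces from different browser and keyboard pairs is what makes the differences visible.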
Also, keyboard behavior depends on what's already written in the text area that you're editing, and it's browser independent. It doesn't matter. You can send hints to the keyboard, such as capitalization and spell check and rich text. It's up to the keyboard, to the on-screen keyboard, to respect those hints or completely ignore them. The whole reason I'm doing this thing is that one very specific on-screen keyboard is not respecting the hint to disable spell check. So it always spell checks, and that changes the whole way it does input events.
Then, when you can take that in, you realize that browsers fall into three major groups, depending on support. And within any of those groups, the behavior is closely similar. So you don't really have to think in terms of 12 different browsers. You're just thinking of three, and that's fine. In fact, depending on the version of your browser, it will fall into different categories. Firefox from version 68 and Chrome from version 60 will fall into different buckets.
So inputType is the most recent spec of how to do input events. inputType is a property of input events. It's a good idea. The spec is okay, and the implementations are, let's put it mildly, not so good. This is what the spec looks like. So for every input event, you get an inputType, which is what the user is supposed to want to do, okay? If I'm dragging text from another window, it will be an insertFromDrop, et cetera, et cetera, et cetera. And I have mixed feelings about this, because if you write something that is insertCompositionText, that means "replace the current composition string". What the hell? Really, insertCompositionText replaces the current composition string. If something is replacing, don't call it insert. Replace, replace, okay?
The theory looks like this, roughly, and that's at an abstract level. You start a composition event, then you do updates and you replace the thing, and then when it ends, you insertFromComposition, right?
Because that's what the specification says. insertFromComposition: insert a finalized composed string. Nobody does insertFromComposition, ever. Nobody respects the wording of the specification. I blame it on the specification being a bit vague, but that's how it works. So that's what we have to deal with: browsers not doing what they're supposed to do according to the spec.
This is what it looks like when I'm looking at a trace. You can see there's the whole set of beforeinput, compositionupdate and input events. Those come in threes. And then at the end, I am doing the compositionend, and then I'm doing the insertText, which is not insertFromComposition. All browsers tend to behave in a similar way. Also note that when I'm inserting text, I am doing only the space. I'm not doing the whole word. So the word is added to the document, in this particular browser, when the compositionend event happens. There's no explicit input event for the ending of the composition, which there should be.
Also, all input events have this isComposing flag. Don't trust it. It doesn't work. Okay, there are cases where that flag is true after a compositionend event. For example, when you're writing one of these spaces after doing spell check. So don't trust that flag. You have to rely on compositionstart and compositionend.
Don't trust that compositionstart and compositionend come in pairs. "But you just told me you have to trust this thing." No, you cannot. Okay, that's the hell case. And you could say: hey, Iván, why the hell should somebody worry about Chrome 51? Well, we don't, until it turns out that some Android application is running with an old version of the Android SDK, and internally it's using a web view that corresponds to Chrome 51. And this is what it looks like in this case. There's a dangling compositionend event at the bottom, with no compositionstart event. So you cannot trust it.
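One defensive way to follow the two warnings above is to derive the composing state from the composition events themselves while tolerating unpaired ones. The following is a minimal sketch under that assumption. The tracker name and the anomaly messages are made up for illustration, and it operates on plain objects with a `type` field so it can be fed either real events or recorded traces.

```javascript
// Hypothetical tracker: derives the composing state from compositionstart
// and compositionend only, and survives events that arrive unpaired.
function createCompositionTracker() {
  let composing = false;
  const anomalies = [];
  return {
    anomalies,
    get composing() {
      return composing;
    },
    handle(ev) {
      switch (ev.type) {
        case "compositionstart":
          if (composing) anomalies.push("compositionstart while already composing");
          composing = true;
          break;
        case "compositionend":
          // The dangling case: an end with no matching start.
          if (!composing) anomalies.push("compositionend without compositionstart");
          composing = false;
          break;
        default:
          // Other events never change the state; any flag on them is ignored.
          break;
      }
      return composing;
    },
  };
}
```

Logging the anomalies instead of throwing keeps the editor usable on the old web views mentioned above, while still leaving evidence in the trace.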
Don't assume that composition for Latin script happens only on Android with an on-screen keyboard. Safari does that on desktop. So if you're thinking, "well, I will only deal with a user group that uses USB keyboards, I will never have this problem": you will have this problem. They do that with dead keys, by the way. So dead keys are not dead keys anymore. They do start a composition event with the dead key in a space, and then the updated part is the dead key plus the vowel, and that gets finalized.
The order of events, of course, can be inconsistent with the now-deprecated textInput. Don't rely on that if you're trying to apply any strategies involving the deprecated event.
This is one of the things that bugged me the most when I was trying stuff. If you're doing spell checking and the spell check affects the first letter of a word, the behavior is completely different than if the spell check does not affect the first letter of the word. In theory, you just replace the composition string and finalize the composition event, the composition event group. The reality is, you insert the text, then delete it, then insert it again. I don't know why, and this was creating race conditions in LOKit and LibreOffice core at some point. And it was weird, because we thought that it was the same use case. After all, I'm just spell checking words, but sometimes we got this thing and sometimes we didn't. So ultimately, it comes down to the first letter. At some abstraction point in between the software layers, the software thinks: well, since I collapsed the composition string to an empty string, I stop the composition event. And that is not the right thing to do. You can be composing a word which is an empty word, and you're still composing. That doesn't mean that you have ended the composition event. And of course, there's a specific input type for this kind of thing, which is for spell check. No browser implements it, ever. So, yay. This is what it looks like.
So if I have a word with a typo in the first letter, which should be an L in this example, and I am pressing the spell check suggestion, it will delete the whole thing, so the document has nothing, and then it will insert the thing altogether.
Don't assume that text composition is needed for spell check. That's what you were thinking, right? That the only way to spell check is with composing words. Well, no. There's the hell case of Windows 10. If you go to settings, to keyboard settings, and enable spell check, you can actually do spell checking with a Windows native, which is the Windows, it had a name, Windows Presentation Foundation, WPF, dialog, just next to your cursor caret, doing that thing without text composition. It will not underline words. It will replace words.
If you're dealing with the DOM, with the document model in JavaScript, don't assume that the tree doesn't change. Different browsers will change the document model in memory if you are typing things, and how they change it is different in every browser. In particular, Internet Explorer will create a text node inside the text area. If you're doing contenteditable, which is another way of triggering keyboard input, the way that the nodes are created is completely arbitrary depending on the browser. I hate this.
Don't assume that the strings are Unicode, or valid Unicode. You laugh, I didn't. In JavaScript, text is really an array of UTF-16 code units. On Windows 10, with an on-screen keyboard typing emojis, it will send UTF-16 code units to the browser. It will not send code points to the browser. So if you need to type a character that needs two UTF-16 code units because it's in a high area, Internet Explorer will actually emulate two keystrokes for this. Except if you are using Edge. If you're using Edge, you will just get one input event with two, and the two there is the number of UTF-16 code units.
If you're using Firefox or Chrome, you will get a random keystroke in between, which, by the way, breaks all debugging tools, because that's an invalid character. And then at the second keystroke, you get the actual thing. I think I can show this with the, I have it somewhere here. Here, right? So this is Firefox on Windows, and it does this thing. If you look at the UTF-16, you can see that the smiley face is actually two UTF-16 code points, no, it's two code units, but it's one code point. So in order to get these two, what Windows does is send two keystrokes with those values, but those values by themselves are invalid code points, so they get replaced by the question mark inside the black thing.
Backspace. You keep using that word, I don't think it means what you think it means. Can anybody tell me what Backspace does? Come on, pop quiz. The right answer, and this is a trick question, by the way, the right answer is all of the above. Depending on the keyboard and depending on the browser, it will do any of those things. And of course, because I talked about this before, the behavior depends on the keyboard, not on the browser. The browser might have a tiny amount to do with it, but it's mostly the keyboard. And this is where you need to know the difference between a character and a grapheme and a grapheme cluster and a combining thing and a code point, and it gets messy.
So, if you're doing a family with Edge, you go on the emoji keyboard and you type the family with man, woman, girl, and boy, and you get this thing. You get seven input events, and then if you delete, it will delete two code units. And this is just looking at the code points for that. So, what you can see here is man, zero-width combining, zero-width, I forgot the name, the zero-width joiner. Man, zero-width joiner, woman, zero-width joiner, boy, zero-width joiner, and girl. So, when you're pressing Backspace on Edge, it's deleting only that code point.
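The numbers in the family example can be checked directly. This is a small sketch, not taken from the talk's slides: the helper names are made up, the emoji are written as escapes so the source stays ASCII, and `deleteLastCodePoint` models only one of the possible Backspace behaviors, the one described for Edge.

```javascript
// The family from the example: man, ZWJ, woman, ZWJ, girl, ZWJ, boy.
const ZWJ = "\u200D"; // zero-width joiner
const family = ["\u{1F468}", "\u{1F469}", "\u{1F467}", "\u{1F466}"].join(ZWJ);

// String length counts UTF-16 code units; iterating a string yields code points.
const countCodeUnits = (text) => text.length;
const countCodePoints = (text) => Array.from(text).length;

// One possible Backspace: remove only the last code point.
function deleteLastCodePoint(text) {
  const points = Array.from(text);
  points.pop();
  return points.join("");
}

// The smiley face is one code point made of two code units (a surrogate pair).
// Each half on its own is not a valid character.
const smiley = "\u{1F600}";
const highSurrogate = smiley.charCodeAt(0); // 0xD83D
const lowSurrogate = smiley.charCodeAt(1);  // 0xDE00
```

Here `countCodePoints(family)` is 7, matching the seven input events, and `countCodeUnits(family)` is 11. After `deleteLastCodePoint(family)` the string is two code units shorter and ends in a dangling zero-width joiner.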
If you do the same thing on Android, it will delete the whole family. And I just noticed today that if you do this, I will show you the thing with Edge, and I want the other use case, this one, right? So, this is with Edge. I have the printable control characters here, so you can see this man, woman, and girl. If you look at the default, it's three people. That's fine. And this is going to be best, I think. And then if you look at the same thing with Chrome, it deletes everything. If you look at the same thing with Firefox, it deletes two code points. It deletes the last family member and a zero-width joiner. So, what does Backspace do? I don't have any idea. Yeah. I had to make the joke. This is not the joke, this is the joke.
Okay, flags are fun in Unicode. A flag is this: one grapheme cluster with one grapheme, two Unicode code points, four code units, and it kind of behaves like a ligature. And when you delete it, you're deleting two code points together. The behavior is mostly regular and you usually don't see this, but a flag is actually two letters in a very specific space of the Unicode code point scheme. So, for every country, it's a two-letter code, which I think corresponds to the ISO 3166 standard. And when there are two characters which correspond to a flag, the rendering engine will turn those characters into one grapheme, which forms one grapheme cluster. So, this is Andorra. This is actually two code points, one for A and one for D. Each of those code points uses up two UTF-16 code units. And when you delete the thing, you delete four units, two code points, a whole flag.
So, yeah, what Backspace deletes is decided by the keyboard, the context, and the browser. And the same thing happens for forward delete. And also a very, very similar thing happens for cursor keys, which I'm not going to show, but you can guess how this is going, right?
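The flag arithmetic described above (one grapheme cluster, two code points, four code units) can be reproduced with a short sketch. The function names are illustrative, and the grapheme count assumes a runtime that provides `Intl.Segmenter`, which not every older browser or web view does.

```javascript
// Regional indicator symbols are 26 consecutive code points; this is "A".
const REGIONAL_INDICATOR_A = 0x1F1E6;

// Turn a two-letter country code such as "AD" into its flag sequence.
function flagFor(countryCode) {
  return Array.from(countryCode.toUpperCase(), (letter) =>
    String.fromCodePoint(REGIONAL_INDICATOR_A + letter.charCodeAt(0) - 65)
  ).join("");
}

// Count the same text in three different units.
// Assumption: Intl.Segmenter is available in the runtime.
function measure(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    codeUnits: text.length,
    codePoints: Array.from(text).length,
    graphemeClusters: Array.from(segmenter.segment(text)).length,
  };
}
```

For the Andorra example, `measure(flagFor("AD"))` gives four code units, two code points and one grapheme cluster, which is why "delete one character" has three defensible answers.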
If you are pressing left or right, does left or right jump over zero-width joiners, or members of a family, or the letters inside a flag? Who knows? You have to test this in your browser.
And this is a very important point I want to make. If you are creating an API which handles text, you have to be very explicit about what you're handling. Do you handle UTF-16? Do you handle code point arrays? What the hell, do you handle keystrokes? You shouldn't. Do you handle graphemes? If you are looking for the character count, what does character count mean in your context? Graphemes, characters, combining characters, zero-width characters: be very explicit, write comments about it, write documentation, because at some point it will come back at you and you will go crazy.
However, and this is the positive thing, if you can handle families and flags, you will have no problem dealing with strange, weird cases such as Eastern languages or Eastern scripts. If you can handle families and flags, I guess you will have no problem with Japanese text, Chinese text, Indonesian, Hindi, anything. If you can handle this, which you can, because you can use those characters on your hardware even if you're Western European.
Now, for all keyboards, the input that they do depends on the context. So if I'm pressing left arrow and I go to a word, it might show me spell check suggestions, and if I hit one spell check suggestion, I will get that back. However, there is no current way that I know of to sync the content of the document with the content of the hidden HTML text area that my JavaScript deals with. So the workaround is just to keep writing if you're writing. If you're deleting, you just get rid of all the content and don't have this functionality. But ideally, you should be able to keep the text in the document and in the HTML part in sync.
And Enter. Yes. I made this joke once, I can do it twice. What does this do? And you know the answer for this, right?
All of the above, that's right. Depending on the context, pressing Enter will have a different meaning here. It can create a paragraph, if you are doing rich text editing in a contenteditable in HTML. It will do a line break if you're doing non-rich text, in most browsers. If you're not focused on an editable text area, it will have a different meaning. Depending on how the paragraph is internally represented, it might have a carriage return or it might not have a carriage return, depending on the platform, and it's a party. So, you know how this goes, it's the same thing. It depends on the keyboard, and then how it is translated into events depends on the browser. Depending on the hints to the keyboard, the keyboard will decide to do one of those things. So if my HTML has any of those, that's a hint to the keyboard that I am using rich text or not, and the keyboard will send "I want to insert a paragraph" or "I want to insert a new line".
I made the joke twice, I can make it thrice. I don't know what a word is, like, for real, okay? Like, which one of these? Come on, pop quiz, who knows the answer? Does anybody know the answer to this question? Well, I will spoil it for you. The answer is all of the above. So don't assume anything, because your keyboard will decide the definition of "word" for you. If you do Control plus arrow or Control plus Delete or Control plus Backspace, it will delete an arbitrary number of things. Specifically, the problem here comes when you're dealing with spaces. If I delete a word, do I delete any spaces adjacent to that word? If I'm dealing with combining characters, do I delete those? If I'm dealing with zero-width joiners, do I delete those? Who knows? Because also, if you look at the Unicode spec and the character set, there are 20 different kinds of spaces. So what kind of spaces do you break words by? Who knows? I cannot trust anything. So yeah, basically it's the same thing I was telling you.
Okay, who can fix all this mess? And I'm pessimistic here.
I'm really pessimistic when I try to answer this question by myself. I think that these are the only two actors that can fix this mess: Google and Microsoft. Because they control the whole stack. They do have control over some hardware. They do have control over the operating system, over the soft keyboard, and over the JavaScript application on top. And of course, I left in between here the web browser. But they do have control over all the aspects of the thing to be fixed, so they can communicate vertically. Is there any Mozilla person in this room? Is there any Mozilla person in this room? Thus my point is made. I don't think that LibreOffice Online can fix this problem properly unless Mozilla can get some pull in the web browser scene and try and fix this thing.
So ideally, this should happen with standards. Ideally, we should have some standard way of communicating to the OS what my keystrokes and my intentions are. And I should have some standard of telling the browser where it is. And I should have some standard in the browser to create the JavaScript events. But right now, every browser will do whatever it wants. And every keyboard will do whatever it wants. And there's no way to tell them apart, really.
So, things to do. Right now, the only way to make something out of this mess is to use a diff algorithm. So you take the content of your contenteditable or your input field and you compare it with what you had before, trying to see if the position of the word being composed and the position of the cursor make sense. That will have problems with repeating characters. There's no perfect way to deal with this. And then from any of those diffs, you create a batch of three different things: things that I have added, deleted, and am composing. If you have that, then you can apply a timeout to cover the case where you're deleting a word when you're spell checking. So it will just throttle down these kinds of events, and they will still make sense.
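A very small version of the diff idea described above is to strip the common prefix and suffix of the old and new text and treat whatever is left as deleted and added. This is a sketch under simplifying assumptions, not the actual implementation: `diffText` is a made-up name, it compares code points rather than code units so that it never cuts a surrogate pair in half, and it leaves out the composing part of the batch and the timeout.

```javascript
// Hypothetical single-edit diff: common prefix, common suffix, and whatever
// differs in between. Indices are counted in code points.
function diffText(before, after) {
  const a = Array.from(before);
  const b = Array.from(after);

  // Length of the shared prefix.
  let prefix = 0;
  while (prefix < a.length && prefix < b.length && a[prefix] === b[prefix]) {
    prefix++;
  }

  // Length of the shared suffix, not overlapping the prefix.
  let suffix = 0;
  while (
    suffix < a.length - prefix &&
    suffix < b.length - prefix &&
    a[a.length - 1 - suffix] === b[b.length - 1 - suffix]
  ) {
    suffix++;
  }

  return {
    index: prefix,
    deleted: a.slice(prefix, a.length - suffix).join(""),
    added: b.slice(prefix, b.length - suffix).join(""),
  };
}
```

For example, going from "helo world" to "hello world" reports an added "l" at index 3. The user may just as well have typed it at index 2, which is exactly the repeating-characters ambiguity mentioned above.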
And I would really, really love to work with browser vendors on the specifications of the input type. That should be best, I think. I have the room, so I can have some wishful thinking here. I would love to see some complete equivalency between inputType and some interface to LOOLWSD, to LOKit. I would love to be able to tell LOKit: hey, I want to insert a paragraph, or I want to insert a line break, or I want to replace text, or I want to start a composition or insert text from a composition. I would love to be able to do that, because from my JavaScript point of view, I would just get rid of all the problems. I'm just telling you whatever the browser is telling me, I don't have any say. Any bug that you might come up with is just caused by the browser, not by my code.
I would love to have the text from the document in the browser, so I could do actual keyboard replacements and proper spell checking driven by the on-screen keyboard itself, not having to reinvent spell checking again. And I want to, you know, it's wishful thinking to work with browser vendors, because, hey, most of them are not here. And, you know, if I'm wishful thinking, I want everything to be serverless, and I want to not have a complete LOOLWSD instance running, because my browser is intelligent enough to do stuff. And I would love to have some bigger packets of data being sent back and forth instead of every keystroke, and re-rendering every keystroke on the client. But that would mean a lot of work. So that's why I put wishful thinking there.
And that's all I had. So thanks for listening to my rant about keyboard input. Thank you.