So, hi. I'm Ivan again. That's my official job title, if I'm correct, but I prefer to refer to myself as the JavaScript person who works remotely. I'm going to try and explain Unicode in five minutes or less. I would hope that everybody here has a good understanding of Unicode, since we are all supposed to be working on, you know, a word processor, but since this is not really the case, I want to establish a low bar of common knowledge about Unicode, just to make sure that everybody has at least an understanding of how this works. So, I like my name to be written correctly. This is something that I like. And my name has these four characters, which are four grapheme clusters. This is kind of weird. I guess that most people know about this thing called ASCII, which is a table of 128 numbers which map to 128 characters. Unicode is like that. And the important thing to remember here is the code point. A code point is a number from zero to about 1.1 million. And that's what you have to think about when you're thinking in an abstract way. Then that number is represented, or encoded, in different ways, as UTF-16 or UTF-8. But if you want to make a mental model, you have to work with code points. And then later, when you're doing the coding, you worry about the encoding. That's one of the very important things. And the weird thing is that you can write my name in several ways with Unicode. Unicode has this thing called combining characters. It's a character that doesn't stand alone by itself. So it combines with either the preceding one or the following one. There's a whole set of rules. But you can have, in my case, the acute accent, or the acute diacritic, as a separate combining character after the small letter A, or you can have it precomposed with the small letter A. Here you can also see that the combining acute accent has a higher code point number. So it will get encoded in a different way when you're doing bytes. Remember, don't worry about the encoding when you're looking at this thing. Worry about the code points.
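A minimal Python sketch of the two spellings described above. It assumes the name is written Iván (the transcript itself shows no accent), and the helper name `describe` is made up for illustration. It counts the same text at three levels: code points, UTF-16 code units, and UTF-8 bytes.

```python
import unicodedata

# Two spellings of the same name (assumed here to be "Iván"):
# precomposed U+00E1 versus "a" followed by U+0301 COMBINING ACUTE ACCENT.
composed = "Iv\u00e1n"
decomposed = "Iva\u0301n"


def describe(text: str) -> dict:
    """Count the same text as code points, UTF-16 code units, and UTF-8 bytes."""
    return {
        "code_points": [f"U+{ord(ch):04X}" for ch in text],
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


print(describe(composed))
print(describe(decomposed))

# Normalization converts between the two spellings.
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)
```

Both strings display as four grapheme clusters, but the second has one more code point and one more UTF-8 byte, which is why counting has to start from code points rather than from bytes.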
Then worry about how each code point gets encoded by a different algorithm. The algorithms are really simple and almost trivial. The problem with this is not knowing what you're doing. The problem is not putting comments in your code saying, I am using this very specific kind of encoding, UTF-16 little endian or UTF-8 or UTF-32, which is just packing everything in 32-bit integer fields. The problem is what your code is doing. Your mind should be working only in code points, not in code units, not in bytes. Now, if you're asking yourself how many combining diacritics Unicode can have on a character, I'm happy you asked me that question. So can you please go to that link? Come on, help me here, and just write something in the text field. Like, yeah. And that number, we can set it up to 11. The 3. I said up to 11, not to 15. You're making a mess now. Okay, so you can have a lot of combining characters on any other Unicode character. There's theoretically no limit. You can even put the same combining character twice, and it might get displayed or not. You can have all kinds of things. And there's really no reason why you cannot do 15 or 20 or anything. This is also a funny way to get around Twitter limitations on characters, because code points are not characters, apparently. So depending on how you count, you will get a different count. That's why it's important to think in code points, and then in how these combine into graphemes. Let's go back. Yes. There are a lot of funny things with this. This is an interesting case which I like to show you because it displays sequences, Unicode sequences. Flags, which we all have on our phones, are actually two characters, each of them with one code point. Those are very high code points. It's something like 100,000, so they need to be encoded in more bytes when you think about it. And it's weird because even though they are two characters, they combine into a single sequence that cannot be broken down.
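A short Python sketch of the two cases just described: a letter carrying eleven combining accents, and a flag built from two regional indicator symbols. The country code "ES" and the helper names `flag` and `counts` are assumptions for illustration; the talk does not say which flag was on the slide.

```python
from typing import Tuple

# Regional indicator symbols run from U+1F1E6 (letter A) to U+1F1FF (letter Z).
REGIONAL_INDICATOR_A = 0x1F1E6


def flag(country_code: str) -> str:
    """Build a flag sequence from a two-letter code such as "ES"."""
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(letter) - ord("A"))
        for letter in country_code.upper()
    )


def counts(text: str) -> Tuple[int, int, int]:
    """Return (code points, UTF-16 code units, UTF-8 bytes)."""
    return (
        len(text),
        len(text.encode("utf-16-le")) // 2,
        len(text.encode("utf-8")),
    )


# One base letter followed by eleven combining acute accents (U+0301).
stacked = "a" + "\u0301" * 11
print(counts(stacked))

# One flag: two code points above U+FFFF, so each needs a surrogate pair in UTF-16.
print(counts(flag("ES")))
```

The stacked letter is one grapheme cluster but twelve code points, and the flag is one grapheme cluster made of two code points, four UTF-16 code units, and eight UTF-8 bytes. A length limit gives a different answer depending on which of these it counts.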
I'm going to do magic here. I'm going to try and select these two things. Oh. I want to select this thing here. And then I copy. And then I go to a new tab. And then I go here. And then I paste. So far so good. Control plus. Where's the plus here? Okay. Yeah, control plus. More. More. Okay. Do me a favor and hit backspace. Again? Oh, my God. Why have you broken it? Where are the letters? They're gone. And the worst thing is that the cursor is now in the middle of the glyph, which is weird. I can split it if the cursor is in the middle, because they're different code points. They're different characters. But if you try to move around with the cursor keys, you cannot split it again. It's a sequence. It's an unbreakable sequence. There's no way around it. It's weird. But you should know about this. That's actually one grapheme cluster with two characters, with two code points, with four UTF-16 code units, with a lot of UTF-8 bytes. Okay. It's kind of strange to think about it this way. And these kinds of sequences are funky in the sense that you can write any number of regional indicator symbol letters, and the Unicode sequence logic will group them in groups of two as long as there's a flag corresponding to them. So it's kind of weird. Also, something that I haven't told you, and that is important to remember, is that not all 1.1 million code points have a character. Most of the code points are reserved for future use. And there's a whole different world of pain when you have missing characters, or you don't have the specifications for all the code points that your text or your files are using. And now for the last example. If you can understand this, you have all CJK things solved in your head. Okay. This is one grapheme cluster with seven characters, which is seven code points, which is a bunch of code units in UTF-16 and a bunch more bytes. There are a lot of interesting things here. You can see that the zero-width joiners have this...
It's 200D in hexadecimal, which gets translated into two bytes in UTF-16 but three bytes in UTF-8, wasting space. And you can kill family members when you delete things, which is crazy. If you understand this, it's going to make any Unicode problem easier for you. Also, you have to remember that there's another concept that I haven't put here, which is the glyph. A glyph is the representation of a grapheme in a particular typeface. So, for example, an A is not the same in Times New Roman or DejaVu Sans or Comic Sans. The A is a different glyph for the same grapheme, for the same character, for the same code point, for the same representation. I hope to have broken your brains a little bit with Unicode, and I hope you have a basic understanding of how it works. Thank you.
Okay. Basically, we have... I'm still a wiki guy, and I try to import old stuff from other wikis. And, open it in the Nextcloud. The presentation is in the Nextcloud. I shared it with you. Yes, great. And basically, finally, at this conference, I was able to import stuff, or, better said, we started to import 150,000 pages into a completely new wiki, and the import is still running. The great part is that we do fix stuff which another project somehow cannot fix. Thorsten lost my slides. So, basically, Wi-Fi is cool. I was doing the slides in that room, actually, because I got... No, go to the first slide. And I was searching for a great name. So, basically, I said, that's The Document Foundation Attic, but editable. Some might get the hint. Next slide. So, yeah, that's still... That's a screenshot from half an hour ago. They know what they have to fix. It's rather easy, but for over a year they haven't fixed it. So, the problem is we have pages which do not work. The registration is not activated, so you cannot change the content itself if you don't have an account. And more or less, we do want to keep the content up to date. Sorry. So, next slide.
So, we get regular requests on our mailing lists, and on some other mailing lists, that some pages do not work. And we try to get around that with archive.org, but we have a better solution. Next slide. As I said, the Attic. So, basically, we have working or non-working links. This depends, and after the import is finished and provided for the general audience, we have some easy work. You simply have to modify the URL to access the content. So, next slide. Here, it's a screenshot, also from a few minutes ago, and you see the templates are still not imported, basically because it's a really long one. But that is the same page which was, oh, I see. It's basically the same page which was not working at the moment for the other project. Yeah. Yeah. So, for the future, we simply have to wait until the import finishes. We have to import the files and simply move the database to the production server, because it's all on one test machine, and then you can again get access to old, non-working links. Yeah. So, basically, that was my time at the conference. Yeah. Any questions? Good. Yeah. Maybe one from me. So, first of all, this is absolutely awesome. I mean, it's not only this one single wiki, there's a number of other wikis. Yeah. And you're working on all of them, one by one, or? Yeah. The plan is to import the Turkish LibreOffice wiki, so they can finally shut down the old server, which has been unmaintained until now. And afterwards, we have static HTML wiki pages from the German community, which should get imported afterwards. And again, to get it editable, the German community has been requesting it for a few years, actually. And yes, so in the end, if you know other wiki content, even if it's no longer accessible on the internet, we might find the content in archive.org's archives, on the Wayback Machine, and we can fix it and make the stuff editable again. So, if you know of other great resources, send them to me.
Yes, of course. It's basically a one-time job, and if we can do it with some MediaWiki, it's a no-brainer. But it doesn't matter whether it's converters or some other way to get around that. For example, the German HTML archive of that MoinMoin wiki is converted with some regexes plus the LibreOffice HTML importer and MediaWiki exporter, and it produces rather good results, whoever wrote it. Thanks to him. Depends on the workload of Gulen, or maybe how much sleep you need next week. We will see. Depends on the beers and not sleeping.
I'm guessing most of these would be familiar to most here, but the hope is that you'll find one or two things that are new. So, I don't go anywhere without these tips, and I hope you will also keep them close to you. So, first tip is use a decent debugger. It doesn't have emojis, I'm sorry. It doesn't do animations, right? But it will make your life so much easier. Just learn the commands, and I have some very good ones. So, the first one is always, always have logging on. Reproducing stack traces, figuring out how the code works for certain cases. The ability to go back and see what you've done in the last debug session, to grep the logs of your interactive debugging, is just invaluable. The fact that you can actually grep function names and file names and find clues that will help you figure out a bug without running the debugger is fantastic. It will save you at least half your time. Always keep it on. Make sure the log is at a path that you have access to. If you run GDB as superuser, remember that the file itself is going to have been written by the superuser, so you won't be able to log to the same file if you run it under your own username. You can use echo during a debug session to add hints, like what you were debugging or what you were doing or anything like that, because this is going to be one continuous file and you don't want to lose track. This is how you do a breakpoint.
You just do break, or you do b, and a function name or a file and line, but you can actually add conditionals. Like at the bottom, I'm actually calling a C function, and you can call any function. You can compare integers, you can compare values, you can pretty much do anything. And with conditionals, you don't have to break every time, read the value, and then decide to continue until you hit the value you're looking for. Just add the conditional, it will save you time. This one is the most amazing one. People traditionally print out values as a way of debugging. You really don't need to do that. You edit the code, you compile the code, you run the code, then you have to read the output. Doesn't make sense. Just run the debugger, put the breakpoint, which we did just in the previous slide. GDB will tell you the number, the index, the ID of the breakpoint. You use that as n here. You write commands n, where n is the number of the breakpoint, and then you can actually write any script you want in GDB commands. So here what I'm doing is I'm printing the value of a variable, and then I print the stack trace, nine frames only, nine deep, and then I continue. And you run this on any point of interest, and then you actually get exactly what you would have got, but even better, with the stack trace, had you added, you know, std::cout or printf or even logging, right? And you can put anything here, and then the output, you grep it or you read it before sleeping or whatever you like to do with it. Next is this. Who knows what this does? Any show of hands? Who knows what this is? Okay, seriously. You can raise your hand. I mean, I don't have candies, but okay, this is a breakpoint. And the R is regex. Is that a good hint? So you use this if, whatever you're looking for, you have no clue which function it ends up in, right? But you do know which class is probably responsible for it. And you might even know a part of the function name.
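The logging, conditional breakpoint, and breakpoint-script tips can be collected into a GDB command file. The sketch below generates one from Python; the file location, the condition, and the variable name are hypothetical, and the slides' actual commands are not in the transcript, so this is one plausible arrangement rather than the speaker's own script.

```python
from typing import List


def gdb_session_script(log_path: str, location: str, condition: str, variable: str) -> str:
    """Build a GDB command file: logging, a conditional breakpoint, and a script attached to it."""
    lines: List[str] = [
        f"set logging file {log_path}",
        "set logging on",  # newer GDB releases also accept: set logging enabled on
        "echo session note: looking at the heading bug\\n",
        f"break {location} if {condition}",
        "commands",  # with no number, this applies to the breakpoint just set
        f"  print {variable}",
        "  bt 9",  # backtrace limited to nine frames
        "  continue",
        "end",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical location, condition, and variable name, for illustration only.
script = gdb_session_script(
    "/tmp/gdb-session.log",
    "doc.cxx:120",
    'strcmp(name, "Heading 1") == 0',
    "name",
)
print(script)
```

Saved to a file, the output can be loaded with `gdb -x` or with `source` inside a running session. Every hit of the breakpoint then prints the variable and a nine-frame backtrace into the log and continues without stopping.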
Like if you have a bunch of functions that handle events, or a bunch of functions that do something, you can actually write a regex that would break on all the matching functions. You can configure it however you want, but this is the most obvious one. I know it's going to hit the view shell at some point, or I know it's going to hit SwFormat or SwUndo or whatever it might be. But I just don't know which function it's going to be, and you do this. And then you're able to read the log. Again, if you combine this with commands, you can continue at each breakpoint. You will be able to see where you're crossing paths. This one is a fantastic one, because typically what happens in a debug session is that you add breakpoints as you go, because you figure, oh, I probably need to also stop here. And then at the end, you have these breakpoints, and you need to close the session and restart or whatever, and you will lose them. That's a bummer really. No, you save. Save breakpoints. Give it a file name. It's going to save all the breakpoints. In fact, the disabled ones will be saved as disabled, the enabled ones as enabled. And then you can edit them. You can add more, right, because it's going to be the commands that you know about. It's going to be b, the file name or the function name, et cetera, and the condition if there was one. And you can edit it. You can add more, and then you source it the next time you restart GDB, and you just continue where you left off. Okay. Who knows what this does? Show of hands again. Anyone? Anyone? Yes, yes. Yes, TUI. So TUI will actually split the screen for you. The first one is going to give you one pane above, and the bottom is going to be your command line. And the top is going to be the source code. In fact, the up and down arrows will scroll the text in the code, right? So you don't need to do l, l, l, l. No, you just scroll.
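A small Python sketch of the regex breakpoint and the save-and-restore tips. The class and function names are illustrative only, and the preview function uses Python's `re` module, whose syntax is close to but not identical to the regular expressions GDB accepts, so treat the preview as an approximation.

```python
import re
from typing import List


def rbreak_command(class_name: str) -> str:
    """GDB command that sets a breakpoint on every function whose name starts with the class prefix."""
    return f"rbreak ^{class_name}::"


def matching_functions(class_name: str, symbols: List[str]) -> List[str]:
    """Preview, outside GDB, which symbol names such a prefix regex would select."""
    pattern = re.compile(f"^{re.escape(class_name)}::")
    return [name for name in symbols if pattern.search(name)]


def save_and_restore(path: str) -> List[str]:
    """Commands to write all breakpoints to a file now and to load them in a later session."""
    return [f"save breakpoints {path}", f"source {path}"]


# Illustrative symbol names, not taken from the talk.
symbols = ["SwUndo::UndoImpl", "SwUndo::RedoImpl", "SwFormat::SetName", "main"]
print(rbreak_command("SwUndo"))
print(matching_functions("SwUndo", symbols))
print(save_and_restore("session.brk"))
```

The saved file is a plain list of GDB commands, so it can be edited by hand before it is sourced again.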
And if you want, you can do Ctrl-X 2, and that will split it even two ways, and then you get the assembly. And then you can actually go through them, and you can maximize any one you want. And at the bottom, you still have your command line. But now, to go to the previous and next commands, you need to use Ctrl-N and Ctrl-P, because the arrows scroll your code. Ctrl-X A will toggle this TUI on and off. So you can go back and forth. Okay. LLDB is the LLVM version of GDB. Essentially, it's their debugger. And you want to use this because it is so much faster. It will attach to LibreOffice probably two times faster, depending on the weather. And it has a few commands that are different, but it's worth using if you really need to do a quick one, right? GDB is significantly slower than this guy. And that's it. I think I have a few seconds for questions or for silence. All right. Thank you. Oh, hold it. Yes. Go. Send me the link. I'll copy paste. I don't. I don't have it anywhere. I didn't know it was useful to anyone else. I don't know. Yes, Michael. Yes, I do. I have the logging, and I have a few macros where one command will give me the frame and local variable printout. Yes. I can share it. Thank you.
So please, someone take a picture for my Facebook. Someone, not all. So I'd like to talk about how to find bad synonyms in translation, Japanese translation. I'm Koji Annoura. This is the agenda: about me, the characters, PO files to graph database, the graph view, and what's next. About me, I'm from the south of Japan, Fukuoka. Do you know Kyushu Island? No? Oh, thank you. It has the most famous pork-flavored ramen noodles. About me, I am one of the founders of the Neo4j user group Tokyo and a member of the Japanese team. And I like coffee. So I like some certificate. So I booked my flight. And the next day I received this mail: your talk was not accepted. Wow. Oh my God. Yeah. But I love LibreOffice. So I hoped to go on. I hoped to join here.
There are four types of characters in Japanese. This is an example. In Japanese: my smartphone is an iPhone. Four types of characters. A Chinese character is kanji; watashi is kanji. This is hiragana: no, and this one. And katakana, for smartphone, and the English alphabet. So it is a very complicated language. So what changed with the times? Computer: the old one, then the current one. Now for computer we use this one; I think before 2008, this one. So, and graph database. All the Japanese ones are correct. So I want to find synonyms in the Japanese translation of LibreOffice. How? I hope to find something with a graph database, I think. So this is the pattern I thought of. This is a word. Yellow is an English word. One English word is one Japanese word. A sentence is green. A sentence uses a word. This Japanese sentence uses this word. The square pattern is correct, I think. So I looked inside my PO file, like this one: English document, Japanese document. This is a word. This is a sentence, document background. This is a sentence, sentence and word. So I converted the PO file to CSV with my program, a PO-to-CSV Python script, and loaded it into a graph database, like this one. This is a word and sentence graph of the PO file. This is the English word document. These green ones are sentences, sentences that use the word document. The red one is the Japanese word for document. So you will see some strange graph patterns. This is connected. This English sentence is connected to a Japanese sentence. But these are not connected. Maybe this area is a good translation, I think. So I search for sentences that do not use the same word. So I searched the Japanese translations of the word document. I found three different Japanese translations. This is Japanese. This is katakana. This is document. Three different Japanese translations. One, two, three. In the third, there is no word for document in the Japanese. So to make good translations, I check synonyms and make good translations. If you unify to bunsho, Japanese document to bunsho, unify.
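The check described above can be sketched without a graph database. This is not the speaker's Neo4j setup: it swaps in a plain Python dictionary, and the sample entries and the two candidate translations (文書 and ドキュメント) are made up to stand in for rows read from a PO file.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical (msgid, msgstr) pairs standing in for entries read from a PO file.
ENTRIES: List[Tuple[str, str]] = [
    ("Document background", "文書の背景"),
    ("Save document", "ドキュメントを保存"),
    ("Print document", "印刷"),
    ("Close window", "ウィンドウを閉じる"),
]


def group_by_translation(
    entries: List[Tuple[str, str]], word: str, candidates: List[str]
) -> Dict[str, List[str]]:
    """Group English sentences that use `word` by the candidate translation found in the Japanese sentence."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for msgid, msgstr in entries:
        if word.lower() not in msgid.lower():
            continue  # this sentence does not use the English word at all
        found = next((c for c in candidates if c in msgstr), "(none)")
        groups[found].append(msgid)
    return dict(groups)


result = group_by_translation(ENTRIES, "document", ["文書", "ドキュメント"])
for translation, sentences in result.items():
    print(translation, sentences)
```

With this sample data the word falls into three groups, mirroring the three cases in the talk: a kanji translation, a katakana translation, and a sentence where no word for document appears. A graph query over word and sentence nodes answers the same question and scales better to a full PO file.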
All document to bunsho is not good, I think. If you unify to document, all bunsho to document, I think it's good. So the Japanese team has to think about which word to select, bunsho or document or something else. And what's next? I'm really interested in other countries' languages. So Chinese or Korean or others, you may have the same trouble. So you can add to this area. How about German? Is translation no problem? I don't know. This is the last slide. So please vote for my CFP at LibOCon 2020. Yeah, please. Please vote, members. And please ask me. This is my name. It's called g at annola.com. Please email me. Thank you.
Who remembers what the title of my talk was? The one that I cancelled. Reproducible builds. Right. So reproducible builds in LibreOffice. We didn't do them because we always said, that's too hard. Because, well, we are depending, like if we look at all the Linux packages and stuff, we are all up here. And there are a bazillion packages below us that are all not reproducible. So we can't possibly do reproducible builds. Well, then that looks Spanish. That still looks Spanish. But this looks English. And among other things, it says that Debian is by now 91% reproducibly buildable. That's awesome, Debian. Can we give a round of applause for Debian? The inconvenient truth about this is, we don't have our excuse anymore. We might actually start thinking about making LibreOffice reproducibly buildable software. And actually, you can see here in the next sentence, right after the stuff that I marked, why this might be a good idea. All right. So anyone volunteering to help me, maybe looking into this? Just giving moral support, you know? Yes. I will do that for you. So I sign up for moral support then. Also, I will volunteer as a visionary for this project. Okay. Any questions? So no real questions, just cheering you on.
I mean, I think it's really, I mean, this is one of the things, like this missing gap in the open source promise, that you really don't know what's in the binaries unless it's reproducible. And it looks like not much is missing now. I mean, all the heavy lifting has been done. Yeah. And then we can also build on a Google cloud that we don't trust, on an Amazon cloud that we don't trust, and on a Microsoft cloud that we don't trust. And in the end, we might still somewhat trust the end result, which might be nice. So the question is, how does that work with all the stuff that release engineering does, and certification and so on? I think we should worry about that once we have a reproducible build. Since we have a lot of time, I will walk over to you very, very slowly. It's not that we absolutely need to spend all the time. We can also have the closing session a bit earlier. So at Debian, we had the same problem with our kernel, because the kernel is signed for secure boot. But of course, the signing infrastructure is not in the hands of the developers or the buildds and so on. So what happens is that the package that's actually uploaded is lacking the signing information. And then it's stamped on by the signing infrastructure. So the final package is not reproducible in that sense, but you can still remove the stamp, the secure boot part, and you can verify the reproducibility of your package. So I guess that would be something similar that you would need to do. Yeah, also as a starting point, I mean, how do you start? I don't know if you know this image or this meme where the developer says, oh yeah, but it works on my machine. I think you all have heard this excuse before. And the solution to this is a software called Docker, which means shipping your machine. And that would be a starting point.
So if we build LibreOffice inside a Docker image and actually get that reproducible, that would be a start, and then we can move on from there to getting it reproducible outside and so on. I think it might also be easier to start with Linux and not start with Windows, although Windows probably is a juicy target in the end. Does anybody know what's the story on Windows? So my hunch would be there's nothing and you start from scratch. Or is there anybody doing any work there? So my hunch would be you start from scratch with all the third-party libraries and work your way up the stack.
So who's heard of 3D plastic printers? I guess you all have by now. But either way, there's a great open-source project. And we fixed the virtual reality world, so why don't we fix the real world as well? So it's all made with open source. So RepRap prints itself, which is fantastic. You don't need anything. You can just print one. But you need the first one. So how do you make the first one, to print the second one, that prints the third one and so on? And how do you control it? It's pretty nasty. So here's how you print the first one. You get some bits of hardware of this kind, and you assemble them in a sort of fairly rigid frame like this, with bits of random electronics. The electronics is easy. You have to buy that. Sadly, thus far it doesn't print that. And you have this Z axis that goes up and down. And you have some X and Y axes. And you move around a little nozzle that pushes plastic through itself and gets hot. A bit of PID control for good stuff like this. And you use these lovely timing belts here, which are made for your car. If the timing goes wrong in your car, the piston comes up and it hits the valve going down. And you have this amazing noise. And you don't have a car anymore. And so timing belts are very good at not stretching. So they've got this nice steel wire inside. They're very precise. And so it's pretty good. It's great for driving nozzles.
But gears are difficult, particularly when you're making them out of, well, nothing very much. It turns out the ideal thing here is to get your sweetheart's chopping board and to cut a bit off the end of it with a saw, like this. And your printer is nice and accurate. And it even is the same in both dimensions, like X and Y. It doesn't scale stuff crazily. So you can print your gears out, and you can make them like this, to go on your steppers, with some expert help, obviously. This was a while ago. And then you can print your beer bottle opener, if it pleases you, or a coat hanger, a very bad one. Notice the stringing. This is what happens if you carry on printing and plastic keeps coming out as you move, which is not ideal. So then, having made the very bad wooden plastic printer, you can make the slightly less bad plastic printer. Of course, you'll notice there's a lot of metal in it. And there's a lot of cheating going on. The self-printing is all very well, but you have to tighten all these bolts up and stuff, which is a bit of a pain. So you print this thing, and there's lots of little fixings and little bits of ABS, and there's hideous warping problems, because as the plastic cools, it moves up, and the whole thing tends to pop off whatever you're printing it on and so on. But eventually, you can end up with something like that. And you can get rid of the rather bad physical, mechanical design issues in the first design. And you can actually make something that really reasonably prints PLA bioplastic. So if there are recycling fanatics out there, this is the good plastic. It's made out of massively industrialized sweet corn production in America, that's all done with, God knows what, fertilizers and so on. But anyway, it's biological, so it must be good. And in theory, it biodegrades, although I've never noticed it biodegrade, it seems incredibly tough. And you get much better quality as well.
So you can print gears and things that work well, plausibly, which is fun. And then once you've printed that second one, that's not terribly wonderful, you can print another one, which is significantly better designed. So the nice thing about software, of course, is that you can upgrade it, and you can cost-engineer and engineer your 3D printer. So there is much less of it there, because stuff is expensive. So get the stuff out of the printer. And it's just quite encouraging to see the generations. Here was a very complicated block in the previous generation. So if you look at this, this guy here keeps the x-axis from going in the wrong direction. It lines it up. It's all part of the calibration. But here we've got two blocks. We've got two nuts, two bolts, four washers. It's a pain to print and a pain to assemble. And you then need to align it with these nuts on the left and right. So it's just lined up. Turns out in this guy, you can do the same thing with a tiny strap of PLA that does exactly as well and is instant to print, which is kind of cool. Similarly, the electronics, you know, this is the first-generation RepRap something or other from some wonderful people, MakerBot, I think, made these back in the day. Complicated. Wires everywhere, God knows what. And now with a simple Arduino and a shield, what they call a shield, this thing pops on the top, you can control, well, all sorts of things, much, much more elegant and attractive. Although a bit annoying, you can only control four things. So yes. So the reason I gave this talk some years ago was that I was encouraging people to write less lame software, and not in Java. So I mean, I don't know. That's my ulterior evil motive. But unfortunately, a lot of the software is written in Java, and I'm sure Java performs really well and is a wonderful cross-platform language. But it's written by a mechanical engineer. And you can imagine.
Actually, computer software engineers tend to write terrible software as well. Well, computer scientists, when you see academic computer science code, they've never tried engineering or maintaining anything. It's all just awful. But apart from Donald Knuth, let's say. Okay. There's the Knuth exception, perhaps. But either way, there's some fun problems in the software as well. So you basically end up with an STL file, which is, there's no real standardization of any goodness in this world that I could see. So this is just a whole lot of triangles, random lists of triangles in random orders, which you then assemble. And then you try and slice it. You chop it in half, see what intersects, and work out where to draw little lines of plastic inside, which is a fun problem in itself. Particularly since, as you do it, there's all sorts of thermal effects and shrinking. And it's nice to do the outside before you color the middle in, or you get leakage. Anyway. And then you get the G-code, which is another disaster area from the mechanical control industry, which is like a multidimensional axis control language with interpolation and stuff. And you send that down a serial port, and out pops something that's the wrong shape quite a lot. But it's good fun. So yeah. So software lovers are quite... if anyone's interested, do tell me. This was the great white hope 10 years ago. But it turns out that the insoluble problem for printing metal is that if you have a molten metal and you run it through a very small nozzle, which is what you need to make a wire, it melts the nozzle. Even though the nozzle is, like, titanium coated and has in theory a very high melting point, just squeezing low-temperature alloys like this tin-bismuth-indium through it at pressure is enough to dissolve your nozzle, which kind of sucks. And so conductivity is, you know, we want something to print the electronics, to make, you know, robots and things self-assemble. You know that?
Well, I don't think we want that. But anyway, it seems like one of those stupid ideas that no one would ever do, but it's kind of an interesting technical challenge. So we should do it. And yeah, that never really happened. So they have all these silver conductive inks. So if you're a millionaire, you could probably use gold conductive inks, and they might work well. But it's kind of hard to print bulk electronics. So that was my talk. Hopefully it was interesting and filled a little gap. Thank you.