To start, we have Volker Krause here with "A Privacy by Design Travel Assistant". It's going to be about building open source travel assistants, I think. This talk will be in English. If you want a German translation, we have some great translators in our booth; you can listen in on c3lingo.org as they interpret everything live. Right. Now let's give Volker a warm welcome and have fun with the talk. Thank you.

Okay, so what is this about? You probably know this feature from, most prominently, Google Mail, although I think TripIt was the one that pioneered it. Gmail reads your email and detects any kind of booking information in there, like your boarding passes, your train tickets, your hotel bookings and so on. It can integrate that into your calendar, present a unified itinerary for your entire trip, and monitor that for changes. And all of that doesn't cost you anything, apart maybe from a bit of your privacy.

Well, not too bad, you might think. But let's look at what kind of data is actually involved in just your travel. The obvious things that come to mind are your name, your birthday, your credit card number, your passport number, that kind of information. But that isn't even the worst part of this, because those operators don't just get to see your specific data for one trip. They get to see everyone's trips. And if you combine that information, it uncovers a lot about relations between people, your interests, who you work for, where you live and all of that. Pretty much everyone here traveled to Leipzig for the last four days of the year, right? If that happens for two of us once, that might be coincidence. If that happens two or three years in a row, that is some kind of information.

So what to do about that? The easy solution is to just not use those services.
It's first-world luxury stuff anyway. That works until you end up in a foreign country where you don't speak any of the local languages and get introduced to their counterpart of Schienenersatzverkehr (rail replacement service) or Tarifzonenrandgebiet (fare zone border area). At that point you might be interested in understanding what's happening on your trip in a form that you are familiar with. Ideally without installing 15 different vendor applications for wherever you might be traveling, right? So we need something better.

And that obviously leads us to: let's do it ourselves. Then we can at least design this for privacy right from the start and build it on top of free software and open data. Of course, it's not entirely obvious that this will actually work. Google and Apple have a totally different amount of resources available for this. So can we actually build this ourselves?

Let's have a look at what those services need to function. It turns out it's primarily about data, not so much about code. There are some difficult parts in terms of code involved as well, like the image processing and PDF handling needed to detect the barcode in your boarding pass. But all of that exists as ready-made building blocks, so we basically just need to put this together nicely.

So let's look at the data. That's the more interesting part. In general, it breaks down into three different categories. The first one is what I call personal data here. That's basically booking information, documents or tickets, boarding passes specific to you. There at least you don't have a problem with access, because that is sent to you and you need to have access to it. But it comes in all kinds of forms and shapes, so the challenge is to actually extract it. The second kind of data is what I would call static data. So for example the location of an airport.
Now you could argue that that could change, and there are rumors that some people apparently managed to build new airports. I live in Berlin, so I don't believe this. Jokes aside, static refers to static within the release cycle of the software, so several weeks or a few months. This is stuff that we can ship as offline databases. Offline of course helps us with privacy, because then you're not observable from the outside. And the third category is dynamic data, so stuff that is very, very short-lived, such as delay information. There is no way we can do that offline. If we want that kind of information, we will always need some kind of online querying.

Then let's look through those three categories in a bit more detail. For the booking data, Google was faced with the same problem. So they used their monopoly and defined a standard by which operators should ideally add machine-readable annotations to their booking information. And that's awesome, because we can just use the same system. That's what nowadays became schema.org, which I think Lukas mentioned in the morning as well. At least in the US and Europe, you find that in about 30-50% of the booking emails you get from hotels, airlines, or event brokers. So that's a good start.

But then there's the rest, which is basically unstructured data: random PDF files or HTML emails we have to work with. There are Apple Wallet boarding passes. They are somewhat semi-structured and most widespread for flight tickets, so that's somewhat usable. And then barcodes. That's what you, again, see on boarding passes or train tickets. I could probably fill an entire talk just with the various details of the different barcode systems. For the one on boarding passes, I think Karsten Nohl had a talk at Congress a few years back where he showed how they work and what you can do with them. Instagram hashtag boarding pass is a very nice source of test data.
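To give an idea of what such a boarding pass barcode contains, here is a minimal Python sketch that splits the fixed-width mandatory part of an IATA BCBP barcode (format code "M", single leg) into named fields. The field widths are assumed from the commonly documented BCBP layout; the function name `parse_bcbp`, the field names and the sample data are made up for illustration and are not taken from the extraction library discussed in this talk.

```python
from typing import Dict

# Mandatory fixed-width fields of a single-leg BCBP barcode, in order.
# Widths are an assumption based on the commonly documented layout (60 chars).
BCBP_FIELDS = [
    ("format_code", 1),
    ("number_of_legs", 1),
    ("passenger_name", 20),
    ("electronic_ticket_indicator", 1),
    ("booking_reference", 7),
    ("from_airport", 3),
    ("to_airport", 3),
    ("operating_carrier", 3),
    ("flight_number", 5),
    ("day_of_year", 3),
    ("compartment_code", 1),
    ("seat_number", 4),
    ("check_in_sequence", 5),
    ("passenger_status", 1),
    ("conditional_size_hex", 2),
]
MANDATORY_LENGTH = sum(width for _, width in BCBP_FIELDS)


def parse_bcbp(raw: str) -> Dict[str, str]:
    """Split the mandatory part of a BCBP string into stripped fields."""
    if len(raw) < MANDATORY_LENGTH or not raw.startswith("M"):
        raise ValueError("not a BCBP 'M' format barcode")
    fields = {}
    pos = 0
    for name, width in BCBP_FIELDS:
        fields[name] = raw[pos:pos + width].strip()
        pos += width
    return fields


# Hypothetical sample, assembled field by field so the padding is explicit.
SAMPLE = (
    "M" + "1" + "DOE/JOHN".ljust(20) + "E" + "ABC123".ljust(7)
    + "TXL" + "ZRH" + "LX".ljust(3) + "0963".ljust(5) + "359"
    + "Y" + "012A" + "0042".ljust(5) + "1" + "00"
)

if __name__ == "__main__":
    for key, value in parse_bcbp(SAMPLE).items():
        print(f"{key}: {value}")
```

Note how much is readable without any key: passenger name, booking reference, route, flight number and seat are all plain text, which is why a posted photo of a boarding pass gives so much away.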
The one that you find on German railway tickets is also pretty well researched already. The one we actually had to break ourselves was the one for Italy; to my knowledge, we are the first ones who published the content of those binary barcodes. And we are currently working on the VDV-Kernapplikation eTicket, which is the standard for German local transportation tickets. That actually has some crypto that you need to get around to see the content. So if you're interested in that kind of stuff, there is quite some interesting detail to be found in this.

But let's continue with the static data. There, of course, we have Wikidata, which has almost everything we need. We are making heavy use of that, and that's also why I'm here today on the Wikimedia stage. One thing that Wikidata doesn't do perfectly is time zone information. That's why we are using the OpenStreetMap data for this. In Wikidata there are three different ways of specifying the time zone: UTC offsets, some kind of coarse human-readable naming like Central European Summer Time, and then the actual IANA time zone identifiers like Europe/Berlin. And that's the one we actually need, because they contain the daylight saving time transitions. That is crucial for a travel assistant, because you can have a flight from, say, the US to Europe in the night where there is a daylight saving time transition on one end. If we get that wrong, we are off by one hour, and that could mean you miss your flight. So that is something we need to get absolutely right. Wikidata mixes the three time zone variations. That's why we fall back to OpenStreetMap there.

Another area that still needs work is vendor-specific station identifiers. There are a number of train companies that have their own numeric or alphanumeric identifiers, which you find, for example, in the barcodes of tickets. So that's our way to actually find out where people are traveling.
So that's something we are trying to feed into Wikidata as we get our hands on those identifiers. For airports, that's easy because they are internationally standardized. For train stations, that's a bit more messy.

And finally, the dynamic data. That's, again, an area where we benefit from Google using their monopoly. They wanted to have local public transportation information in Google Maps, so they defined the GTFS format, which is a way for local transport operators to send their schedules to Google. But most of the time, that is done in a way that they basically publish this as open data, and in that way all of us get access to it. And then there's Navitia, which is a free software implementation of a routing and journey query service that consumes all of that open schedule data, and which we in turn can use to find out departure schedules, delays and that kind of live information.

Apple Wallet also has some kind of live-updating polling mechanism, but that is somewhat dangerous because it leaks personally identifiable information. Basically, a unique identifier for your pass is sent out with the API request to poll for an update. So that is just a last-resort mechanism if you have nothing else. And then there's a bunch of vendor-specific, more or less proprietary APIs that we could use. Unfortunately, they are often not compatible with free software and open source. They might require API keys that you're not allowed to share, or they have terms and conditions that are simply incompatible with what we are trying to do. For some of them this works, but there's still some room for improvement in those vendors understanding the value of proper open data access.

Okay, so that's the theory. Let's have a look at what we have actually built for this. There are two backend components, so to say. There is the extraction library that implements the schema.org data model for flights, for trains, for hotels, for restaurants and for events.
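As an illustration of what such a schema.org annotation looks like and how little is needed to read a well-formed one, here is a minimal Python sketch that collects JSON-LD blocks from an HTML email and keeps the reservation objects. The type and property names (`FlightReservation`, `reservationFor`, `iataCode` and so on) are from the schema.org vocabulary; the helper names `JsonLdCollector` and `extract_reservations` and the sample email are invented for this example and do not reflect the actual extraction library.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of all <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_reservations(html):
    """Return all JSON-LD objects whose @type ends in 'Reservation'."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found = []
    for block in collector.blocks:
        data = json.loads(block)  # strict parsing; assumes valid JSON
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and str(item.get("@type", "")).endswith("Reservation"):
                found.append(item)
    return found


# Hypothetical booking email with a schema.org annotation.
SAMPLE_EMAIL = """<html><body>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "FlightReservation",
  "reservationNumber": "ABC123",
  "reservationFor": {
    "@type": "Flight",
    "flightNumber": "963",
    "airline": {"@type": "Airline", "iataCode": "LX"},
    "departureAirport": {"@type": "Airport", "iataCode": "TXL"},
    "arrivalAirport": {"@type": "Airport", "iataCode": "ZRH"},
    "departureTime": "2019-12-27T10:15:00+01:00"
  }
}
</script>
<p>Thank you for your booking.</p>
</body></html>"""

if __name__ == "__main__":
    for reservation in extract_reservations(SAMPLE_EMAIL):
        flight = reservation["reservationFor"]
        print(flight["departureAirport"]["iataCode"], "->",
              flight["arrivalAirport"]["iataCode"])
```

The sketch deliberately uses strict JSON parsing, so it only handles the easy case of a correctly encoded annotation.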
The library can do the structured data extraction. That might sound easy at first, but it turns out that for some of the operators, doing proper JSON array encoding is somewhat hard. I mean, you need to have a comma in between two objects and square brackets around it. Some of them struggle with that, so we have to have lots of workarounds in parsing the data we receive.

Then we have an unstructured extraction system. That's basically small scripts per provider or operator that use regular expressions or XPath queries, depending on the input, and turn that into our data model. Currently, I think, we have slightly more than 50 of those. I know that Apple has about 600, so that's still an order of magnitude more, but it's not impossible, right? So I think with free software we have the means to come to a similar result as people that have an Apple or Google scale budget for this. The service coverage is actually quite different. For Apple, I've seen their custom extractors, and they have a lot of US car rental services. We have some more important stuff like CCC tickets, so the Congress ticket is actually recognized, and I managed to get in with the app.

What the extraction engine also does is augment whatever we find in the input documents with information we have from Wikidata. So we usually have time zones, countries, geo-coordinates, all that useful stuff for then offering assistance features on top. The input formats are basically everything I mentioned: the usual stuff you get in an email from a transport operator, or any kind of booking document.

The second backend component is the public transportation library. It's basically a client API, mainly for Navitia, but also for some of the widespread proprietary backends like HAFAS, which is what Deutsche Bahn is using. It can aggregate the results from multiple backends. If you're using open data in a backend, it propagates the attribution information correctly.
Just a few days ago, it also gained support for querying train and platform layouts, or Wagenstandsanzeiger in German, so we can have all of that in the app.

Then, of course, there is the KDE Itinerary app itself. It's very hard to read here. It's basically a timeline with the various booking information you have, grouped together by trip. It can insert live weather information. Again, that's online access, so it's optional, but it's kind of useful. You probably can't read that, but that's my train to Leipzig this morning, and that's the Congress entry ticket. The box at the top is the collapsible group for my trip to Leipzig for Congress.

It can show the actual tickets and barcodes, including Apple Wallet passes. If you sometimes have a manual inspection at an airport where they don't scan your boarding pass but just look at it, apparently that looks reasonable enough that you can board an aircraft with it. At least I wasn't arrested so far.

And then we have one of my favorite features, also powered by Wikidata: the power plug incompatibility warning. If you're traveling to, say, the US or the UK, you're probably aware that they have incompatible power plugs. But there are some countries where this isn't that obvious, at least to me, like Switzerland or Italy, where only half of my power plugs work. So this is the Italy example. It tells me that my Schuko plugs won't work, only my Euro plugs. And the one on the right is, I think, for the UK, where nothing is compatible. If you occasionally forget your power plug converter while traveling, that is super useful.

And then, of course, we have the integration with real-time data, so we can show the delay information and the platform changes. The part in the middle is the alternative connection selection for trains. If you have a train ticket that isn't bound to a specific connection, then the app lets you pick the one you actually want to take.
Or if you miss a connection and need to move to a different train, you can do that right in the app as well.

The screenshot on the right-hand side is your overall travel statistics. If you're interested in seeing the carbon impact of all your trips and the year-over-year changes, the app shows that to you. I wasn't really successful there, but that's largely because the old data isn't complete. So if you're interested in that, since we have all the data, it can help you see if you're actually on the right track.

And then, to get data into that, we also have a plug-in for email clients. This one is for KMail. It basically runs the extraction on the email you're currently looking at and shows you a summary of what's in there, in this case my train to Leipzig this morning, including the option to add that to the calendar or send it to the app on the phone.

We also have a browser extension. This is the website of the yearly KDE conference, which has schema.org annotations on it. The browser extension recognizes that and again offers to add it either to my calendar or to the itinerary app. That also works on many restaurant or event websites. They have those annotations on the website for Google search. So again, we benefit a bit from Google.

Okay, then we get to the more experimental stuff that was basically just finished in the last couple of days and that we haven't shown anywhere else publicly yet. The first one is, and that's a bit better to read at least, the following. If you saw the timeline earlier, it had my train booking to Leipzig and then the Congress ticket. But that still leaves two gaps, right? I need to get from home to the station in Berlin, and I need to get from the station in Leipzig to Congress. What we have now is a way for the app to automatically recognize those gaps and fill them with suggestions on what kind of local transport you could take.
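The idea of recognizing such gaps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the app's actual logic: the types `TimelineElement` and `Gap`, the function `find_gaps`, the place names and all times are hypothetical, and places are compared by name where a real implementation would more likely compare coordinates.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class TimelineElement:
    title: str
    start_place: str
    end_place: str
    start_time: datetime
    end_time: datetime


@dataclass
class Gap:
    from_place: str
    to_place: str
    earliest_departure: Optional[datetime]  # None: no earlier element known
    latest_arrival: datetime


def find_gaps(home: str, elements: List[TimelineElement]) -> List[Gap]:
    """Walk the timeline in time order and report location jumps."""
    gaps = []
    place = home
    free_from = None
    for element in sorted(elements, key=lambda e: e.start_time):
        if element.start_place != place:
            # We are somewhere else than where the next element starts,
            # so a local transfer has to be suggested for this stretch.
            gaps.append(Gap(place, element.start_place, free_from, element.start_time))
        place = element.end_place
        free_from = element.end_time
    return gaps


# Hypothetical trip: a train booking followed by an event ticket.
SAMPLE_TRIP = [
    TimelineElement("Train to Leipzig", "Berlin Hbf", "Leipzig Hbf",
                    datetime(2019, 12, 27, 8, 30), datetime(2019, 12, 27, 9, 45)),
    TimelineElement("Congress", "Leipziger Messe", "Leipziger Messe",
                    datetime(2019, 12, 27, 11, 0), datetime(2019, 12, 30, 18, 0)),
]

if __name__ == "__main__":
    for gap in find_gaps("Home", SAMPLE_TRIP):
        print(f"{gap.from_place} -> {gap.to_place}, arrive by {gap.latest_arrival}")
```

Each reported gap carries the time window in which the transfer has to happen, which is the input a journey query to a public transport backend would need.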
So here the one from Leipzig to Congress is expanded and shows the tram. That still needs some work to do live tracking, so that it accounts for delays and changes your alarm clock in the morning if there are delays on that trip. But we now have all the building blocks to make the whole thing much smarter in this area.

And this one, I think, was literally done yesterday, which is why the graphics are still very basic. That's the train and coach layout display for your trip, so that you know where your reserved seat on the train can actually be found.

Then, I only showed the KMail plug-in so far. We also have a work-in-progress Thunderbird integration, and Thunderbird is probably the much more widespread email client. Feature-wise it is more or less the same as what I showed for KMail. It scans the email, displays a summary, and offers to put that into the app or possibly later on also into the calendar.

This one is even more experimental. I can only show you a screenshot of the web inspector proving that it managed to extract something. That's the integration with Nextcloud. I hope we'll have an actual working prototype for this in January. Those two things are of course important for you to even get to the booking data that the app or other tools you build on top can consume.

Okay, so where to get this from? There's the wiki link up there. The app is currently not yet in the Play Store or in the F-Droid main repository. We have an F-Droid nightly build repository. I hope that within the next month we'll get actual official releases in stores that are easier to reach than what we have right now.

If you're interested in helping with that, there's some stuff in Wikidata where improving the data directly benefits this work, and that is specifically around train stations. I think in Germany, last time I checked, we still had a few hundred train stations that didn't have geo-coordinates or even a human-readable label.
So that is something to look at, and the vendor-specific or even the more or less standard train station identifiers are something to look at as well. UIC or IBNR codes for train stations help a lot.

Yeah, and then we kind of need test data for the extraction, so forget everything I said about privacy. If you have any kind of booking documents or emails you want to donate to support this and get the providers you're using supported in the extraction engine, talk to me. That would be extremely useful. Yeah, that's it. Thank you.

Hello, hello. That's a very impressive project. Do we have questions? Then I'll hand you my microphone. Yes.

Would it be possible to extract platform lift data for train stations?

Sorry, platform?

Platform lift data.

Oh. I think Deutsche Bahn has an open data API for the live status of lifts. That would of course in theory be possible. What we are trying to do is to be generic enough that this is not applicable in just one country, although it is very Europe-focused because most of the team is there. But lifts are something that is easy enough to generalize in a data model: it's their location on the platform and whether they are working or not. So yeah, that would be a nice addition. That goes in the general direction of indoor navigation, or navigation around larger train stations and airports. That's probably something where we could use a better overall display with the OpenStreetMap data and then augment that with where exactly your train is stopping and in which coach your seat is, and then have the lift data, so we can basically guide you to the right place in a better way.

Any more questions?

Is the mobile app written in Qt as well?

Yes. Most of this is C++ code, because that's what we use at KDE. The mobile client as well. There is a bit of Java for platform integration with Android. I don't think anyone has ever tried to build it on iOS, but of course it works on Linux-based mobile platforms as well, thanks to Qt and C++.
So, you mostly talked about the mobile app so far, which is understandable, but as it's a QML application, does it also run on desktop? And second question: how do all the plug-ins and the different instances of the app share the data?

So, yes, the app runs on desktop. I was trying to see if I can actually start it here; I'm not sure on which screen it will end up. That's where we do most of the development. Let me see if I can move it over. Thank you. And now I need to find my mouse cursor on the two screens. I think I need to end the presentation first. But yeah, the short answer is: of course. There we go. Let me switch to... Yeah, so that's it, running on desktop. It has a mobile UI there that could of course be extended to be more useful on the desktop as well. In terms of storage, that is currently internal to the app. There is no second process accessing the actual data storage; that would just be unnecessarily complicated for now, but if there is a use for that, we'll need to see.

Yeah, but there was an option in the email plug-ins, for example, to send it to the app. Can I then only send it to my local app and not to the mobile app?

That's using KDE Connect. That's an integration software that allows you to remote control your phone from the desktop. So it basically bundles up all the information and sends it to the app on the phone. Or it can import it locally.

Do we have other questions? Yeah, now we don't have time. So then, thank you very much, Volker. Maybe you can tell people where they can find you if they have anything more they want to talk about.

Yeah, I mean, there's my email address, and otherwise I'll be around all day, all four days.

Around where?

Congress, somewhere. So yeah, that is a bit tricky.

Catch him before he runs away then. Alright, so give a round of applause again, and thank you, Volker.