Okay. Yeah, I'm going to show you KPublicTransport, which is also, somewhat indirectly, an outcome of the privacy goal. This kind of started shortly after last year's Akademy. Last year I presented KDE Itinerary, which is the privacy-aware and privacy-protecting variant of things like TripIt or the Google itinerary integration. So what we had last year was an infrastructure to extract your personal booking data from incoming emails and add it to a timeline, and we had augmentation of that with static data from Wikidata, so we had locations of airports and that kind of stuff.

What was missing last year was access to real-time transport data: delays, gate changes and that kind of thing. I was somewhat skeptical last year whether we could get that, because that is usually data that needs to come from the operators; there's usually no other way to get the live positions of trains and the level of detail you would need, for example, for delay displays. However, it turns out I was entirely wrong. After the talk last year I got contacted by a bunch of people actually working on this, and working on it in a free software and open data community, so I got to learn a whole new world of things I had no idea existed.

One of the key elements is the GTFS data format. That is, again like the itinerary data, something that Google came up with and then basically mandated: public transport operators had to provide data in that format if they wanted to show up on Google Maps. That is a very effective way to pressure and motivate them. So this is actually available for many operators, very often under open data licenses, and in some countries it is apparently legally mandated to be published under open data licenses. It exists in two variants. The simple one is basically the static schedule data, so how a train is planned to run.
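A GTFS static feed is just a zip of plain CSV tables, which is part of why it is so easy to consume. As a rough illustration (this is not KPublicTransport code, and the sample rows are invented), reading the stops table in Python could look like this:

```python
import csv
import io

# A GTFS static feed is a zip of CSV files; stops.txt lists every stop
# with an identifier, a human-readable name and geo coordinates.
# The two sample rows below are invented for illustration.
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
8011160,Berlin Hbf,52.5251,13.3694
8000105,Frankfurt(Main)Hbf,50.1072,8.6632
"""

def read_stops(stops_txt: str) -> dict:
    """Parse a GTFS stops.txt into a dict keyed by stop_id."""
    reader = csv.DictReader(io.StringIO(stops_txt))
    return {
        row["stop_id"]: {
            "name": row["stop_name"],
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        }
        for row in reader
    }

stops = read_stops(SAMPLE_STOPS_TXT)
print(stops["8011160"]["name"])  # Berlin Hbf
```

The real tables have many more columns (parent stations, wheelchair accessibility flags, and so on), but the format stays this simple throughout.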
And then, in basically all countries apart from Switzerland and Japan, there is the real-time version, which shows the actual state where it differs. That is a bit more complicated due to the amount of data you need to process, but it goes down to the actual GPS positions of all the vehicles in the fleet of that operator.

To use that kind of data there are several free software projects that consume it, aggregate it and then allow you to run queries on top of it: how do I get from A to B, or what is the delay of that specific train. Most notably that's Navitia, which is the project whose people contacted us, and OpenTripPlanner. Navitia is basically a free software server-side system where you feed in all the GTFS data you have, and it then runs all the routing algorithms for how to get from A to B: if there's a gap in between, how can you walk, do you want to take your bike for the first leg, and all those possible combinations you know from routing apps. It gives you access to the schedules, both for lines and for stops, and it has the location search that completes the name of a stop as you type it. So basically all the backend stuff you would expect behind a routing application. And it of course considers disruptions and delays and shows them to you, or adjusts the routing accordingly.

We could set that up on our own infrastructure and feed in all the GTFS data we can find. Alternatively, they also have a hosted version that we are allowed to use, with hundreds of feeds from all over the world in there. That's their coverage map: you see there's a strong bias towards Europe and the US, and unfortunately very little, if anything at all, in Asia. So that's not ideal, but it's a huge amount of data we can already work with. So how do we get to that? That's where KPublicTransport comes in.
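To give a feel for what such a query looks like, here is a small Python sketch of building a Navitia-style journey request. The endpoint path and the lon;lat parameter order follow Navitia's public HTTP API, but treat the exact details as illustrative rather than authoritative; no request is actually sent here:

```python
from urllib.parse import urlencode

# Sketch of a journey request against a Navitia-style HTTP API.
# Navitia expresses coordinates as "lon;lat" pairs and expects an API
# token in the Authorization header (omitted here); the exact endpoint
# details should be checked against the real API documentation.
def journey_url(base: str, from_coord, to_coord) -> str:
    """Build a journeys query URL from two (lat, lon) tuples."""
    params = {
        "from": f"{from_coord[1]};{from_coord[0]}",  # note: lon;lat order
        "to": f"{to_coord[1]};{to_coord[0]}",
    }
    return f"{base}/v1/journeys?{urlencode(params)}"

# Paris -> Berlin, coordinates given as (lat, lon):
url = journey_url("https://api.navitia.io", (48.8588, 2.32), (52.5251, 13.3694))
print(url)
```

The server answers with a JSON document describing the candidate journeys, including the individual legs, transfers, and any known disruptions.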
So that's the framework that allows us to interact with that kind of data, query it, and then represent it in a way that we can integrate into an application. It's a simple job-based query API, currently supporting location searches, departures from or arrivals at a given station, and journeys: how do I get from A to B. It's not limited to Navitia; it supports multiple backends, we'll see a bit about that later, and it can pick the right backend based on the locations where you are traveling, both to get the best results and to reduce the amount of data we have to send out. That also implies we need to be able to merge results we get from different backends, and that is probably what most of the code in the framework is actually there for.

Regarding the backends, there is obviously the Navitia one, which provides the widest coverage at the moment. Then we have support for three proprietary systems; some of them are more or less documented, some of them require a bit of creativity. That is mainly necessary to get data for some of the German operators. What we are still missing is a backend for the OpenTripPlanner system, which has some similarities to Navitia but a somewhat more complicated interface; that would be necessary to support at least Norway and Finland, which run national railway instances on top of it. The challenges resulting from supporting multiple backends are merging the results and aligning different spellings or variations of the same location so that they are recognized as the same.

This is the coverage we have beyond Navitia, so mainly the proprietary backends. Again you see it's strongly biased towards Europe, with one exception in New South Wales I think, but unfortunately there is still a big gap in Asia, and America isn't even on the map, so there's still quite some work necessary to make this a bit more global.
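As a toy illustration of that merging problem (this is not KPublicTransport code; the abbreviation table, the distance formula and the 200 m threshold are all invented for the example), one could normalize stop names and compare coordinates like this:

```python
import math

# Illustrative sketch of merging stop results from two backends:
# normalize common abbreviations, then treat stops as identical when
# the normalized names match and the coordinates lie within ~200 m.
# The abbreviation table is a tiny invented sample.
ABBREVIATIONS = {
    "hbf": "hauptbahnhof",
    "bf": "bahnhof",
    "str": "strasse",
}

def normalize_name(name: str) -> str:
    """Lowercase a stop name and expand known abbreviations."""
    words = name.lower().replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def distance_m(a, b) -> float:
    """Approximate distance in meters between two (lat, lon) points."""
    mid_lat = math.radians((a[0] + b[0]) / 2)
    dx = math.radians(b[1] - a[1]) * math.cos(mid_lat) * 6371000
    dy = math.radians(b[0] - a[0]) * 6371000
    return math.hypot(dx, dy)

def same_stop(a, b) -> bool:
    """Heuristic: same normalized name and coordinates close together."""
    return (normalize_name(a["name"]) == normalize_name(b["name"])
            and distance_m(a["coord"], b["coord"]) < 200)

a = {"name": "Berlin Hbf", "coord": (52.5251, 13.3694)}
b = {"name": "Berlin Hauptbahnhof", "coord": (52.5250, 13.3690)}
print(same_stop(a, b))  # True
```

The real problem is of course much messier: multiple languages, backend-specific identifiers, and large stations spanning hundreds of meters with several distinct bus stops inside, which is exactly why this needs ongoing feedback from users.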
Well, Greenland is partly covered, not because we actually cover it, but because the simpler you make the geometry of the coverage areas, the faster the hit detection on those areas works when picking the right backend. That's why you get these weirdly simplified shapes, and then the map projection of course makes them spread out in the north.

A bit more on the result aggregation. The biggest problem there, I think, is that unlike with air traffic there is no universal identifier for locations. Every backend basically has its own arbitrary numbering scheme, and then you have the human-readable names, which might be in different languages and might use abbreviations for parts like "station" or "central station", which are typically abbreviated somehow in the different languages. So we need to try to normalize all of that and merge it together. We quite often get geo coordinates, which is useful, but unfortunately not enough: if you look at one of the very large central stations, they are many hundred meters in size, and within that area you can have multiple bus stops. So geo coordinates help to clearly distinguish things that are not next to each other, but how exactly do you group things together within a local area?

Another fun problem is that very local providers tend not to add their location or city name to the names of their stops. If you search for "airport" you find stops named "Airport" in 15 different cities; in the context of that city the stop name of course makes sense, but if I try to find it without any additional information, that isn't really helpful. And then there are problems with the identification of a specific train or a specific bus as well. They tend to have a line name, like "M5" here, but sometimes they also have individual numbers for the individual trips on that line, and it's not always clear which one you get, and that again makes
it harder to properly merge that. Spelling variants and languages I already mentioned. None of that is unsolvable, but it requires a lot of feedback from people using this, especially initially, on weird results they are getting, because this is very local and locale-dependent. I need specific test cases: in this city, searching for that gives me nonsense results, or it merges this, or it doesn't merge that. Then we can incrementally improve it.

Privacy: that's of course something we need to look at, as that's where the whole thing started. This will require online access, because we depend on real-time data, so there is simply no way around that. There is also very limited room for caching the results; that might work for location searches, but not for anything else, because we want to see delays as they occur. So there is network traffic on the outside that is observable, but at least we can control what exactly we send in those requests. The proprietary apps tend to add various unique identifiers or cookies and whatnot; we can strip all of that and just send the absolute minimum that is necessary, and I think that's as good as it gets. The next step beyond that is simply to disable it entirely if you're concerned about that network traffic.

There are unfortunately a few proprietary backends that don't even support transport security, so there your requests go out entirely unencrypted. Those are off by default, but if you're living in Austria, KDE Itinerary has a specific setting to enable that, because otherwise you don't get any data, so that's a trade-off you need to make. Another problem we still need to solve: if you search for a location by name, where we have no context on which backend to pick, it currently queries all of them, and there might be backends in there from countries or providers that you don't trust. So we want some form of manual selection. I know I'm in Italy, so there's no
point in sending it to the UK or wherever. That is something we still need to support.

Then another topic that isn't all that obvious: since we are dealing with open data, just like with free software we have to look at license compliance and attribution. The data this is based on is sometimes licensed, for example, under a Creative Commons Attribution license, which means that if we show the data, we need to properly attribute it. Now, since the data comes from various sources that the application author might not even know, the framework collects all that attribution data, aggregates it together with the results, and gives you, the application author, all the information you actually need to show. You can then put it into an about dialog or show it inline next to the results, and that hopefully makes it easy to comply with the licenses and the conditions around them.

What else? What is currently supported is the absolute bare minimum of data that we needed for KDE Itinerary, but what we get from the backends is actually a lot more. The typical result size for a journey query is easily more than a hundred kilobytes of JSON, and what we currently extract is probably less than a kilobyte of that. There are things like the entire GPS track of the route you are taking. There are tons of options for how you want to do the routing: how fast do you walk, do you have an e-scooter with you, do you want to avoid specific modes of transportation, and all that kind of stuff. There is a lot more disruption information. There is a lot of accessibility information, which could actually be quite interesting to show in the application, or to consider as a constraint during querying if you depend on lifts or something like that; it's good to know where they work and where they don't. There is information on how full the trains probably are, or are expected to be. And there's the whole area of pricing and ticketing,
which we haven't touched at all yet, and stuff like the estimated CO2 usage of a trip, in case you want to optimize for that. So there's a huge road ahead that we haven't even started on yet.

Then, people who have been around for a bit longer might vaguely remember that we had something like this in the past, and yes, there was the public transport Plasmoid from the early KDE 4 times. That was unfortunately discontinued, and it didn't seem useful to revive it instead of building something new, because it was initially based on web scraping, which was the only thing available at the time, more than ten years ago, before smartphones. Nowadays most of the network operators have their own apps, and that kind of forces them to have a much more sensible API that we can then hook onto. GTFS was also just starting at the time, so the bulk of code that existed for it did offline processing: you download the entire multi-hundred-megabyte static GTFS file and do local routing. That, however, isn't really feasible on a mobile phone, let alone if you want the real-time data, because that's a multi-megabyte-per-minute protobuf stream from which you need one tiny detail, but you basically need the continuous stream to update the local state. So this isn't really feasible on mobile, and there's unfortunately very little that could be reused from that work, just because the environment has changed quite a bit since then.

I also have a few pictures of how this is actually integrated. This is taken from the details screen in KDE Itinerary: you can see the delays being shown, and you can see the platform changes; it changed at the top and got confirmed in the lower case, so it's shown in green. That was the first bit we integrated. Then the more advanced stuff: this is the alternative connection selector, so if you missed your connection you can basically look up the
next one on that trip. The red one is cancelled, so we also detect cancellation status and some notes we get from the operator. You can then save that, and the itinerary timeline updates to the new trip while keeping your ticket but dropping your seat reservation, which doesn't make sense anymore. And we have the departure schedule, so if you arrive at the airport and want to check whether you have to rush to get to the train or still have time for a coffee, that can be useful. I think this is mainly a stopgap until we have proper navigation, say for getting from the airport to the hotel, built into Itinerary; that's one of the next steps coming up, but until then this is the easiest-to-implement approximation.

That is the integration as we have it in Itinerary so far. Then Nico started a new app, KTrip, which is focused on journey planning and is currently, I think, the second user of the KPublicTransport framework, and which I think more directly exposes what KPublicTransport can actually do.

So what this hopefully showed is that there actually is quite some useful and freely available data around public transportation available to us, and we now have the infrastructure to easily get to it. Depending on what we actually want to do with it, we can now step by step extend the data model and the query API to support that. Since there is so much data, I'm trying to do this on an as-needed basis: whenever there is a specific thing we want to address, there is usually corresponding data available. And then of course there is a lot of stuff we could do with that data beyond what we have done so far. There's the integration with Mycroft as an obvious idea, so you can ask Mycroft when your train leaves, or whether your train has a delay, or how to get to the university, and that kind of stuff. And something I would like to see is
a kind of commuter counterpart to Itinerary: for your regular daily trips, showing you the delays, or showing you alternative ways to get to work or back home. And in the bigger picture, all of this is basically building blocks for a larger digital assistant system that we hopefully will one day have on Plasma Mobile. And that's it. That was amazingly on time, just like public transport ought to be. Questions?

Q: Some providers, like in the Netherlands, do have fairly extensive albeit proprietary APIs, but they are gated behind API tokens. How do you deal with that, if at all?

A: We have that problem already: some of the backends we use require API tokens, and they are checked into the source code. That seems to be the standard; everybody is doing it. I was very reluctant to do that, but I talked to some of those people and they said everybody else is just publicly checking them in as well, so that's what we do. For most of them you get the API tokens for free, so there is very little incentive to steal them; you need to sign up for them, but they are essentially free. They are sometimes rate-limited, but those I've seen have rate limits that are way above being problematic anyway. And where we talked to them, they actually treat free software very favorably, giving us non-rate-limited, or at least very-high-rate-limit, API tokens. So that's not a problem.

Q: For those sources which either don't have an API, or have an API that we cannot use for whatever reason, is web scraping an option?

A: In theory you could implement a backend that works via web scraping; I'm just a bit too lazy to do that. But it might indeed be that there are providers where we have absolutely no other option. The backend API is basically just "give me the results for how to get from A to B" or something like that, and how you
do that is up to the backend, so web scraping would be ugly, but it would be a technical option.

Q: Do you think you need infrastructure, some server, in order to provide the answers that a mobile phone will require?

A: Yeah, this needs server infrastructure, but luckily not our server infrastructure. We use the server infrastructure from Navitia and from the commercial backends. In theory we could host the Navitia software ourselves and maintain all the data in there ourselves, but there would be very little gain in that apart from a lot of work, so better to share it.

Q: If you do that, are you not sharing your personal information with Navitia?

A: Well, of course they see the queries you send, yes. They see that somebody is querying how to get from A to B, but there's no identifying token beyond the IP address that would connect that to you specifically. It is connected to "this is coming from the KDE library" due to the API tokens. And if we hosted that ourselves, you would still be sharing this with KDE; maybe you trust KDE more than you trust the other guys, but essentially you're still sharing it. Self-hosting this for each individual person, I don't think that scales, because the GTFS real-time protocol is quite heavy: it can produce multiple megabytes of updates per minute, so if suddenly many thousands of people pulled the data from those backends directly, I think we would cause a problem there. This stuff is designed to be consumed by very, very few servers that then share the results. So yes, there is a trade-off we need to make in regard to privacy, because it isn't doable without online access.

Anyone else? Otherwise, let us thank Volker, and be on time.