Hello, my name is Allan Dart. I work for the OpenSky Network, and today's talk is called "Capturing and Sharing ATC VHF Voice Data: Toward a Holistic View of Flight." Here is the outline of the presentation. I will start with a short background on the project within which this functionality is being developed. Second, I will introduce a few of the use cases we have envisioned for the collected voice data and its transcripts. Third, we will take a quick look at the end-to-end data processing chain, followed by some APIs that we have developed for data users. Lastly, I will introduce the annotation platform that OpenSky uses to get the voice recordings transcribed, and I will end my talk with conclusions and an outlook on future work.

The project within which this functionality is developed is a European Union funded project called ATCO2. The project webpage is behind this link over here, and on the webpage you can find more details about it, for example a list of participants and the other activities going on within the project. All in all, there are two main objectives for the project. The first is to develop and deploy an infrastructure for capturing, storing, and distributing ATC VHF voice recordings and their transcripts, so that a wider audience can make use of the data and, as a consequence, hopefully accelerate innovation in the aviation domain. This objective is the main focus for us at OpenSky, where we are responsible for developing the infrastructure and also for activating the community to contribute to the cause. The second objective is to develop and deploy high-performing automatic speech recognition (ASR) algorithms to obtain labor-free transcriptions of ATC voice recordings, which would be really expensive to produce with a human workforce.

On this slide you see the use cases we have defined within the project for the VHF recordings and the transcripts.
First up, we have data users of type A, who operate a VHF receiver and feed recordings to OpenSky. They most probably wish to access the data they share with us, and it definitely must be made available to them through some sort of simple interface. Second, users of type B are not necessarily feeders themselves (they might be, but they don't have to be) but are interested in what is happening in a certain geographical region, for example around the airport closest to them. Third, users of type C are people who might want to follow the trajectory of a specific aircraft. It might be a state aircraft (Air Force One comes to mind first), or an aircraft taking a family member or a friend on vacation, where the type C user wishes to make sure they arrive safely, or it might be a professional user conducting a safety- or security-related investigation. Last, users of type D are professionals, or professional-like users, who want to conduct some sort of large-scale data analysis to understand the dynamics of air traffic more deeply. They could be researchers from a university, for example.

What is common to all of these user types is that every use case benefits greatly if an aircraft trajectory accompanies the voice recordings, and we at OpenSky are in a good position to join those data sets into one, because we already have a vibrant and active community of users who feed ADS-B data into our infrastructure.

Let's now look at what the end-to-end data processing chain looks like; it is illustrated in this graph. In principle we have three columns. We have the community column, feeding us different kinds of data. We have the OpenSky column, that is us: we collect, store, and share the data between the different parties. And we have the annotation platform, which within the ATCO2 project is handled by SpokenData. It all starts with the community, who feed VHF voice to OpenSky.
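To illustrate the join between voice recordings and aircraft trajectories just mentioned, here is a rough sketch: match a transcribed recording to the state vectors that share its call sign and overlap its time window. The `StateVector` fields and the matching rule are simplifying assumptions made for this example, not OpenSky's actual schema.

```python
from dataclasses import dataclass

@dataclass
class StateVector:
    """One ADS-B position report (simplified, illustrative fields)."""
    callsign: str
    time: float          # Unix seconds
    lat: float
    lon: float
    baro_altitude: float # metres

def trajectory_for_recording(states, callsign: str, start: float, end: float):
    """Return the trajectory slice flown while the recording was on air:
    same call sign, timestamp inside the recording's time window."""
    return sorted(
        (s for s in states if s.callsign == callsign and start <= s.time <= end),
        key=lambda s: s.time,
    )
```

In practice the call sign would come from the annotated transcript, and the state vectors from the ADS-B feed, which is why labeling the call sign correctly matters so much in the annotation stage.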
We add surveillance data from ADS-B feeders and other metadata to those VHF recordings and pass them on to the annotation platform. The metadata, containing contextual information, helps to improve the quality of the annotations in the annotation stage. First, the annotation platform makes sure that the recordings we pass on are of sufficient quality, meaning that the duration of a recording is longer than a threshold dictates, and that the SNR, or signal-to-noise ratio, is above some threshold. Then the speech-to-text conversion and labeling happen on the annotation platform. The most crucial thing here is to label the call sign correctly, because in later stages that helps to join recordings to aircraft trajectories. But other information is labeled as well. For example, commands from air traffic controllers to pilots are labeled and tagged: if a controller orders a pilot to climb to a given flight level, that information is also annotated. Once this is all done, the labels and transcripts are provided back to OpenSky, where we store them on our servers and make them available to the community, as we currently do with ADS-B data.

Obviously the annotation process is not flawless, and therefore we will give users the possibility to correct the annotations: if they spot mistakes in the transcripts, the interface through OpenSky will allow them to correct those. And if someone is interested in contributing even more and transcribing recordings from other feeders to a larger extent, they can do that too; in this case OpenSky is just the interface between the users and SpokenData. Obviously one can also bypass OpenSky here and go directly to the annotation platform; if you wish to do so, you can kindly visit the webpage shown here.

Let's now look at some of the APIs OpenSky will soon provide.
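Before we get to the APIs, here is a minimal sketch of the kind of quality gate described a moment ago. The threshold values, the 16-bit mono WAV assumption, and the percentile-based SNR estimator are all illustrative assumptions, not the project's actual implementation.

```python
import wave
import numpy as np

# Hypothetical thresholds -- the real values used in the project may differ.
MIN_DURATION_S = 1.0
MIN_SNR_DB = 5.0

def estimate_snr_db(samples: np.ndarray, frame_len: int = 1024) -> float:
    """Crude SNR estimate: treat the quietest frames as noise and the
    loudest frames as signal, and compare their average powers."""
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    power = (frames.astype(np.float64) ** 2).mean(axis=1)
    power.sort()
    decile = max(1, n_frames // 10)
    noise = power[:decile].mean()      # bottom 10% of frames
    signal = power[-decile:].mean()    # top 10% of frames
    return 10.0 * np.log10(signal / noise) if noise > 0 else float("inf")

def passes_quality_gate(path: str) -> bool:
    """Accept a recording only if it is long enough and clean enough.
    Assumes a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = np.frombuffer(raw, dtype=np.int16)
    duration = len(samples) / rate
    return duration >= MIN_DURATION_S and estimate_snr_db(samples) >= MIN_SNR_DB
```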
We have used Swagger API developer tools for that, and the API documentation can be found behind this link here. The webpage provides a nice and simple overview of the APIs and also a simple and intuitive API-call generator, so you can test out calls before making them in your application. In principle, the APIs are divided into two categories. We have APIs dedicated to data; these include all operations related to ATC data, so you can see the available recordings and the transcripts that go with them. Second, we have APIs dedicated to users and receivers, so you can see the available receivers and users. In addition, there is an API dedicated to getting the call signs of the aircraft surrounding a given receiver in a given time interval. This information can be used as contextual information in the annotation stage, as described earlier.

On this slide we have listed some of the API use cases. For example, you can get the recording metadata, which also includes the transcription history, if any transcriptions were saved for that particular recording. In essence, you get the audio recording key, which you can then use to download the recording itself, to get the transcript of that recording, or, as listed here, to write a transcript for that recording, also as provided by the API. You can use the API to get the list of receiver clients with their metadata, and, as I already mentioned, to get the call signs of the aircraft surrounding a receiver in a given time interval. Obviously not all of these APIs are needed by end users; some are used only internally or made available only if needed. For example, the list of clients and receivers with their metadata is definitely not going to be made available for wider public use.

On this slide we show one example of how to get the list of available recordings. As you saw on the previous slides, other APIs are specified too, but I'm taking this one as an example.
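To make the shape of such calls concrete, here is a minimal client sketch. The base URL, endpoint paths, and parameter names below are invented for illustration; the authoritative interface is the Swagger documentation linked on the slide.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://opensky.example/api"  # placeholder, not the real host

def build_url(path: str, params: dict) -> str:
    """Assemble a query URL like the one the Swagger UI generates."""
    return f"{BASE_URL}{path}?{urlencode(params)}" if params else f"{BASE_URL}{path}"

def get_json(url: str) -> dict:
    """Fetch and decode a JSON response (network call, shown for completeness)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# Illustrative usage, with hypothetical endpoint and field names:
#   meta = get_json(build_url("/recordings", {"receiver": "LSZH-1"}))
# The returned metadata would then contain the audio key used to
# download the recording itself or to read/write its transcript.
```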
To test the query, Swagger provides excellent possibilities. I can easily and intuitively apply filters, such as the airport from which I wish to receive recordings; I have the country-code filter field here; there is also a possibility for areal filtering, where I can easily define a polygon within which the receiver should be located; and I have my temporal filters as well, the from-date and to-date within which the recording should fall. In this example, as you can see, I specified two filter fields: first the airport, which is Zurich, and then obviously the country code, which is Switzerland. And here are the results you get. First you get the query syntax, which is helpful for not-so-seasoned query builders like myself, so I can use it as an example in my data analysis scripts later, or for whatever other reason you can think of. Then I get the response, in JSON format, with information about the recording: the ID, which I can use to retrieve the recording itself from the server; the file format; the VHF channel that is used and is specified in the file name; the sampling rate; and pretty much any other information about the recording that you can think of and that is useful for analysis in later stages.

One important note I must make here: this is all still work in progress, and it will be made available for public use within a few months from now, meaning that the details I have shown here might change. But the approach remains the same, and that is what I wanted to highlight: it will be really easy to get VHF voice data for doing any kind of analysis. You can go really deep, down to the signal level, and, say, estimate the signal quality, the signal-to-noise ratio, or the signal-to-interference-plus-noise ratio; or you can stay at a higher level with your analysis and estimate the general channel occupancy, for example, without any need to dig into the signal itself. So it enables all sorts of analysis in a really simple way. Finally, I wanted
to show you the annotation platform we're using within the ATCO2 project. We partnered with ReplayWell, a small startup from the Czech Republic, which has built a nice platform called SpokenData. If you go to the webpage shown here, you'll get to the ATCO2 annotation service. So if you're interested, choose an airport to your liking, maybe based on your language skills, and contribute with annotations; this will help to improve the annotation algorithms that we use. A quick side note here: the airports shown in this list are the ones we currently follow and annotate. There are some more, for example Tallinn in Estonia, but the recordings from there are a bit poor and not in scope for this project.

Over here you see the annotation UI. On the left-hand side there is a visual representation of the signal waveform. In the center is the media player; you can scroll forward and backward, pause, and do everything you would expect a media player to do. Below it you also see the automatic transcript, which you can then correct if needed, or you can confirm that the algorithm has done a good job. On the right-hand side you see all the information about the recording that comes from OpenSky, plus some useful instructions on how to do the annotations. So I really do encourage you to join and start annotating, not only because we would really appreciate your help, but because it is actually fun to listen to the conversations between pilots and air traffic controllers, and to follow the details they exchange and how they do it.

And now the conclusions. OpenSky is collecting VHF voice data, not really for public use currently, but it will be soon, so follow the news on social media and on our webpage. You can contribute to OpenSky and to the project by setting up a receiver, as described behind this link over here. You can become an annotator, as I already talked about briefly; instructions on how to do that are behind this link over here. And when the service becomes
available soon, start using our data. I think what we're developing here is quite exciting: we have air surveillance data paired with VHF voice data, and in my mind that creates endless opportunities, from something simple like following flights (which is going to be much more fun to do with voice) all the way up to fancy deep analysis of flight efficiency and safety over a continent. So, thanks for your attention. If you have any questions, I'm glad to answer them, and feel free to drop me an email anytime about the things we are doing here. Thanks, bye!