Okay, well, let me introduce him... He is going to talk to you, this is David Arcos, and he is going to talk about how to use Python in relation to airplanes, yes. And I think... Wow. No, no, no. Did you need anything?
Hi, hello, welcome. This is Python in the Sky, a presentation about a use case: how we built a wireless in-flight entertainment system using Python. The talk is divided into three sections. First, the product requirements, then the architecture decisions we took and, at the end, the atypical challenges: strange stuff you don't see unless you are working with airplanes, ok?
This is who I am, well, like ten years ago. I'm David Arcos from Barcelona. I have been a Django and Python developer since 2008. I am co-organizer of Python Barcelona. We do a monthly meetup, so if you come to Barcelona, you are welcome. And I am lead engineer at Immfly. What's Immfly? Immfly is a startup from Barcelona. We do wireless in-flight entertainment and we sell it to airlines, ok?
So, let's start with the product. What's wireless in-flight entertainment? How does it work? The idea is that the passengers inside a plane, with their own device, connect to our Wi-Fi and open the app, ok? And then they have a lot of services inside. They can use any kind of device: laptops, tablets, smartphones, whatever. They connect to our Wi-Fi, which is inside the plane. It's internal, an intranet, no internet, ok? And they open the app. The most used app is the web app. It's just a normal website. But we also have mobile apps, ok?
So, what kind of services do we provide inside the plane? It's more or less the same kind of services we used to have years ago in a physical way, but nowadays it's digital. First of all, flight information, ok? You can see a very nice map: where the plane is, your destination, the weather at destination, flight altitude, speed, external temperature, whatever. Damn Windows. It's not mine.
Ok, so you can see videos: TV shows, kids videos, music videos, all kinds of videos. You can read stuff: newspapers, magazines. You can get stuff about the destination: city guides, offers, deals. This is customized for each city you are flying to. Right? And more stuff.
The problem is that the airplane, when flying, is offline, ok? And some services require some kind of connectivity. For example, if you want to book a service, or if you want today's newspaper, it has to be the newspaper from today, so you have to synchronize. Or if you want to pay to download coupons and then redeem them, ok? So you need eventual connectivity. This means that when the airplane lands, we connect to the internet, to our platform, and we do stuff, ok? So the requirements were: update the contents, because nobody wants yesterday's newspapers. Send bookings and reservation actions to external APIs from other providers. Do payments, very important: if there are no payments, there is no money. And send emails: welcome e-mails, spam e-mails, everything.
Another big requirement was the ground mode. You have the service inside the plane, but once you land, you should still be able to use it, right? So it's the same experience before you fly and inside the plane, and then later you can bring some contents with you. You can bring magazines and newspapers, but not videos because of, well, licensing. You get e-mail confirmations, ok? The idea is that you get to your destination and from the hotel you can access the map, see the nearby restaurants, whatever, and bring the coupons that you downloaded.
Ok, so now the interesting part: the architecture, ok? How did we build this system? I will go layer by layer, top down, from the most external one to the most internal. Ok, this is a simplification. It's a bit more complex, but it gives you a good idea of the different layers: what happens in the airplane, the front end and back end, the system services we are using and the hardware.
And then, after the airplane, what happens outside, ok? We have a data center with machines. We call it the hangar, very imaginative. And we do processes and stuff there.
So let's start from the most external part, the front end applications. Users have a device, they open the website. That's a web app. It's done using AngularJS, Sass for the CSS and Grunt for the build. Angular, if you don't know it, is a JavaScript framework which is used to call an API. It gets the data and that's it. You don't need a backend to implement Angular, ok? It's done just in JavaScript. The mobile apps are Android and iOS, no surprise here. And all of these apps, these three apps, use the same API from the backend, ok?
So in the backend, I think you already know these technologies: Python, Django and Django REST framework. They are very useful to implement an API. Django REST framework is awesome: we did the API very fast and it's easy to maintain. We use Django for a lot of things. It has a lot of features and it was very useful. And also some Python scripts for some different things.
The system services, this is for the web server. We use a very typical configuration: Nginx as a web server and also for the static contents, images, videos, PDFs, whatever. Gunicorn as the application server, which is a standard choice for Python. And Supervisor is a nice daemon that monitors and controls many things. So the idea is that your Django code runs in Gunicorn, but Supervisor is calling it. We use it for the API, for daemons (we have some agents that do stuff like synchronization and fixing inconsistencies) and also for Celery, for asynchronous tasks.
Databases: Postgres and Redis, no surprises here. Postgres is very useful for SQL data because with Django, if you are using the ORM, and you want to use it because of the Django admin, it's very easy, fast and reliable. And also for critical transactions: as we are doing payments, we have to be very sure that we don't lose anything.
Redis is for the non-SQL data. We have many use cases here: for the sessions, for example, for everything in cache, any kind of data that expires, and also for metrics. You can do a lot of stuff inside Redis each time you do an API call.
Networking, this is the SSH icon. All the aircraft are connected to the central point, the hangar, when they land. They use SSH, so everything is secured. We do some tricks to make it even safer: we go through our VPN, so it's not available on the internet; we hide the ports; we only use private key authentication. And on top of SSH we have two services, Ansible and Fabric. Both are Python, very nice. Fabric is a bit simpler: it's a way to call commands on top of SSH, so you have some libraries in Fabric and you can do all kinds of stuff, like deploying things, copying configurations, initializing machines. Very useful when you have a big fleet of several machines. We are doing the transition from Fabric to Ansible, so now we have both. Ansible is very useful for automating deploys, when you want to deploy new code, or if you want to do config management. For example, I want to update the Redis configuration of all machines: this will take care of it.
And we are using the pull mode. This is important. Usually in Ansible you execute it as push, so you say "now execute on these ten machines". But as we are using airplanes that are offline and only get connectivity when they land, we cannot do that, because we would need to wait until the plane arrives, and that's a lot of time and it's not viable. So the pull mode means that when the airplane is online, it just gets the latest version of the code and configuration from the repository and installs it, with every version of every piece of software we need.
Hardware, this is interesting. This is an airplane. In the airplane you can see here the three access points, normal routers, and also the computer we are using. It's an embedded computer. So keep this idea in mind.
The different devices connect to one of these three APs. The AP is connected to the server. Wireless access points: we use three. They provide the Wi-Fi, very simple here. And they isolate each user so a user cannot see the traffic from the other users. It doesn't matter, we are using SSL, but just in case, so they can't map the network or whatever. And this is a WAP in the lab.
Then this connects to the aircraft server. This is an embedded computer. Rugged, military grade, hardened, very, very reliable. This is certified so it won't catch fire. It won't, I don't know, cause an electrical problem. It's very, very reliable, very, very expensive. Certified means expensive. We use Xen with virtual machines. We have Ubuntu because it came with Ubuntu, so we have support. And it has, in the picture you can't see it, a 3G data card, just like your cell phone but with a lot of gigabytes, which is used when the airplane lands to connect to the internet, get more stuff and push data. Also very important, the avionics bus, which we will see a bit later. So this is the server that we have to install in each plane, connected to the three WAPs, and here runs everything.
The avionics bus: this is a connection to the airplane and we get information from here. This is read only, not write. So we can get information but we cannot write. No, I'm not kidding. This is important because there have been many news stories in the media saying that hackers can get into airplanes, take control of the plane and do stuff like shut down the engines or whatever. Okay, not possible. Even if they hack all our software, I mean, it's read only, you cannot write there. So this is very important: even if they get zero days for everything, the wireless, the SSL, our software, the virtual machines, you cannot write here. So this is like when you are used to developing for mobile and you have some information from the device: we are connected to the airplane and we have a lot of information on everything that happens there, okay?
The altitude, the IDs, pitch, yaw, roll. If you play flight simulator, this is the same. We are polling it each minute and it's very reliable. We use this when we show the flight info. It's a map like Google Maps but using OpenStreetMap, it looks the same. So we paint the airplane there with the right direction, where you are going, where you are coming from, and we are using this data and people love it.
Also discrete signals. I don't know what the name means, but we are using them for some events. For example, if the airplane is touching the ground, it has weight on wheels, and if they open the main door or close it, we get a signal and then our computer automatically cuts or starts the internet connection. To be certified, you have to implement these things, okay? So from these signals we can know if the airplane is flying or if it's grounded, and then we can use the connection. Okay? But don't ask me about the details because I would have no idea.
Okay, the hangar. It's outside the plane, it's on the internet, in a cloud platform. The airplanes are also a cloud platform, in the actual clouds. But this is the central point for the platform to synchronize everything. It orchestrates the operations. Each time a plane lands, the plane sends the updates: how many users registered, how many payments, whatever actions they did. It gets new contents, updated magazines, videos, whatever it needs. New versions of the databases, new versions of the users, everything, even the code. Some operations, of course manual operations, we do from outside the airplane. We have a lot of processes that ingest stuff from other sources. We have processes that send data to providers to do bookings or whatever, and we have an administration tool so our people can add offers and edit stuff. So airplanes, when they land and only then, send their information and get new data, okay?
An example: ingesting new resources. We are uploading thousands of resources each month from a bit more than 20 providers.
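The ground detection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the talk does not give the exact rule, so the class `DiscreteSignals`, the functions `flight_state` and `uplink_allowed`, and the condition "weight on wheels and main door open" are hypothetical, not the real implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscreteSignals:
    """Two on/off signals read from the avionics bus (hypothetical names)."""
    weight_on_wheels: bool
    main_door_open: bool


def flight_state(signals: DiscreteSignals) -> str:
    # Assumption: weight on wheels is what separates "grounded" from "flying".
    return "grounded" if signals.weight_on_wheels else "flying"


def uplink_allowed(signals: DiscreteSignals) -> bool:
    # Assumed rule: only use the 3G connection when the plane is on the
    # ground and the main door is open. The certified rule may differ.
    return signals.weight_on_wheels and signals.main_door_open
```

A daemon on the aircraft server could evaluate `uplink_allowed` each time the signals change and bring the data connection up or down accordingly.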
Different languages: right now it's in Spanish, German and English, and we are doing French and Italian soon. Per country, because different countries have different magazines, and also per category: videos, readings, deals. The fun thing here is that some external APIs from external providers, they suck. Okay, they don't give you the documentation. They are not using HTTPS. They are using an FTP where maybe the intern uploads manually each night. They give you Excel files. That's the worst part of the whole project, but it's not done inside the airplane, so it's easier to manage. If you get errors here, it's easier. It's not critical. If you cannot ingest a magazine, I mean, whatever.
The videos: we ingest them from SFTP or from S3 in Amazon, and we use a tool called Amazon Elastic Transcoder, which is a platform as a service where you send videos in any format, you choose the output and they produce the right version for you. We are using HLS, which is very useful for mobile devices, okay? And it works on web, Android and iOS. And the synchronization is done chunk by chunk. This means that the video has a lot of pieces, and when synchronizing we do it piece by piece, because if we wanted to upload a video of 200 megabytes at once, I mean, it's not possible: the airplane stops for maybe 10, 15 minutes and you don't get much connectivity. So the video synchronizes chunk by chunk, then it loses connectivity, then later it starts again, okay?
For readings, magazines and newspapers, we ingest from the same places, different directories, and we are using Celery tasks. Celery is a system to execute asynchronous tasks. Okay, so you say "ingest this PDF from the server" and then you do different processes on top. The most important are to reduce the size, because they love to give you PDFs of 100 megabytes. It's amazing for our poor network. Also to generate thumbnails, because everybody wants to see the thumbnail inside the application. I should have more screenshots.
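The chunk-by-chunk synchronization described above could look roughly like the following sketch. It is not the real synchronization code: `sync_chunks`, `is_complete` and the injected `fetch` callable are hypothetical names, and the `.part` temporary file is an assumption added so that an interrupted download never looks like a finished chunk.

```python
import os
from typing import Callable, List, Sequence


def is_complete(path: str) -> bool:
    # A chunk counts as present only if the file exists and is not empty.
    return os.path.isfile(path) and os.path.getsize(path) > 0


def sync_chunks(
    names: Sequence[str],
    fetch: Callable[[str], bytes],
    dest_dir: str,
) -> List[str]:
    """Download missing chunks one by one; return the names still pending."""
    for index, name in enumerate(names):
        final_path = os.path.join(dest_dir, name)
        if is_complete(final_path):
            continue  # already downloaded during an earlier landing
        try:
            data = fetch(name)
        except ConnectionError:
            # Connectivity lost: stop here and report what is still missing.
            return [
                n for n in names[index:]
                if not is_complete(os.path.join(dest_dir, n))
            ]
        part_path = final_path + ".part"
        with open(part_path, "wb") as handle:
            handle.write(data)
        os.replace(part_path, final_path)  # rename only after a full write
    return []
```

Calling `sync_chunks` again on the next landing skips the chunks that are already on disk and continues with the rest, which is the "loses connectivity, then later starts again" behaviour from the talk.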
And last, the ground mode. The ground mode means that, well, as I told you before, when the aircraft lands the passenger gets out and can do similar things, okay? They open an application which looks the same. Web, Android and iOS, they all look the same. We take care of the details, and on the ground it looks mostly the same. So, minor differences: no videos, because we cannot stream this kind of content on the internet. We are focused on destination offers, because if you are going to a city you get info about that city, about restaurants and places around you. Some functionalities only work online, like "I forgot my password". You have all the magazines, you have the tickets you bought, so it's very easy to... I don't know, you are in the plane, you get a restaurant offer, and then later in the app you have it there and you can just use it. And the counterpoint here is that in the aircraft, I mean, we are using A320s, which have like 180 passengers, okay? For a web platform, that's very easy to scale. But on the ground we can have thousands of users at the same time. We are not there yet, but it's made to scale.
And the last part, the funny one: the atypical challenges. When you are working with aircraft you have some specific quirks, good things, bad things, but it's different from what we were used to, okay? There are some extra challenges and problems that we weren't expecting. We had no idea, and we also made some mistakes and then fixed them.
So the first one was the regulations and certifications. Why are we doing this product now and not a few years ago? Because at the end of 2013 EASA issued a regulation that allowed all personal electronic devices to be turned on inside the plane, okay? Even during landing and take-off. Previously you couldn't. You could only turn them on while you were flying. So this law, which was implemented in all of Europe the following year, allowed us to do this product.
After this law at the end of 2013 it took like one year, and at the end of 2014 we already had some two or three airplanes flying with our product, okay? The day we got the notification we did the first flight and it was awesome. You have to certify every step you do. Even if the hardware is already certified, even if the access points are already certified, you have to do a lot of stress tests, wireless tests, everything, okay? The aviation industry is very conservative and you have to be very, very safe and slow. Even the smallest screw, not kidding, each hole we make in the airplane to put cables or whatever, everything has to be certified, with procedures, and it's very, very slow. It took us six to nine months just to get the certification, which was good because then we had time to do the implementation, okay? In a typical project you sell it to the customer and then you have to implement it, so we had nine months to implement everything and it was awesome. In the first flight everything worked perfectly. But the certifications are boring.
Intermittent connectivity is a technical issue. We thought the airplanes would be connected all the time, that it would be a typical project, that we would have a lot of time at night, a few hours to do upgrades and everything normally. No way. Each day we get like an hour of connectivity, okay?
Each airplane, when it lands, stays for maybe half an hour, so we have 10, 15 minutes of connectivity, because they shut down the computers many times. So we don't have a lot of time to do synchronizations and updates. Also there is the roaming, which means that if you send data it costs you a lot, so we have to do a limited mode when we are in countries where we don't have roaming. So what we had to do here was to improve our deployment tools a lot and to optimize the performance of the synchronizations, doing things like rsync, or like the chunks in video, going step by step, compressing a lot too. So this was a big issue, because you are not used to a distributed system which works offline and where you don't get access to updated stuff, okay?
Hard shutdown. This is maybe the worst. Well, no, the worst will be shown later. Hard shutdown means no shutdown procedure. So they just click the button, or it gets disconnected, and the hardware gets very beaten. The electrical power is suddenly removed, our server is turned off, and this is also bad for corrupting file systems and everything. It happens very often, many times a day, when they change the power source. For example, the airplane lands and they shut down the engines, so no electricity, and they connect an external power source. In that microsecond, bam, reset. The hardware is supposed to manage that but, well, marketing. And also the pilot has a very nice button on top so they can disconnect our service, okay? If there is a storm or bad weather they can shut it down. The regulation says they can shut it down; there are a lot of non-critical services. But most of the time it happens when changing the power source.
Here we can do nothing, just mitigate it. If the service gets corrupted, well, we are using virtual machines, so the host machine is never changed; we are only changing an internal virtual machine. So if it gets corrupted, we have to fix the inconsistencies. This was our first mistake: we trusted the hardware, but the file system got corrupted a
lot. Some files were lost, some others had size 0, so the contents were deleted. It was very fun. We run fsck at start. It's a file system checker: it checks for corrupted things on the hard drive, and as we are using virtual machines, it's doable. And also when deploying: each time we deploy a big file or small files, whatever we deploy, we do a sync. sync is a command that forces the operating system to flush the caches, of the hard drives, of the memory, everything, to disk. So you do a sync and then you are sure that it has been written to the disk. This is good because it has saved us a lot of problems. Anyway, it can be shut down while you are doing the sync and you lose the data anyway, so you have to compensate for that. Also, as this was very typical, we have some consistency checks. We check all the time whether the contents have size equal to 0 or don't exist anymore, because sometimes you download a video with some chunks and then some chunks disappear. So once you update and say "this content is now available", you still have to check it again.
Another one, this was very minor: the internal clock got corrupted. The logs were funny because they showed wrong times, as if at the same time it was on two flights. So we added some NTP checks. NTP is the service for updating the clock: it goes to the internet, checks the master clocks and updates the internal clock. So we are only doing it when it lands.
And this is the funny stuff: the CAP theorem. How many of you know the CAP theorem? Please raise your hands. Ok, good. The CAP theorem is a theorem in distributed systems that says that you cannot have all three, you can only have two. The three are: consistency, so all the nodes in your platform see the same stuff, in our case all the airplanes have the same contents; availability, that every node is able to give a response, in our case all airplanes are working; and partition tolerance, so if there is a network partition the platform is still working. Network partition means that
maybe you have three data centers, you get one data center out and you still have service, with some machines or with all.
So CAP. Our big mistake was to ignore it. With CAP you can have C and A, A and P, or C and P, ok? If you try to beat it and have all three, you will have a bad time. We tried to, because of course consistency was a requirement: we wanted all the planes to have, all the time, the latest version, all the users, all the contents, the latest database. Availability, because a plane being offline has to work. Partition tolerance we already have by default, because each plane is partitioned from the network, it's offline. So by default we have the partition tolerance, and then we wanted the availability, so in the end we took out the consistency, ok? So right now our system allows for availability and partition tolerance and works much better. It works. This means we settle for eventual consistency. In our case that means that maybe an airplane has an old version of the software, of the database, of the contents. Doesn't matter. Maybe today it didn't have time to download the latest newspaper. Doesn't matter, it still works, ok? It will eventually fix itself, because it will keep synchronizing each time it lands. So if you design a distributed system, keep this in mind. It's important, ok?
To recap: we did a complex project with lots of features. I have only shown a few, but it's really complex. We have 20-something Django applications in our project, ok? So this can give you an overview of the scale. We kept a modular design, doing different apps and keeping the stuff in different places. We could implement everything and we were helped a lot by the existing libraries. And there were unexpected challenges. We learned how to fix them. It was difficult at first, but now we have learned a lot.
The conclusions are that Python made it possible. It's very versatile, so we can use it at 10,000 meters. It covers all our use cases. We have done this because we stand on the shoulders of giants, which means all the software stack we already
have allows us to do our part and to integrate different things together. And it was developed in a very short time, because the implementation happened in like 6 months. Since then we have been adding new features. Of course we have more bugs to fix, and we need to add features and we will keep adding them, but it's very agile, it's very fast to implement. And that's it. Thanks for attending. I will post the slides in a few minutes, I will tweet them. And at Immfly we are hiring: visit that page and see if something there fits you. Now, do you have questions?
More questions? This works now. Hi, I have a question regarding the avionics bus. How is it connected to your system? Is it IP based or anything special?
It's a bus called ARINC. I don't know the specifics. Our computer gets a cable and it's connected and that's it. We do polling each minute, it updates each minute, and once you connect you keep getting data. You cannot ask for data, you just listen to everything it sends you. I don't know the specifics, but if you look for ARINC, it's a standard, and that's it.
Ok, thank you. And the second one: where is it deployed? So which flight do I have to book to see this?
Again, sorry?
Where is the system deployed? So in which flight or which company can I see this?
Right now we are deployed in Iberia Express, ok, which is from the IAG group. We did the first phase since last December, six months. It was a pilot phase with four airplanes. It was a success, so now we are deploying in more airplanes of that airline. Right now we are in eight airplanes and we will start with other airlines from the same group.
Ok, thank you again.
Welcome. It's very expensive to deploy this kind of stuff, because the hardware is expensive, you have to do a lot of operations, you have to deal with certifications, so we have to go one step at a time.
You said that there are problems with power outages. So have you even considered using something like a battery, or is that a no way because of the
certifications?
Certification again. We looked at putting in capacitors or something, but certifying that would take like half a year and we didn't have time. Eventually we will do something like that. But the hardware we had, it said in the marketing prospectus that it already had some internal batteries. It's not enough.
I see. And the other question: you said that you're using Ansible in the pull mode, so it's pulling down the changes and applying them to the system, I guess?
Yes.
So what if something goes wrong with the actual system and therefore it can't pull the data? How do you fix that then?
It hasn't happened, ok? But then we would need to connect to the machine via SSH, via the hangar, and then do stuff there. We have had all kinds of problems where we had to connect like this: installing a bad deploy that then does a migration of the database and, bam, no more service. But with Ansible, I mean, the thing we are doing is very small. It's a very small repository, it's updating all the time, so it shouldn't have problems.
Ok. You said that you decided to drop consistency. So what are you planning to do when the user count spikes? You have a login system inside the plane, so each plane has the complete database of users?
Yes.
So what will happen when you have 1 billion users?
You are asking for the worst case?
Well, what do you have in mind, or what's the worst case right now?
The worst case, and we have had this case, is that the same user takes a flight, it lands, we don't have time to do a synchronization, ok, and then maybe he takes another plane and his user doesn't exist in the second plane, ok? That happens sometimes. And then when we get the user from airplane 1 and from airplane 2, the same user, we have to do like a merge of the inconsistencies. This is done manually right now. It's not very often, mostly with pilots. They use the app, but maybe I shouldn't say that. Not while flying, later. But the problem exists. We settle for it. I mean, we cannot do anything else. We could do a better merge procedure, but right now it's good as it is.
And
the second question: I'm curious, are you allowed to record any kind of events from the plane and then upload them to your servers?
Events like user events?
No, no, not user events but plane events, like the database of the...
We were thinking about that, because we have all the data from the ARINC bus and we could export a save game for a flight simulator, something like that we could do. If you know Flightradar24, which is a nice application that shows you the airplanes, we could do the same, because we have all the GPS positions of our planes. And then in the end we could do some big data magic and know some stuff about how long each flight takes, I don't know, this kind of stuff. But right now we are a small team, we are a startup, we don't have much time for these fun things. Some day we will do it. We are saving all the data and some day we will do these things.
But the main question is not whether you are actually doing it but whether you are allowed to do it.
Yeah, sure. The trouble with the airline is with user data, emails. But the flight information, I mean, it's public. If you go to Flightradar24 you can see where each plane is.
Well, but not only that, I mean the doors.
Ah, the doors. No, no, of course not. But I mean, what can I do with the doors? I want to see the pretty stuff on the maps.
No, but I mean you could, I don't know, well, that would be a bit stupid, calculating, I don't know, the user retention when there is turbulence.
I will ask. Maybe they enjoy the movie more. I will ask. Thanks.
Do we have time for more questions?
4 minutes.
4 minutes, faster.
You said you had to respect a lot of regulations to be able to launch this, but how did you do the proof of concept tests for the system? Like, how did you put the first server in the plane and make sure that it actually works?
You saw the picture of the embedded computer. We have that at our office, in the lab, and with that we can do simulations, with the wireless access point too. And there we had a simulator that you just
push buttons and it's like an airplane. So we have one button that is the door, another is the ground.
Ok. Is this the same way that you test when you do updates, for the synchronization part, using the simulator?
More or less it's the same, but obviously it failed big time, because in the office you have good connectivity and you don't test for some things. And then with the planes, we thought that we would have like 4 or 5 hours of connectivity each day and now we have like 1 hour. We are not even spending the 3G data on our plan, ok? So that's a problem. But yeah, testing is not the same as production.
So how come you get so little connectivity? Because I've observed that when a plane lands it connects to the gate, a couple hundred people get out, they clean the plane, a couple hundred people get back on. That takes more than an hour. Why are you only getting so few minutes of connectivity if you're at the gate for an hour, like 3 times a day?
They do a lot of operations. They are maybe putting in fuel and they have to turn off the electricity. I don't know the specifics, but I know, and I have the data for that, that maybe we have 5, 20 minutes, or nothing at all, or maybe some day we have an hour. But you cannot predict that.
Ok, ok, so there is no way to predict that.
There was a question at the end.
Hey, I want to ask: I know on some transatlantic flights, for example, they have Wi-Fi on board while in flight. Can that be used for synchronization, for example, so you can get more online time for the app and do the updates?
I don't know. Maybe, maybe.
Ok, thanks.
Well, that's it. Thanks for coming.
Yes, thank you for coming.
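As a footnote to the hard shutdown part of the talk, the "sync after every deploy" and "check for size 0 or missing files" mitigations can be sketched as below. The talk mentions the `sync` command; this sketch swaps in Python's `os.fsync` on a single file plus an atomic rename, which is an assumption, and the names `durable_write` and `find_broken` are hypothetical. As noted in the talk, power can still be cut in the middle of the write, so the check is still needed afterwards.

```python
import os
from typing import Iterable, List


def durable_write(path: str, data: bytes) -> None:
    """Write data to a temporary file, flush it to disk, then rename it."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as handle:
        handle.write(data)
        handle.flush()             # push Python's buffer to the OS
        os.fsync(handle.fileno())  # ask the OS to write its cache to disk
    os.replace(tmp_path, path)     # atomic rename on POSIX file systems


def find_broken(paths: Iterable[str]) -> List[str]:
    """Return the contents that are missing or have size 0."""
    return [
        p for p in paths
        if not os.path.isfile(p) or os.path.getsize(p) == 0
    ]
```

A periodic agent could run `find_broken` over the contents marked as available and schedule the broken ones for download again on the next landing.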