Why not? This is surprisingly crowded — we were originally told there might be one, two, three people, maybe a little more. Okay. Hey, this is Sean Parker, and I'm Paul Schridder, and we're here to tell you something about testing software on emulated hardware in containers in the cloud, and I hope you all like that. We'll tell you something about the project and why we're doing it, its high-level architecture, the test pyramid and where our stuff is located in it, the target devices and the challenges related to them, then a little demo, and at the end Q&A and a recap if there's still time left.

So we're working for an international train company, and it's not about cargo, it's about travel information, like the thing down there. You might all have experienced the delays: five minutes of delay on the display, five minutes later it's ten minutes, and ten minutes later it's fifteen. The reason for that is the federated, decentralized system of servers serving all these displays: if a train has not yet arrived in the visible area of a given server, the operator doesn't know when it will arrive, so he just starts by typing in five minutes, and five minutes later, if the train still hasn't arrived, he increases that. The goal of our project is to centralize this. We have a partner project in another city which is doing the consolidation of information — if you have conflicting sources and conflicting information, which source do we trust more? That's a tough problem on its own, but thankfully it's not ours. We're more focused on the devices themselves, the management of them, and the rendering for the different display types, and this is where I hand over to Sean.

So, Paul already mentioned that the old system was a federated system where, depending on where the information display is, or perhaps where the train is, the information might not propagate properly to the end station.
If you have a train starting, I don't know, eight hours earlier in a totally different part of the country, the information displays at our client are currently perhaps not updated in time, because the flow of information just doesn't carry that data. So this here is a simplified, abstract version of what we're doing. Paul just mentioned the system at our other project that consolidates the data and the planning — delays, basically — and updates the information continuously. In the background there is something like a RabbitMQ broker where you can subscribe to a certain train track and so forth, and then you get the information and continuously update it. Sounds easy, is actually quite complex, and it's improving from year to year.

The other project is basically the external domain up there, that single-source-of-truth block of travel information. That's where all the information goes in — train delays, when a train starts, when it arrived, scheduling and all that kind of stuff — and it also interfaces in the background with information systems from other countries and other operators. Our project interfaces with that, and you can see the train tracks and the different devices: voice devices, so if a voice announcement comes, that's from one of those devices; the displays, LCD and TFT, the old ones and the new ones, each of those is one of those devices; and they might still have a variety of sensors. You could think of our system a little bit like Netflix for travel information: we have a big microservice architecture in the background, and load depends on demand. You can imagine that during the night there is not that much going on — there are cargo trains and night trains — but during the rush hour we have most of the travel information going through our system, so load is pretty much a sine wave over the day, falling off at night.

What we are focusing on: this back-end block, without getting too much into the nitty-gritty, is already complex enough in itself — that's the Netflix bit — and then we still have quite a lot of different devices we have to interface with. Those devices are quite often paid for with public money, so you just can't replace them; you have to retrofit them with a new operating system, and that's what we are doing, because there was no clean standard in the past. We had a lot of different device types with different interfaces, and that of course doesn't make it easy to integrate with them. So we're building a new operating system, and hopefully it will work quite well in the future. And that's what this talk is about at its core: this little OS box. We're building the operating system based on Buildroot — who knows Buildroot? Quite a few, cool. Put abstractly, it's a toolbox you can use to build a Linux operating system and customize it to your needs. That's what we're doing, but it's of course quite difficult, because testing software for hardware with different configurations gets complex quite quickly due to combinatorics.
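For those who haven't touched Buildroot, here is a minimal sketch of its typical workflow, using the stock QEMU example configuration that ships with Buildroot rather than our project's configuration:

    # In a Buildroot checkout (see https://buildroot.org), start from a stock defconfig
    make qemu_x86_64_defconfig
    # Pick packages, kernel options, filesystem type, etc. for the target hardware
    make menuconfig
    # Build the cross toolchain, kernel and root filesystem
    make
    # Results land under output/images/ (e.g. bzImage, rootfs.ext2)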
Cool — so, who of you has seen a test pyramid so far? Probably most of you, either from experience in the enterprise domain or somewhere else, or you have made some nice PowerPoint slides about it to convince your manager that it might be reasonable to write a few tests. Yes, that's a good idea. Integration tests and unit tests usually happen on the software side, but at some point you have to integrate with the real world, and that's usually where things blow up — and you want to find bugs and problems as early as possible. In the beginning, if you have a unit test and you find a bug, you just fix something, recompile, and it's done, and you're happy. But if the client finds the bug, you're getting called, ops is unhappy, you have to roll out everything, you might have to wait quite a long time to get your fix out, and it just doesn't make me happy as a developer, because my development flow is broken. Long story short: you want to find bugs as early as possible, and that's what we're doing with these operating system images, which we put in containers in order to run them in the cloud. Then we can test at scale, more automated, in the CI, instead of flashing them — or automatically flashing them — to a limited set of devices and testing manually, because there we are limited to the hardware we have and someone has to be there physically. So the stuff we want to present to you fits into this row of the test pyramid.

The target devices: there is a huge number of them, I guess 6000 at least, and a lot of different types. Most of them are the old ones — the LCDs over there are PC/104 form factor — but they vary in CPU type, in RAM size, in the interfaces. You have these LCDs, and the TFTs, which are like better TVs, but those are powerful quad cores with gigabytes of RAM, and the outputs are different. We have different boot mechanisms — some devices only support UEFI, while the older ones only support MBR — and our goal is to have one image that runs on every single device. That's not possible, we do have some variety, but we still want to keep it reduced rather than extended, so it stays maintainable and testable. And that's just the devices themselves; there are also different external peripherals, like the so-called train-stop sensors, or whatever they're called. The displays are sometimes controlled via serial protocols, UART or RS-485, some sensors are connected via I2C, and sometimes there is special translation hardware which presents itself to the system as a graphics card, and the whole thing gets translated into pixels down to the LCDs. So: proprietary, old legacy stuff, no source, and there is sometimes a lot of reverse engineering involved, because some of the suppliers of these devices are not exactly helping us — it's closed-off software, and why should they help us if in the future we might do everything ourselves? Sometimes they don't even have the source themselves anymore, so they go down into the basement and look through some file folders, real paper and stuff. Anyway, it's quite interesting: sometimes you log on to those old machines, look at the shell history, and see people jumping from one machine to another. That's something we really want to fix, because, as you probably know yourselves, that's
something that is not maintainable. So we really want to push standardized interfaces, so that it stays maintainable — I'm a DevOps guy and he's more the developer guy, but that will make me personally really happy if it works. The standard workflow of the suppliers was: take one Debian image, put their software stack on it, put it on the device, fiddle around until it works, and then never touch it again. What we're trying to do is automated updates in the field, on 6000 devices, over thin network lines with sometimes only 10 kb of bandwidth. You really don't want to push an image through that too often, especially if it's 200 or 300 megabytes. So we would like to test the stuff before we roll it out and make sure it works the way we want it to.

But now the problem arises that we have a huge hardware variety, and how can we simulate that hardware in combination with the software we want to run on it? We're using QEMU. QEMU is pretty awesome because you can specify in really fine detail how the hardware should behave: you can select different CPUs, you can connect different serial devices with different options — is it connected via an ISA bus or USB — and you can change the contents of the DMI table, which is a specific area of memory on the real hardware's motherboard. We rely heavily on that to identify the different devices, so that we know which services we have to start and which not: on a TFT we use an Electron application to display stuff, and on an LCD we use a custom-written controller service which communicates with the LCD panels. So yeah, we chose QEMU, and it's open source.

This is how the container looks. It's not exactly what we're using on the project, but it reflects the shape of the composition: you have the outer container, an Alpine image with QEMU installed, and inside it a QEMU running, which gets the disk image via a volume mount, and some ports are exposed, for example SSH to control it remotely. In the future we would like to get rid of that and do everything via AMQP commands — does everybody know what AMQP is? It's basically the protocol used by RabbitMQ, which is our middleware — but that's still far in the future. The interesting part for QEMU is /dev/ttyS0, which gets exposed on port 9000 and can interact with external hardware, in this case with mocked hardware, and the control port of the mocked hardware is also exposed. We have a noVNC which lets us watch the boot process, but the yellow stuff is only optional, it's not really necessary for testing the image itself. Still, it's quite nice: noVNC is basically an HTML VNC client, so you can browse to port 6080 and see what is going on on that machine, and you don't need one of those VNC applications installed — it's all bundled in there. What we have been doing on our project: we first did this with graphical applications, with noVNC in a container. We were running up to 600 of those devices, and you could just click a link with the unique identifier of a simulated device and see what was going on on it. That's really helpful if you want to stress test your system with realistic devices — especially the back end, not the devices themselves — to generate load on the back end.
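To make that composition a bit more tangible, here is a minimal, hypothetical sketch of what such a setup could look like — the container image name, CPU model, ports and SMBIOS strings are illustrative assumptions, not our project's actual values:

    # Inside an Alpine-based container with qemu-system-x86_64 installed (illustrative sketch).
    # -smbios fakes the DMI strings the OS later uses to identify the device type,
    # -serial exposes the guest's /dev/ttyS0 as a TCP socket on port 9000,
    # -vnc provides the display that noVNC attaches to.
    qemu-system-x86_64 \
      -m 256 -cpu pentium2 \
      -drive file=/images/device.img,format=raw \
      -smbios type=1,manufacturer=ACME,product=LCD-PC104 \
      -serial tcp::9000,server,nowait \
      -vnc :0

    # Hypothetical host-side run: volume-mount the Buildroot image and publish the ports
    docker run --rm -v "$(pwd)/device.img:/images/device.img" \
      -p 9000:9000 -p 6080:6080 -p 2222:22 qemu-device-sim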
Okay, what you're going to see now is something like that setup, not much different, but the mocking part is removed and the ttyS0 port is exposed directly by the Docker container, so we can interact with it directly — we're kind of doing the mocking manually. So we're going to show you how that thing works. We unfortunately have to do that locally, because we can't interface with our client's CI/CD system at this point — we didn't get approval, or rather we didn't get an answer so far — so that's why we're showing it like this, and afterwards we'll tell you how we're currently doing it in the CI/CD context of our project and how we're deploying it.

Okay, demo time. This is me, I guess. I think if it's mirrored it would be better... give me a second... perfect. So there's the Dockerfile, which builds the container you've seen so far, and there's the QEMU disk image — I was trying something there, not so interesting. We have a Makefile which should make it easy to do stuff, so we can do make run-docker, and we stay attached to the container, which boots up right now. We have a little delay between the QEMU start and the noVNC start, because if you start them right after each other, noVNC sometimes says it can't connect to the VNC port of QEMU — details, not so important. The next interesting thing is that we can have a look at the booting container, which looks like this: this is what you would see if you started QEMU manually, and you could interact with it, but I guess here that's disabled. We can now connect to the container and to the running session inside — the service inside runs in a tmux, so I can attach to its output again, but on boot it starts as root, so we have to su and attach to the tmux. We do a tmux ls first and see a session running, and we can attach to the session with this name. There we see that the HTTP server started and the sensor handler started on /dev/ttyS0 with the given baud rate. Now let's look at the service we want to test, which is on local port 8000 and looks like this — so far it hasn't received anything; let me make this a little smaller so you can see it. Then we can send some data — we're not waiting for a response — connect to localhost and the exposed port, and let's try this. You see the sensor handler received some raw bytes, decoded them, stripped them, and parsed them to an int. And if we push in something that is not parsable as an int, we see that an error happens — a test scenario would be: put garbage in and check whether it gets parsed by the program, and we see the value didn't change. What we're doing with sensor data, for instance: all our devices have a Prometheus exporter endpoint, so Prometheus scrapes metrics from that system, and once Prometheus has ingested those metrics we can later do something quite interesting with them — see in the morning when the sun comes up, how bright it is, or whatever — and it's also quite useful for diagnostics. So, to go back a little: we pushed some data in here, it got processed by the custom service, and it got exposed again on the HTTP port, so this is a round trip through the software to test it. And since it's Dockerized, you can put it on Kubernetes, put it in the CI, run tests on it and do that automatically.
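Scripted, that round trip could look roughly like this — the serial port 9000 and HTTP port 8000 match the description above, but the payload, endpoint path and metric name are made-up assumptions:

    # Hypothetical smoke test for the demoed round trip.
    # Push a fake sensor reading into the guest's /dev/ttyS0 (exposed on port 9000)...
    printf '42\n' | nc -w 1 localhost 9000
    # ...then check that the service on port 8000 now reports it
    # (the /metrics path and metric name are assumptions, not necessarily the real endpoint).
    curl -s http://localhost:8000/metrics | grep -q 'sensor_value 42' \
      && echo "round trip OK" || echo "round trip FAILED"

    # Garbage in should be rejected and must not change the exposed value
    printf 'garbage\n' | nc -w 1 localhost 9000
    curl -s http://localhost:8000/metrics | grep -q 'sensor_value 42' \
      && echo "garbage rejected" || echo "garbage leaked through"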
So what we're doing on our project: we have Helm charts, which are the configuration layer you tie around one of those Docker containers. You can mount the Buildroot OS image you want to put in there, run it, and through Kubernetes you can tag it, attach all the metadata, and run it at scale. That's actually quite helpful, for us at least, because now we can show the client what we're doing and which of the different configurations is retestable, which is usually not the case with hardware — probably a few of you have tried replicating things with hardware, and it just takes a lot of time. As you see, it's not a silver bullet: some special hardware is really hard to mock, special graphics cards you can't really mock in QEMU, that's not feasible. But the standard cases are doable. It's also not possible to have a sample of every hardware configuration in the lab — first, we would need a warehouse to store them all, and on top of that you would have to plug in the images or deploy them via updates. And in such a hardware test you're not really controlling the input: you're just testing that the system is doing something, but you don't know whether it's doing the right thing, for example if you can't control the temperature sensor. So this approach makes it possible to get rid of these problems — not all of them, for sure, but some of them — and to make the workflow a little more streamlined. Okay, cool, that was basically it, so we're open for questions — I guess we have plenty of time for questions.

Go for it — I have to repeat the question. The question was whether these old legacy systems have been hacked, if I got that right? Yes. Next question. Okay: we actually have a lot of those devices shipped to our office, so we have a quite big laboratory where we can test a certain range of models — the office is quite stuffed with them, so that's our bottleneck. We can test a few of those devices live at the stations, but from a security standpoint we try to limit access to that network, for one because of past incidents, and security is also a reason why we're doing all of this, so we can lock down the operating systems. The workflow for how the suppliers deployed a new image was: a guy went there, opened the device, put in a CF card with the new image, and fiddled around — if that was the question. Yes, the mocks are developed by reverse engineering; sometimes there is a spec, but usually it's so loose that every vendor made its own solution. The first question was whether we considered LinuxKit, I guess. Those decisions were made before I joined, but we have good architects and I guess they considered it, though I can't tell you exactly why they picked Buildroot — I think because you can choose pretty specifically which drivers and core utilities you want in there and whether you really need them, and I guess that was the reason. Back then Buildroot simply seemed like a good option: when the project was started it was not quite clear what variety of hardware is out there, so of course you test against what you know, or you build the architecture against what you expect, and at that time we had a certain amount of knowledge and Buildroot was a fit for it — it's a long-running project, we didn't start yesterday. If you are interested, just drop us a note — we have the code in a public repository, so we'll probably push it, if that's possible, cool, so you can have a look at it.
Okay — our project is completely built in the CI. We have a GitLab CI system, and as soon as you push, a hook builds all the code and deploys it automatically, if that was the question. At the moment it really just deploys it, so the lifecycle of the deployment is independent from the runtime of the CI pipeline, and we haven't integrated this setup with it so far. The requirement there was simply: we want to see what is running and whether it's crashing, and that's something you can observe quite well over time in Kubernetes, through logs, metrics and so forth. Another thing: compiling Buildroot is a huge deal, it takes really long, even in the CI, even with heavy machines. We've reduced it to about one and a half hours, I guess, including the whole toolchain, but you still want to optimize it — you can't just click a button and wait for it to compile. Do we build separate images for the target hardware in QEMU? No, we don't want to do that: we want to test the real image that goes into the field, not a specific image for QEMU. We do have some variants for in-house monitoring, but those never go into the field. Are we using caching for the Buildroot stages? Yes, we're caching the toolchain — usually Buildroot builds the toolchain as well, especially on a blank system — but what would really speed things up is ccache. We didn't get it working so far, but it would be a real boost; I guess we could get the build down to half an hour with it. Anyone else? I guess then we're spot on. Thank you very much for your time.
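(One note on the ccache answer: in Buildroot, compiler caching is normally switched on via the BR2_CCACHE option. The sketch below is a generic example with placeholder paths, not our project's CI configuration.)

    # Generic example of enabling ccache in a Buildroot tree:
    # add to the .config (or via 'make menuconfig' -> Build options -> Enable compiler cache):
    #   BR2_CCACHE=y
    #   BR2_CCACHE_DIR="/cache/buildroot-ccache"
    #
    # In CI, persist that cache directory and the download directory between runs:
    export BR2_DL_DIR=/cache/buildroot-dl
    make olddefconfig && make
    # With a warm ccache, rebuilds skip most of the compilation work.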