Welcome everyone, it's another morning at DebConf, and in the first talk today I'll be telling you about what we have been doing at BMW for the testing of embedded devices, and what parts of that I will try to bring into Debian, so that other people who are developing hardware can benefit from what we are developing in-house, can try to use the tooling that we have, and hopefully contribute to it as well.

First, a bit about me. I'm Aigars Mahinovs, and I've been a Debian developer for a long, long, long time, and a Python developer for almost as long; for the last 12 years I've developed only in Python. In my previous work history I spent some time at Nokia doing testing for them, and then did testing on the YouView set-top box for the BBC, so that was another piece of embedded hardware that I helped test and bring to market. After DebConf 15 I joined BMW Car IT, which is like a sister company of the BMW Group where the software talent is concentrated and grouped together, so that we can bring the best of the IT world's knowledge into BMW processes, and since then I've been busy testing the next-generation head unit for BMW.

Let me recap a bit from my last year's talk, where I talked about what kinds of computers are in cars. I can be slightly more specific this time, because the software we have been developing is finally in cars that are going into release now, and the interfaces have been shown in public. When you get into a modern car right now, you will be basically surrounded by screens.
You can see some of them here: a screen in the central console with the map, a screen just behind the steering wheel with all kinds of indicators (and maybe even a map, if the car is new enough), a heads-up display with another set of indicators, a screen for the climate control, even a screen on your remote key. Behind each of those screens there is a computer running software, and most of those screens nowadays have strong enough computers behind them that they can run an almost regular kind of Linux with applications on top.

But that is just the tip of the iceberg, because there are far more computers inside every car, and there will be plenty more. A modern car can have up to 80 different computers in it. Most of them are really small, specialized units, like the ones controlling the ABS, or the one responsible for communicating with the mobile network: a small modem with a built-in firewall and all kinds of security protections, which then provides communication interfaces to the rest of the car. There is a movement in the automotive industry to consolidate these 80 computers into a smaller number, but as soon as you consolidate, you raise the complexity, and the more complexity you have, the more testing you need to make sure that the complexity doesn't break functionality.

Another big factor, especially in the car market, is the longevity of our releases. If we started development of new software for a car right now, freezing the version of every package going into the car today, then optimistically that software would actually be released in the first cars in approximately two years. After that it would take probably another four to five years for that new software to be extended to the whole range of BMW cars, then another couple of years for it to reach the whole range of MINI cars, which use basically the same software, and another couple of years for the whole range of Rolls-Royce cars. And after that has been done, you need to support the software that has been released for up to 25 years. So as soon as we freeze software now, we need to provide support, at least in the sense of security updates, for between 25 and maybe even 35 years.

When you are looking at these kinds of timescales, you can no longer rely on the people developing the software now even being around at the end of the support cycle; some of them will move on to other software, probably other companies, maybe even entirely different career paths. So to update 20-year-old software successfully, you need a lot of automated testing infrastructure in place that will keep running and keep telling you whether the functionality in your cars still works after you've updated some library with a new security patch.

This brings me to what we do for testing in the project called MGU, the media graphics unit: the computer behind the central console navigation screen, where we have the navigation, the multimedia, the configuration of your car, all that kind of stuff. In that software project we have 194 Git repositories where people work on the software. We have 64 SVN repositories, which we use for binary releases: some parts of the software are so hard to build that we don't build them fully from source every time; we build a binary release, commit it to SVN, and then in the overall build process check it out from SVN again. And we have 62 meta repositories, representing domains like the navigation domain, the multimedia domain, the security domain; those work in their own silos, and then
the 62 meta repositories get integrated into one base repository as submodules, which then represents the whole software release and contains around 11,000 packages. That's quite a lot for such a relatively simple device as the one behind the central console of the car. The image is more than a gigabyte, plus 12 gigabytes of extra packages, mostly debugging information and the extra code and tooling required for testing. There are four hardware variants and 30 different partitions across those variants, and each of those variations actually needs to be tested.

We do three testing levels. There is a functional level inside each specific domain: the navigation domain, for example, has tens of thousands of unit tests and its own domain-level tests. Once those pass, the change goes into a meta repository and gets integration-level testing, and once the integration-level testing passes, we gather changes from all the meta repositories into one place and do release-level testing. When that passes, we actually create a new release, and development continues from there. At each level the software needs to be built, images for all the variants need to be created, and build acceptance tests for every variant need to run and pass on both virtual and real hardware for the changes to be promoted up.

Every day we do this whole process for up to five release candidates on up to two branches. For example, there are cars going into production this autumn, so there's a branch targeting those cars, and there are cars going into production next spring with a slightly different feature set, so there's a separate branch tracking those changes. That means around 800 system builds and approximately 2,400 component builds every week, with more than 2,400 tests executed each time a build is completed, so every week more than 1.3 million tests are executed. There is a lot of hardware doing that: hundreds of machines, thousands of cores, terabytes of RAM, hundreds of terabytes of storage.

And when all that is completed and we can create a new, shiny release, the real testing begins, because once the release is done we run it through much longer testing scenarios that can take up to a week to complete. For example: if you power on the car and leave it running for two hours, does it go into low-power mode when it's supposed to? Does the target reboot and start afresh? If you leave a car for several days, it's supposed to power down completely so it doesn't drain the battery, and when you come back it's supposed to power up really fast. We do that kind of testing after the release is complete, and a lot of other tests too, because after my group is done with the software, it gets shipped to several dozen other groups around the world who flash it onto real physical cars and go through testing scenarios by driving those cars. If there were something critically wrong with the software we passed on to them, they would waste a lot of time and a lot of money.

To do all this we've developed a bunch of testing tooling in-house, and what I want to do is get at least the commonly useful parts of that tooling out into the public and integrated into Debian, so that every developer doing any kind of embedded development can either use it directly or look at it for inspiration in their own work. This is how the end product looks: these are the screens that are going out now in the new BMW X5 and the 8 Series, both the central console and the so-called Kombi
or the instrument cluster just behind the steering wheel. They both use the same applications and the same graphical toolkits to produce their results.

Let me step back in history and talk about AUTOSAR and GENIVI. Some time ago, automotive developers recognized that they had all been building very similar boxes that do very similar things, but every one of them had been reinventing the wheel: how these boxes are built, how they talk to each other, what exactly every box does, how the work is divided between the boxes. So they came together and created the AUTOSAR consortium, where they started to unify standards on how the devices in cars should be constructed, what the overall framework and architecture should be, how devices should talk to each other, and how they should be developed. There is a lot of paperwork, all kinds of standards, and some of them are even useful, so I'm going to talk to you about one of them.

DLT is Diagnostic Log and Trace, an interface defined in AUTOSAR to help the development of embedded hardware: a way of getting diagnostic logging and tracing information out of the hardware. AUTOSAR only produces specification standards, not actual software; the actual software for all of this is developed by another industry association called GENIVI, and in this case GENIVI developed a set of tools for working with DLT. It is focused on what happens with the device during its development and debugging; when the software is released, most of these features are usually turned off.

The idea is that there is a DLT daemon running on the device, collecting information from a bunch of sources, including syslog, and including adapters that feed all kinds of useful diagnostic information into the daemon. By default the daemon is supposed to store the information in RAM, not write it anywhere on disk, so that it disturbs the device under test as little as possible. When you connect to the daemon with a DLT client, it dumps all the information it has to that client, so you get the log from the very startup of the device as soon as you connect to it, and from there on you get incremental logging of what has happened since, basically a near-real-time stream of the logs from the device.

There's a bunch of tooling around the different priority levels. One of the key points of the structure of a DLT log is that every provider of information declares its application ID and the context into which it is logging: two small textual IDs, and each of those ID combinations can have a different logging level defined. So you can say: I've connected to the device now, from syslog show me only errors, but this application I'm running, which is specifically made to dump lots of information into the log, I want debug-level information from that. The client connecting to the DLT daemon can actually control which messages at which debug level it will receive, and then you can easily filter what you see by application ID and context ID.

You can log not only textual data but also binary data, because the log messages are defined as a set of typed variables, so you can efficiently transfer binary data out of the device. Every message is timestamped on the device with its uptime, and the DLT receiver timestamps it again on reception. And as this is an automotive thing, it's not just for devices that can run Linux or complex software: smaller embedded chips that do not run a full operating system can still produce DLT logs, and there exist dedicated hardware devices intended to be connected to a car network, receiving DLT logs from dozens of devices at the same time and writing them all down onto an SD card. So you can use such a logger on a drive and collect all the information from all the devices with synchronized timestamps.

Since you can transmit binary information from the device, you can have binary data in the logs, and that is useful when you want to get core dumps, for example; that is one of the features that gets quite a lot of use. When your application crashes, you get a core dump, and sometimes the device does not really want to store a couple of hundred megabytes of core dump; that would be disruptive. So the tooling around DLT takes the core dump, transmits it through the log, and erases it from the file system, so it doesn't burden the device anymore. A client can also inject lines into the log, which is used in testing, for example, to indicate in a time-synchronized manner when a particular test case starts and stops; that is then useful in analysis.

We use DLT logging a lot on our devices, so we often see cases where there are thousands of messages going out every second, and a test run can produce gigabytes of logs. Naturally, with thousands of messages per second and gigabytes of logs, you're not going to read them as a human being; you need tooling to help you with that. Thankfully, some tooling is provided. On a basic level, if you just want to read the log and see what's in it, there is a tool called DLT Viewer, provided by GENIVI. It's a GUI tool to open a log file, look at the messages, filter them by all kinds of criteria, and inspect the actual binary data, for example. But if you want more automation out of it, that is not going to help you much.
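To make the application-ID/context-ID idea more concrete, here is a toy Python model of the per-context log-level table that a DLT client effectively controls. This is not the GENIVI libdlt API; every name here is made up purely for illustration.

```python
# Toy model of DLT-style filtering: every message carries a short application
# ID and context ID, and a per-(app, context) log-level table decides which
# messages are wanted. Names are illustrative, not the real libdlt API.

LOG_LEVELS = {"FATAL": 1, "ERROR": 2, "WARN": 3, "INFO": 4, "DEBUG": 5, "VERBOSE": 6}

class FilterTable:
    def __init__(self, default_level="WARN"):
        self.default = LOG_LEVELS[default_level]
        self.levels = {}  # (app_id, context_id) -> numeric threshold

    def set_level(self, app_id, context_id, level):
        # e.g. errors only from syslog, full debug from one test app
        self.levels[(app_id, context_id)] = LOG_LEVELS[level]

    def wants(self, app_id, context_id, level):
        threshold = self.levels.get((app_id, context_id), self.default)
        return LOG_LEVELS[level] <= threshold

table = FilterTable(default_level="ERROR")
table.set_level("SYS", "JOUR", "ERROR")   # syslog context: errors only
table.set_level("TEST", "MEM", "DEBUG")   # our chatty test app: debug and up

print(table.wants("SYS", "JOUR", "INFO"))   # False: below the syslog threshold
print(table.wants("TEST", "MEM", "DEBUG"))  # True: debug enabled for this app
```

The point is that this filtering is driven by the client but applied per app/context ID, so a plugin or viewer only ever sees the messages it asked for.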
So, for that, inside BMW we've written a tool that we call dltlyse. It's a framework for analyzing DLT traces: a Python-based tool where you feed a stream of DLT log messages through a bunch of plugins, and each plugin is responsible for extracting whatever useful information it can find in the log. It uses libdlt, which comes from GENIVI, and a Python adapter for libdlt, also written at BMW, so the logs are parsed in C while the business logic lives in Python; the best of both worlds. I can say something about the licenses right away: both the GENIVI tooling and the BMW tooling here are released under the Mozilla Public License version 2.0 or later, and that's how it's going to be published in Debian as well.

The plugins initially declare which application IDs and context IDs they're interested in; that allows efficient filtering, so not every plugin gets fed tens of millions of messages. They also get informed when the device starts up, when it shuts down, and when the testing cycle ends, so that they can report their results when everything is done. When we do analysis, the plugins take information from the logs and produce output files, like the CSV file I'll show in an example next. And when we do testing with this tooling, the plugins have a way to produce test results in the JUnit format, so a plugin can report pass or fail and say exactly why it failed; that way you can actually write tests on this basis. This again scales up to the scales we're working at, thousands of messages per second with hundreds of plugins running, and because each plugin uses a separate file to store its intermediate data, it doesn't block the logging process.

Here's an example; there'll be a second slide with the code. The idea of this plugin is to collect information about memory usage: we assume that on the device there is some tool that periodically logs the system memory information into DLT (MemAvailable, MemTotal, buffers, cache, shared), say once a second. In its initialization, the plugin just opens a CSV file that will be its output and sets all the class variables to their initial values. When the device starts, when we see in the DLT log that the device has been brought up, we save the lifecycle ID; by lifecycle we mean how many times the device has been booted, so lifecycle zero is the first boot within the test. Then there is the main function, which gets called every time there is a message matching the filter we defined before via app ID and context ID. It receives the actual message saying how much memory there is in the system; we parse it, split out the values, and just write whatever values we found into the CSV file. This particular plugin doesn't do any processing or analysis on the actual data, it just takes the data and dumps it, but you can see how you could do more calculation in the same place. At the end, the report step closes the CSV file and performs a test: during the execution we've been tracking the minimum available memory, and if the minimum over the whole test run is less than a gigabyte, we add a failure test case to the result, with an error message; if it's fine, it's a success by default. If you run this kind of thing in Jenkins, you'll have a JUnit XML result file, so it will show up as a test failure. On the right you can see the output of this script.
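The slide code itself isn't in this transcript, but reconstructed from the description, the plugin would look roughly like this. It's a self-contained sketch: the real dltlyse base classes, message objects, and callback names differ in detail, so treat every name here as illustrative.

```python
# Sketch of the memory plugin described above. Self-contained: the real
# dltlyse plugin API (base class, message objects, JUnit reporting) differs
# in detail; names here are reconstructed from the talk's description.
import csv

ONE_GIB_KB = 1024 * 1024  # 1 GiB expressed in kB, as in /proc/meminfo

class MemoryPlugin:
    # Only messages with this app ID / context ID are fed to the plugin.
    message_filters = [("MEM", "INFO")]

    def __init__(self, csv_path="memory.csv"):
        self.csv_file = open(csv_path, "w", newline="")
        self.writer = csv.writer(self.csv_file)
        self.writer.writerow(["lifecycle", "uptime", "mem_available_kb"])
        self.lifecycle = 0
        self.min_available = None

    def new_lifecycle(self):
        # Called each time the device boots during the test run.
        self.lifecycle += 1

    def __call__(self, uptime, payload):
        # Payload assumed to look like "MemAvailable: 1234567 kB".
        available_kb = int(payload.split()[1])
        self.writer.writerow([self.lifecycle, uptime, available_kb])
        if self.min_available is None or available_kb < self.min_available:
            self.min_available = available_kb

    def report(self):
        # Close the CSV and turn the observation into a pass/fail result,
        # the way dltlyse plugins emit JUnit-style test cases.
        self.csv_file.close()
        if self.min_available is not None and self.min_available < ONE_GIB_KB:
            return ("FAIL", "minimum MemAvailable %d kB below 1 GiB" % self.min_available)
        return ("PASS", "")
```

The CSV dump and the pass/fail check live in the same plugin, so the same run gives you both a graphable data file and a JUnit-style verdict.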
At the top is the CSV file that was produced, with the uptime and the values for all the memory fields, and at the bottom I just took it into OpenOffice Calc and created a graph, so we can see that the available memory stabilizes soon after boot-up. That's good: we don't have any major memory leaks, which is kind of the point of this particular test exercise.

So that was the dltlyse tooling for analyzing DLT logs. On top of that, we also want to release a test execution environment: a bunch of tooling, based on top of nose, for executing tests. The idea is that you define a class that controls all the aspects of your hardware device: how do you actually power it up, what kinds of modes does it have, how do you run a command on it. The tooling around that class then runs a full testing cycle on the device. It sets up the test environment and starts collecting all the relevant log files; it prepares the target, switching it into its flashing state, flashing the software, switching it back to the normal running state, verifying that it actually came up and that the DLT log has started, and putting a message into DLT saying "setup is done, all good, let's start tests". Then it gathers the tests: we have hundreds of development teams, so we need to gather the tests to be run on a device from hundreds of folders, with metadata on each test defining what type of test it is. But sometimes we don't want to fully trust that metadata, because we want more control over which tests are actually considered part of the build acceptance criteria. So we have a separate file, under the control of the release team, that defines which tests are build acceptance tests and which are not, and which particular test has been flaky lately, so we still run it but do not consider its failure fatal for the build acceptance criteria.

All that kind of filtering and additional information is handled inside this XTE framework's test collection suite, which is layered on top. In addition there is, of course, retrying of failed tests, where that is useful in some cases. XTE also marks inside the DLT log: here we started this test case, so you can see what happens with the device during the test case, and here we ended it. In case the target has been damaged by the test in some way, if there's a crash for example, it makes sure that we have fully received the crash dump from the target, and then we usually reboot the target to bring it back to a known-good state before executing the next test; otherwise you could have one test causing a crash and then 20 tests right after it failing because they cannot connect to a target that is down.

The logic specific to a target device is encapsulated into Python classes, and we use a hierarchy of Python classes to define the common behavior of related devices and related variants of devices. We currently use this in three different projects inside BMW, and in extending it from one project to three we have defined a common core of XTE that is useful to all of them. We are still in the process of reviewing what exactly from that is useful to people outside BMW, so it's not fully ready for an open source release yet; I was hoping it would be ready by this talk, but it took longer than expected. When that is done, it will also be released as open source and packaged for Debian, for other people to use.

There are other things in this framework that define how it works. There are objects at a global level, singleton objects; the target is one of those, so you can have an object whose code encapsulates all the communication with the target.
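Since XTE itself is not released yet, here is only a hypothetical sketch of that target-class idea: a base class owning the common test cycle, with device variants overriding the hardware-specific parts. All names and methods are invented to show the shape, not the real XTE API.

```python
# Hypothetical sketch of the XTE target-class idea. A base class defines the
# promotion cycle (flash, verify boot, run tests, recover on crash) and each
# device variant subclasses it with the hardware-specific pieces.

class Target:
    """Encapsulates everything the framework needs to drive one device."""

    def power_on(self):
        raise NotImplementedError

    def flash(self, image):
        raise NotImplementedError

    def run_command(self, command):
        raise NotImplementedError

    def is_alive(self):
        raise NotImplementedError

    def full_cycle(self, image, tests):
        # Flash, verify the target came up, then run each test, rebooting
        # the target to a known-good state whenever a test crashes it.
        self.flash(image)
        self.power_on()
        assert self.is_alive(), "target did not come up after flashing"
        results = {}
        for test in tests:
            try:
                results[test.__name__] = test(self)
            except Exception as exc:  # crash: record it, recover the target
                results[test.__name__] = "ERROR: %s" % exc
                self.power_on()
        return results

class FakeTarget(Target):
    # A stand-in variant; real subclasses would talk over serial/Ethernet.
    def power_on(self): self.alive = True
    def flash(self, image): self.image = image
    def run_command(self, command): return "ok"
    def is_alive(self): return self.alive

def smoke_test(target):
    return "PASS" if target.run_command("true") == "ok" else "FAIL"

print(FakeTarget().full_cycle("image-1.0", [smoke_test]))
```

The class hierarchy is the point: common recovery and sequencing logic lives once in the base class, while each device variant only fills in how to power, flash, and talk to its hardware.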
The target object also encapsulates all the information about the target, such as which mode it was last booted into, and you can have functions at that level, for example to take a screenshot. And there is a DLT monitor: a class running in a separate thread that looks through all the DLT messages coming from the target and drives state machines on the test-host side, to indicate to tests what state the target is in. Is everything fine, all systems running, all services started, or are we actually recovering from a crash right now? Every test can then do things like: start monitoring the DLT log, perform some action, and instead of trying to figure out from screenshots whether the target reacted correctly, we make the software on the target log its current state into DLT and check in the test, from the DLT log, whether that state has been reached. That makes the automation of the testing much easier. In parallel to test execution, XTE also runs dltlyse on the DLT file, so you get live analysis of whatever logs you get from the target, and as soon as the execution is over, you already have those results as well.

Here is an example target setup that we have. There is a target device and a test host, and the idea is that the target device is fully isolated from the rest of the world: it communicates only with the test host, through Ethernet, CAN, serial, whatever else is there, and the test host decides whether it wants to forward whatever requests the device makes. That way, if your device for example connects to a service to get information about updates, you can intercept that from the tests and provide fake update information, to test that kind of functionality. And for power control we use an Arduino, with very simple software inside it, driving a power relay, so we can switch the power of the device off and on from software.
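As a sketch of what that looks like from the test host's side: a tiny, made-up wire protocol to the Arduino. Any real setup defines its own commands; `port` can be a pyserial `Serial` object in practice, and here an in-memory buffer stands in so the sketch is self-contained.

```python
# Illustrative sketch of software-controlled power: single-line commands to
# an Arduino that drives a relay. The "R<channel>:<state>" protocol is
# invented for this example; `port` is anything with a write() method.

class PowerRelay:
    ON, OFF = b"1", b"0"

    def __init__(self, port, channel=0):
        self.port = port
        self.channel = channel

    def _send(self, state):
        # e.g. b"R0:1\n" means "set relay channel 0 to on"
        self.port.write(b"R%d:%s\n" % (self.channel, state))

    def power_on(self):
        self._send(self.ON)

    def power_off(self):
        self._send(self.OFF)

    def power_cycle(self):
        # hard reset: cut power, then restore it
        self.power_off()
        self.power_on()

import io
buf = io.BytesIO()                  # stands in for a serial port
relay = PowerRelay(buf, channel=0)
relay.power_cycle()
print(buf.getvalue())  # → b'R0:0\nR0:1\n'
```

The same pattern, one command per relay channel, covers the button-pressing and USB-disconnect tricks described next: each is just another channel the Arduino can bridge or open.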
We also use that particular setup with the Arduino and the relay to trigger buttons on devices and to connect and disconnect USB devices: we just have a USB plug with a couple of wires soldered onto it, so if we bridge them, the USB device is live, and if we disconnect them, it's offline. That way we can control these things from software. And then this whole structure runs and produces a set of reports on the current state of the target.

What I wanted to do with this is to get as much interesting value as possible from inside BMW out into Debian and the wider open source community. Like in every company, there is naturally a bunch of code that is really specific to what we do, some of it code that people are not even proud of having written, but it had to be written; but then there are things that are useful and that I think will also be useful to others. This is one of them, and I think there are more of them hidden inside BMW and inside other companies. So I've gotten the permission to release this software, with the idea that maybe other companies and other people will use it, and if it's good and useful, maybe they'll also start contributing changes back. And if that particular avenue is successful, it will be far easier for me to convince management to release other parts of our tooling that might be useful to others.

Therefore my request to you is just to look at this and figure out whether it might be something useful for you, and maybe not only for embedded devices: you could try seeing whether this could be useful for testing cloud server instances or other types of systems, and see how that goes. As I do not have a full running hardware setup right now to show you, I will be writing a set of blog posts later
on, also aggregated on Planet Debian, showing step by step how to create an example test setup using something like a Raspberry Pi as the target device. Eventually my goal is to have this fully packaged in Debian, so that you could just take a Debian system, install a couple of packages, and have an environment for testing your hardware. So that is it; any questions? The microphones are in front.

Q: Hi, thanks for your talk. You maintain the packages, you freeze them, and then you try to release them, right? If there is something like a security issue, you have to update a package; so how do you define your testing coverage, to make sure it is not too big or too small?

A: Thank you. It's a big and philosophical problem: especially for a security release, how much do you test, and what happens with the packages? Basically, when we do a release of the software, it goes through all the testing that we have, both automated and manual testing in actual test cars, so releasing something as a security release would take a while. If there is a critical problem, there are ways to disable functionality in cars really fast: there is coding on the cars that determines which functions are enabled and disabled, and that can be rolled out really fast to disable a function. Then later, maybe in a couple of weeks, you can release an update that fixes the function and re-enables it. So there are ways to do that, but it also depends on the software update functionality actually working fine, which has not always been the case; it has not been a real focus for cars before.

Q: Okay, thank you.

More questions? No? Then I'll show you something. Here is DLT Viewer; it's currently connected to my laptop, which is, well, Ubuntu, but kind of Debian-ish. The idea is that we have a DLT daemon running on this laptop, getting some information from it with the default tools that are included, and logging a bunch of information into the DLT log. With the journal adapter we get information from syslog and from systemd, so every message logged there is also transmitted to the DLT log; that's a separate plugin with a bunch of configuration, what information you put in there, under which app ID and context ID, and all that kind of settings. And here there's a tool called dlt-kpi, for key performance indicators, logging all kinds of information from /proc: the load of the system, which processes are running, how much CPU time and memory these processes have used in the last second. So that dumps a lot of performance-related information into the DLT log, which you can then again pick up and analyze with dltlyse to figure out what's happening on your device during a test. Currently there are no plugins in the dltlyse release to analyze these particular bits of information, because internally we have a different tool that does similar things, but that's one of the things I also intend to write, so that you can have a useful analysis of this default process logging as well.

Q: I have a general question. Suppose I want to buy a car with a free software operating system; which one would be the best on the market, a car system that is hackable?

A: Unfortunately, I do not think that will ever be a thing. The problem is that in many jurisdictions it is actually prohibited by law to sell cars to customers with modifiable software. It is required that the components of the car, including the software, pass certification by the government authorities, and parts of that certification require that no uncertified software can be installed. So yes, unfortunately, that's the way it's enforced. The idea is that the cars on the road should not cause trouble for other people on the same roads; it's not about you being harmed by the car that you've hacked, it's about other people being harmed by the car you've hacked for yourself.

We are basically out of time, unfortunately, so I'll be around to answer all your questions in the hallways. So far, thank you!