Hello everyone, my name is Bogdan Sovaran and I would like to thank you all for joining. This session talks about the implementation of the Video4Linux (V4L2) driver for the Analog Devices ADDI9036 time-of-flight processor. We will emphasize the additional features that had to be implemented to expose the time-of-flight camera's full functionality, and the changes that had to be made to get the same driver, or platform-specific versions of it, working on different computing platforms such as the Raspberry Pi, NXP i.MX 8, NVIDIA Xavier AGX or the DragonBoard 410c.

On today's agenda, I'll briefly present myself, then briefly discuss the principles of time-of-flight technology and how they are applied in the ADI time-of-flight development kit. After that, I'll introduce the open source user-space SDK and talk about the details that make the ADI time-of-flight camera special compared to a normal RGB camera. Once we have seen the differences from traditional RGB cameras, we will look at some implementation details for the driver and the SDK and, in the end, discuss some of the particularities of each supported platform. So let's begin.

I'm an embedded software engineer working for Analog Devices since the beginning of 2019. I'm a member of the Analog Devices System Development Group. This group's goal is to improve the customer journey by offering different creative tools, models, hardware and lots of open source resources. My previous experience was in the automotive field and, since I joined ADI, I have worked on IIO drivers for high-speed converters, DRM drivers for one of ADI's HDMI transmitters and, recently, on the Video4Linux driver for the time-of-flight processor.

Okay, so what exactly is 3D time of flight? 3D time of flight is an industry term used to describe a type of scannerless LIDAR (light detection and ranging) used for depth sensing at ranges typically less than 10 meters from the source. The way it works is that it sends out light in the IR spectrum, from one or more laser sources depending on the required range, towards the scene of interest. The light reflects off the objects in the scene at different points and returns onto a sensor, which for this platform is a charge-coupled device, or CCD. The time required for the light to be sent out and return is measured and, using the known value of the speed of light, which is about 300 million meters per second, the distance to each particular point in the scene is determined. But making millimeter-accurate distance measurements by directly measuring the light travel time is not practical, since it requires very fast sampling rates. For this reason the ADI time-of-flight technology uses a special technique: it measures the energy of the light returning to the sensor in two different time windows of known duration, and the distance is computed as the ratio of the energy in the first time window to the total energy in both time windows, multiplied by the distance the light would travel in the sampling time window (a small illustrative sketch of this calculation is shown a bit further down).

Compared to traditional 2D cameras, which generate a color image of the scene, a 3D camera outputs a depth map where each pixel represents the distance of the corresponding point in the scene to the camera. The picture on the left part of the slide shows what a depth map looks like, where warm colors indicate smaller distances while cold colors indicate larger distances. A 3D camera also outputs a black-and-white IR image, synchronized with the depth image, which is actually a representation of the total received light energy.
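As a purely illustrative sketch of that ratio principle, and not of the actual ADDI9036 processing, the per-pixel distance could be estimated as below; the 0.5 factor for the out-and-back light path is my own assumption, and the real pipeline applies per-mode calibration on top of the raw ratio:

```c
#define SPEED_OF_LIGHT_M_S 299792458.0 /* ~300 million metres per second */

/*
 * Purely illustrative two-window depth estimate, mirroring the ratio
 * described in the talk: (energy in the first window / total energy),
 * scaled by the distance light travels in one sampling window. The 0.5
 * round-trip factor is an assumption of this sketch.
 */
static double tof_depth_m(double e_win1, double e_win2, double window_s)
{
    double total = e_win1 + e_win2;

    if (total <= 0.0)
        return 0.0; /* no usable return signal for this pixel */

    return (e_win1 / total) * SPEED_OF_LIGHT_M_S * window_s * 0.5;
}
```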
It is this combination of 3D depth map and 2D IR image that enables enhanced object detection, classification and tracking in the metric space. Analog Devices provides a full time-of-flight development solution consisting of a time-of-flight camera kit, a complete software stack, as well as additional collateral and application examples. The AD-96TOF1-EBZ time-of-flight camera is based on ADI's ADDI9036 CCD signal processor and is capable of measuring distances in the 20 centimeter to 6 meter range, at an accuracy of typically less than 2 percent of the measured distance. The camera outputs, on a MIPI CSI-2 interface, synchronized depth and IR images, each with a resolution of 640 by 480 pixels at 30 frames per second. Built on the industry-standard 96Boards form factor, the camera can connect to various 96Boards processor boards as well as other industry-standard development platforms such as the Raspberry Pi, NVIDIA Jetson or Xavier and NXP i.MX 8, using the FPC connector also provided on the board. The system has options for MIPI, USB and Ethernet connectivity, to cover the full product development cycle from evaluation, prototyping and algorithm development on a host computer to full application development on an embedded system.

The open source software development kit provided with the platform can be used throughout, for a consistent software development experience that ensures software reuse and stability all the way from the initial prototype to the final product. Windows and Linux support are built into the SDK, as well as sample code and wrappers for various software frameworks and languages, including OpenCV, Python, MATLAB, Open3D and ROS. Its layered implementation allows users with all levels of experience to interact, develop and implement what is of interest for each of them.

Now let's jump into the details of the ADI time-of-flight camera, and first begin with the MIPI data format. It is 12-bit packed RAW, LSB first. Even if this format is not uncommon for RGB cameras, there it is usually used to encode Bayer data. In our case the data is either depth or IR grayscale. In order to apply processing algorithms we usually need to unpack the data and extend it to 16 bits per pixel, with four padding bits. So in packed format we get 960 bytes per line, which translate into 640 16-bit words after unpacking (a short sketch of requesting this format from user space follows a bit further down). This can be a small issue on targets that don't perform the unpacking in the GPU or image signal processor. For those targets the unpacking has to be performed in software, in the SDK, and this comes with an increased load on the CPU.

Second, the ADI time-of-flight camera provides three possible output modes. Depending on the requirements of the application or, in some cases, the capabilities of the embedded platform, all or only some of them can be used. So the three modes are depth only, IR only, and depth plus IR. The first two are the simple cases, accepted by all the platforms. The last mode can raise some problems on some platforms because of the virtual channel use and especially because there are two consecutive frame start packets: one for depth and one for IR, two frame start packets. This problem can be solved on some targets that do not support virtual channels by setting both data streams on virtual channel zero. Also, because of the way the image lines are presented in the MIPI packet structure, the SDK needs to de-interleave the data when it is extracted from the frame buffer, but we'll talk about this in detail a bit further on.
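Getting back to the packed format for a moment, here is a minimal sketch of how a user-space application might request it from the bridge driver; V4L2_PIX_FMT_SBGGR12P is used only as an example of a 12-bit packed RAW fourcc, and the code actually exposed depends on the platform:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/*
 * Negotiate a 640x480, 12-bit packed RAW capture format. The packed layout
 * means 640 * 12 / 8 = 960 bytes per line on the wire; after unpacking each
 * pixel occupies a 16-bit word.
 */
static int request_packed_format(const char *devnode)
{
    struct v4l2_format fmt;
    int ret;
    int fd = open(devnode, O_RDWR);

    if (fd < 0)
        return -1;

    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = 640;
    fmt.fmt.pix.height = 480; /* may double when depth and IR share one buffer */
    /* Example 12-bit packed fourcc; the real code varies per bridge driver. */
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_SBGGR12P;
    fmt.fmt.pix.field = V4L2_FIELD_NONE;

    ret = ioctl(fd, VIDIOC_S_FMT, &fmt);
    close(fd);
    return ret;
}
```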
The last thing that is special about the ADI time-of-flight camera is the fact that the full depth range is not covered by one working mode, or at least not within the specified two percent accuracy. Because of this we have three operating modes: near, medium and far. More precisely, these modes are optimized for detecting subjects at different distances. The optimization for each mode is done in the calibration data set used and also in the firmware block run by the ADDI9036 chip. So changing the range requires reprogramming the volatile program memory of the chip. The firmware and calibration data can be stored in the EEPROM memory included on the camera module, but other adopters of the ADI technology opted for storing only the calibration in the module EEPROM. We'll see further on why this is important for the Linux driver.

Now we will talk about the Video4Linux driver implemented for the ADDI9036. This driver was implemented based on the OV5640 driver that was already in the upstream kernel. It is modeled as an I2C sensor driver and integrated in the Video4Linux framework. It exposes a media sub-device with a source pad that gets connected to the bridge driver. In the example, which is a screenshot from the Raspberry Pi, we can see the media device with its corresponding sub-device connected. To support the camera particularities presented before, a custom control had to be implemented, together with the standard pixel rate and link frequency controls. Two main versions of the driver were proposed and will be presented next.

So, the initial version was very handy for camera and SDK development because it just exposed all the camera registers to user space. I want to point out again that the ADDI processor chip has only volatile memory for storing the firmware and calibration data. So before any data can be put out on MIPI, programming through I2C is required. All memory is mapped as registers and can be easily accessed from the SDK with this version of the driver. This driver did not perform any control over clocks: the camera module has an internal oscillator and does not require an external clock, and it will only output the MIPI data clock when streaming is turned on. As can be seen, the chip config control accepted an array of address/data tuples and directly wrote the data over I2C to the ADDI chip. This was of course not acceptable for upstreaming, but it was very useful, as I said, for early development of the time-of-flight ecosystem.

So, if we take it step by step in a sequence diagram: when the user starts the demo application and selects the desired operating range, the SDK, using user-space access through the AT24 EEPROM driver, fetches the corresponding calibration data and firmware data for the selected mode. After the data is retrieved, the SDK performs some computations based on the camera and lens calibration and converts them into register data. After the SDK has created an internal homogeneous block of address/data pairs, it writes them through the set chip config custom control and, through the driver, directly into the ADDI chip memory. After that, stream on is called, but in this version of the driver stream on did not have any effect; it was just a debug print. Now the streamed data is read from the video device and processed in the SDK. In the special case of implementations that store only the calibration in the EEPROM, the SDK loaded the firmware from a user-space location, concatenated it with data computed from the calibration read from the EEPROM, and wrote everything through the same custom control.
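To give a rough idea of how such an array-type control can be driven from user space, here is a sketch using the V4L2 extended controls API; the control ID name and the flat 16-bit address/data layout are hypothetical stand-ins, not the actual identifiers from the driver:

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Hypothetical private control ID; the real driver defines its own. */
#define CID_TOF_CHIP_CONFIG (V4L2_CID_USER_BASE + 0x1000)

/*
 * Push a homogeneous block of {register address, register value} 16-bit
 * words to the sensor through an array-type (compound) V4L2 control,
 * roughly as the first version of the driver allowed.
 */
static int write_chip_config(int fd, uint16_t *words, unsigned int n_words)
{
    struct v4l2_ext_control ctrl;
    struct v4l2_ext_controls ctrls;

    memset(&ctrl, 0, sizeof(ctrl));
    ctrl.id = CID_TOF_CHIP_CONFIG;
    ctrl.size = n_words * sizeof(uint16_t); /* n_words = 2 * number of pairs */
    ctrl.p_u16 = words;

    memset(&ctrls, 0, sizeof(ctrls));
    ctrls.which = V4L2_CTRL_WHICH_CUR_VAL;
    ctrls.count = 1;
    ctrls.controls = &ctrl;

    return ioctl(fd, VIDIOC_S_EXT_CTRLS, &ctrls);
}
```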
In version two of the driver, all the steps required for loading the firmware into the chip were moved into the driver. Also, handling of an optional GPIO for resetting the camera was added, using the GPIO framework. The power state of the chip is now managed using runtime PM, and retrieval of the firmware is now implemented using the firmware framework. The array-type control used in the previous version was replaced with an integer-type one, used to specify from user space the required operating range for the camera. This control is resized after the firmware is parsed, using the v4l2_ctrl_modify_range() function. This is to support a dynamic number of calibration ranges.

So, when the ADDI9036 driver is loaded, at probe time the firmware is requested through the Linux firmware API. If no firmware is available, the probe fails. Then the firmware is parsed and some dynamically allocated structures are filled in, holding the data loaded from the firmware for each operating range. Later on, when the user starts the demo application and sets the desired range, it is just a matter of sending one integer value through the set operating mode control to the driver. Internally, the driver will write the data from the corresponding selected structure to the ADDI chip through I2C. All this happens at stream on time. So if a new range is desired, all the SDK has to do is set stream off, update the mode in the driver through the control, and request stream on again. The driver will then reprogram the ADDI chip with the required configuration. This approach has a great advantage: the driver can now be used with any other Video4Linux-compatible user-space application. Of course, if both data streams are enabled, the depth and IR data will be interleaved. But this is a major improvement over the previous implementation of the driver, which was SDK-dependent.

In the last slide about the kernel driver, I'll present the structure of the firmware block that has to be stored in the root filesystem or embedded in the kernel (a small C sketch of this layout follows a bit further down). The first eight bits are just a magic check sequence to identify the binary block of data as firmware for the ADDI9036 driver. Next come four bytes that store the number of modes available in the current firmware file. These two elements compose the header of the firmware file. After that must come exactly that number of mode sub-blocks of data, each starting with four bytes of mode ID, four bytes of size, and then pairs of register address and register data, each element having exactly two bytes. At each mode change, those pairs will be written one by one to the ADDI9036 chip through I2C. This simple structure is flexible enough to accommodate any number of modes, but it requires at least one.

Now let's talk a bit about the implementation details of the SDK. The ADI Time of Flight SDK is a cross-platform library for the Analog Devices depth cameras based on the ADDI9036 time-of-flight signal processor. It provides scripts for seamless integration of the SDK on any host platform, by cloning all dependencies, compiling them, and compiling and installing the SDK. For platforms where processing of the raw data is required, I mean unpacking, reordering and shifting, the SDK already provides everything that is required so that, in the end, the data in the buffers handed to the upper layers is in the same format. All other features of the camera that could not be covered by the driver, or that don't have their place in the driver, like temperature reading, are handled by the SDK.
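Coming back to the firmware blob layout, a minimal C sketch of it could look as follows; the structure names and the exact magic value are illustrative only, and the field widths simply follow what was stated above:

```c
#include <stdint.h>

/*
 * Sketch of the firmware blob layout described above: a header carrying a
 * magic check sequence and the number of operating modes, followed by one
 * block per mode.
 */
struct tof_fw_header {
    uint8_t  magic;      /* magic check sequence identifying the blob */
    uint32_t num_modes;  /* number of operating ranges in the file (>= 1) */
} __attribute__((packed));

struct tof_fw_mode {
    uint32_t mode_id;    /* e.g. near / medium / far */
    uint32_t size;       /* size in bytes of the register data that follows */
    /* then 'size' bytes of { uint16_t reg_addr; uint16_t reg_data; } pairs,
     * written one by one over I2C at each mode change */
} __attribute__((packed));
```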
Also, a set of examples and tools is provided for handling the EEPROM calibration data specific to each camera implementation.

First, we'll discuss the unpacking required for the platforms that don't perform it in the GPU or image signal processor. If we look at the first three bytes that are received for each image line on the MIPI bus, we see that there is actually information for only two pixels in those three bytes. So what the unpacker does, either in the SDK or in the ISP, is take the first byte and copy it as it is into pixel one. Then it takes the least significant four bits from the third byte and puts them in the least significant positions of pixel one. Next, the second byte is taken as it is and placed in pixel two, and now the most significant four bits are taken from the third byte and put in the least significant positions of pixel two. This operation is performed for all the bytes of every image line, and it must be performed for both depth and IR samples (a simplified C sketch of this is shown a bit further down). All supported platforms are ARM processors, and we made use of the NEON co-processor, with loop unrolling and single-instruction-multiple-data operations, to speed up this process. In this way the impact on CPU load is not significant, but some reduction in frame rate can clearly be observed.

The next operation that has to be performed by the SDK is the de-interleaving, in case both the IR and depth data streams are enabled. The only platform that does not require this operation is the NVIDIA Xavier. There we don't have this problem because the bridge driver provides a separate video device for each virtual channel, but for the others it goes like this: the SDK internally allocates two buffers, one for depth and one for IR. Depending on the platform, after unpacking, or directly if the unpacking was already performed by the image signal processor, we copy one line's worth of data to the depth buffer and increase the pointer, then copy one line of IR data to the IR buffer and increase the pointer, and the process is repeated until both frames are completed. So the majority of platforms use only one video device to provide the data buffers received from the bridge driver, and thus the de-interleaving must be performed.

In the last slide allocated to the SDK implementation, I'll present a bit of what we provide to the user when the demo is run. Three windows will pop up: two displaying the data and one used for settings. The connection mode can be selected when a remote context is used. Next, the operating range can be selected from one of the three options available. If the mode is not available, because it was not included in the firmware, an error will be raised at the driver level when the mode select control is called, and propagated up the stack to the SDK. The display of data can be based on live capture or on playback of some previously recorded data. Computation of the center point of the depth image can be activated, and in that moment the distance between the camera and the subject at the middle point of the image will be visible in the depth window. Some other processing can be enabled, like small-signal removal or IR gamma correction. Also, the current frame rate, the laser temperature and the analog front-end temperature are displayed for monitoring.

In the last part of the presentation, I'll talk about each supported platform and its peculiarities. The first platform supported in the ADI time-of-flight development kit project was the DragonBoard 410c, based on the Qualcomm APQ8016E. From the video perspective, this platform does not provide hardware unpacking of the raw data, so the unpacking must be performed in the SDK.
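That software unpacking is, in scalar form, roughly the loop below; the SDK's NEON implementation is an optimized equivalent of it, and the function name and signature here are only illustrative:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Unpack one line of MIPI 12-bit packed RAW data into 16-bit pixels.
 * Every 3 input bytes carry 2 pixels:
 *   byte0            -> 8 MSBs of pixel 0
 *   byte1            -> 8 MSBs of pixel 1
 *   byte2, low 4 bits  -> 4 LSBs of pixel 0
 *   byte2, high 4 bits -> 4 LSBs of pixel 1
 * For 640 pixels this consumes 960 packed bytes and produces 640 16-bit words.
 */
static void unpack_raw12_line(const uint8_t *src, uint16_t *dst, size_t width)
{
    size_t i;

    for (i = 0; i < width; i += 2) {
        dst[i]     = ((uint16_t)src[0] << 4) | (src[2] & 0x0F);
        dst[i + 1] = ((uint16_t)src[1] << 4) | (src[2] >> 4);
        src += 3;
    }
}
```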
It is required that both the depth and IR streams be set on virtual channel zero, and in this way both streams will end up in the same frame buffer, but interleaved. In order to connect the media entities' pads, explicit commands have to be executed in user space using the media-ctl application. The media entities have to be connected to create a video pipeline from the camera pad to the CSI decoder, next to the ISP interface, and finally to the video front-end module. The camera controls are not taken over and exported by the video device, so the SDK must interact with both the video device and the camera sub-device.

After the DragonBoard followed the Raspberry Pi host platform. We are using the Unicam driver, which in its latest version has the capability to unpack the raw format. Again here, like on the DragonBoard, we must configure the depth and IR streams on the same virtual channel, and that one is zero. And again, as on the DragonBoard, both streams land in the same frame buffer and the de-interleaving is required. The video pipeline is already configured based on the device tree data, so no explicit configuration must be performed and, moreover, the camera controls are exposed by the video device, so the SDK can interact only with the video device for configuring all the camera functionality.

The i.MX 8M Mini platform performs the unpacking at chip level, but unfortunately it does not implement the virtual channels capability, and it also does not accept the trick from the two previous platforms of setting both streams on the same virtual channel ID. If both streams are set on the same virtual channel, the frames are discarded by the pixel parser and some errors will pop up in the diagnostic messages if dynamic debugging is enabled. Also, the RAW12 format is supported by the hardware but not implemented in the driver, at least not in the version delivered with the 4.14 kernel that we used. One other particularity is that the media devices are not registered, because the v4l2_device_register_subdev_nodes() function is not called by the complete notifier callback inside the capture driver; actually, that notifier callback is completely missing. After we added it, the ADDI camera media device was registered and the sub-device was accessible.

The NVIDIA Jetson platform was the first one we tried from NVIDIA. It performs the unpacking, but the biggest drawback is that it does not support virtual channels and, like the i.MX 8M Mini, it discards the frames if both data streams are configured on the same virtual channel. The video pipeline is configured automatically based on the device tree bindings, and the driver had to be quite substantially modified compared to the previous versions. This happened because NVIDIA uses the camera_common framework for sensor drivers. This is not an upstream framework; it is something specific to NVIDIA. The camera_common sensor operations and camera_common data structures must be defined and filled in the sensor driver, and in this way the communication with the Video4Linux framework is not direct from the sensor driver but goes through the camera_common framework. But, as one of the advantages, the camera_common framework takes the sensor controls and exposes them on the video device, so interaction with the sub-device from the SDK is not required.

On the NVIDIA Xavier platforms, and here I'm referring to both AGX and NX, the same driver as for the Jetson can be used, because Xavier also uses the camera_common framework. The Xavier platforms support virtual channels and use a separate video device for each virtual channel; because of this, they are the only platforms that do not require the de-interleaving, as I said before.
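For all the other platforms, where the depth and IR lines share one frame buffer, the line-by-line separation described earlier boils down to something like this sketch, assuming the depth line of each pair comes first and the data has already been unpacked to 16-bit pixels:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Split a combined depth+IR frame into separate depth and IR buffers.
 * Lines alternate in the buffer delivered by the bridge driver
 * (depth line, IR line, depth line, ...). 'width' and 'height' are the
 * dimensions of one output image, e.g. 640x480.
 */
static void deinterleave_frame(const uint16_t *combined, uint16_t *depth,
                               uint16_t *ir, size_t width, size_t height)
{
    size_t line;

    for (line = 0; line < height; line++) {
        memcpy(&depth[line * width], &combined[(2 * line) * width],
               width * sizeof(uint16_t));
        memcpy(&ir[line * width], &combined[(2 * line + 1) * width],
               width * sizeof(uint16_t));
    }
}
```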
Coming back to the Xavier platforms: on the SDK side, it must accommodate this special case, opening two video devices and taking data from both synchronously. To simplify the device tree and keep only one version of the ADDI9036 driver for all the NVIDIA boards, on this platform two instances of the sensor driver are loaded at different I2C bus addresses, but one of them is a dummy and does not have any physical camera connected.

The last platform to talk about is the Toybrick RK3399Pro, based, as its name suggests, on the Rockchip RK3399 CPU. This platform uses an older 4.4 version of the Linux kernel, and all of its camera support is based on the older soc_camera framework. So only for this platform we created a soc_camera version of the ADDI9036 driver. We also had to define in the device tree additional properties specifying the camera name, image size, field of view and orientation. This information was required by the bridge driver for this Rockchip SoC. Finally, this platform performs neither the unpacking nor the separation of the data streams into individual video devices, so all these operations have to be performed in the SDK, with a bit of a penalty on CPU load.

Now, as we reach the end, I would like to thank you all for your attention and, please, if you have any questions or comments, feel free to ask them.