Hi everyone, my name is Jacopo. I run my own small company, and I'm an embedded software engineer; these days I work with the developers of the Linux media subsystem. A few years ago I started working for a company that produces machine vision cameras — machine vision cameras, nowadays, are often actually black and white cameras — and that was the first time I faced the problem of capturing images, and I realized I didn't know a lot about it. Because image capture is a strange corner of software development: it ties into theoretical knowledge that is not very widespread, and there is a very specific vocabulary that is useful just for this field. And if you're doing that on an embedded system it's even worse, because you have dedicated pieces of hardware that usually are black boxes, and you don't know how to handle them. So a few years ago I wished there was a presentation that could give me a pointer to start looking at all of this, and this presentation is exactly that: the idea is that by the end you'll know the basics of image capture and related topics, and you'll be able to get more out of the next hour. So I'll start by talking about color, then we'll move on to pixels, and we'll see what an image sensor is after that; then how images are transmitted from the sensor to the SoC, and a brief introduction to Video4Linux, which is the framework for handling image capture on Linux.

So let's start by saying that color is not an absolute value: color is our perception of a physical phenomenon, of course, and that means it's hard to manage. If this is the electromagnetic spectrum, we see a tiny portion of it, which is this one: the visible light spectrum. And not all human beings perceive color the same way, because, again, color is a reaction to a physical phenomenon. It turns out that humans are trichromatic viewers, so we mostly see three colors — blue, green and red — which correspond to short, medium and long wavelengths. And since we live in a digital world and we want to transmit and deal with colors, we need a mathematical model to correlate the perception of the physical phenomenon to numerical values. A light beam that we perceive as a single color is actually composed of several components, each one with a specific wavelength. The way it is usually described is this: if you take each component, each with its specific wavelength, and you measure the value of a specific photometric property, like luminous power, you get the spectral power distribution, which is that graph over here. It's interesting because sometimes colors that are perceived as the same color are actually different if you look at their spectral power distributions. But it tells us that, let's say, purple has these values for the components of its light beam. Since we basically see three colors, the simplest way to find a correlation between the spectral power distribution and a numerical representation is to sample it in correspondence of blue, green and red. So if we want to describe purple in a mostly unique way, we can say: this is the value for blue, this is the value for green, this is the value for red — and that gives you purple. The same goes for all the other colors.
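As an aside, the CIE standard formalizes this "sample the spectrum three times" idea: instead of picking three single wavelengths, the spectral power distribution $S(\lambda)$ is weighted by three matching functions and integrated — a sketch of the standard form:

$$X = \int S(\lambda)\,\bar{x}(\lambda)\,d\lambda, \qquad Y = \int S(\lambda)\,\bar{y}(\lambda)\,d\lambda, \qquad Z = \int S(\lambda)\,\bar{z}(\lambda)\,d\lambda$$

Here $\bar{x}, \bar{y}, \bar{z}$ are the tabulated CIE matching functions, and the triple $(X, Y, Z)$ is the handful of numbers that stands in for the whole spectrum — which is also why two different spectral distributions can map to the same perceived color.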
So in this way we construct a mathematical model — and it's nothing more than a tool — that uniquely describes a perceived color with a triple of values. We could call that the long-medium-short color space, but again, it's just a tool. In 1931 a committee, the CIE, made an experiment — with 18 people, I think, which is not a huge number — and working on the problem of the mathematical definition of color led to the definition of the two main color spaces used today, which are the RGB color space and the XYZ color space. RGB uses three primary colors to define all the other possible colors, while XYZ separates luminance from chromaticity. That's because, while the human eye is sensitive to those three colors, it's even more sensitive to changes in brightness; so we can define a color as a brightness value with two associated chrominance values. On top of those, other color spaces have been defined: sRGB, which is the one usually used for displays; Adobe RGB, which I'm not sure where it is used, maybe in the printing business; CMYK, which you may have seen for printers. Anyway, a color space is a tool, and you use whichever is preferable for your case.

On top of color spaces — because things are confusing here, as in many other places — we also have color encoding schemes, which are based on a color space, plus mathematical tools to move from one color encoding to another. So we have the RGB color encoding, based on the RGB color space, and we have mathematical tools, usually matrices, to move from RGB to, let's say, YCbCr, and we can apply the transformation whenever we want. The names are confusing, because we have a color space called sRGB, we have an encoding scheme called RGB, we have YCbCr as another encoding scheme, and we can build color spaces on top of all of that.

So we now have a mathematical model that we can use to represent colors, and we can look into the most basic component of an image, which is the pixel. But we should ask ourselves: if we use three digital values to represent a single pixel, what is the bandwidth required to transmit an image of a useful size? It turns out the bandwidth required is quite high: for an image of that size we get around four and a half megabits, and that's not good, because it means we would need very fast links, and things get complicated. So it's highly impractical to send all three color components for each pixel: modern sensors have a lot of pixels, and that would increase production costs and require a lot of bandwidth. We need some improvement there. It turns out there is a thing called the Bayer filter, which is kind of a simple idea: if we put a colored light filter on top of each pixel, each pixel will sense a single color channel. So if we have an image like this one, the first pixel in the image will just sense green, the second will just sense red, and there are various patterns we can use to build the Bayer filter. We of course need a way to reconstruct the full pixel color from the single-channel information, and there are algorithms for doing that, called demosaicing. The basic one is quite simple, because you just use the information from the neighbouring pixels to reconstruct the missing channels: if the first pixel is red, beyond the red information you can use the green from this neighbour and the blue from that one to reconstruct the full color at that location.
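A minimal sketch of that neighbour interpolation — just one recovered channel at one location, assuming an 8-bit sensor with an RGGB pattern and ignoring the image borders (the function and layout are illustrative, not from the talk):

```c
#include <stdint.h>

/* Bilinear demosaicing, smallest possible piece: recover the green
 * value at a red pixel location (x, y) of a w-pixels-wide RGGB Bayer
 * image by averaging its four green neighbours. */
static uint8_t green_at_red(const uint8_t *raw, int w, int x, int y)
{
    return (raw[(y - 1) * w + x] + raw[(y + 1) * w + x] +
            raw[y * w + (x - 1)] + raw[y * w + (x + 1)] + 2) / 4;
}
```

The blue value at the same location would come from the four diagonal neighbours, and so on for every pixel and channel.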
There are far smarter schemes for doing that, but this one is the simplest. Anyway — now that we know how images are represented, this is what the sensor integration looks like. We have a sensor that is connected to the SoC. On one side there is a communication interface, which is usually I2C, and it's used to send messages and program the registers in the sensor; the way the registers are programmed influences the way the images are produced — we'll see that later. Then we have the timing information, which is kind of important, because if you are transmitting images you have to find a way to tell the receiving part where the frame starts, where the line starts, what the pixel clock frequency is; those are called the timing, or synchronization, signals. And then there is the data bus that actually carries the images.

Inside the image sensor, this is the pixel array: a grid of photoreceptors with filters on top, so it's a Bayer filter arrangement, and the raw data it produces goes through what's called the analog processing part, and then through a block that performs the transformation to the different color encodings, the ones we talked about. Depending on how smart the sensor is, you can have advanced features here: you can have DSPs for the transformations, and advanced algorithms, like auto white balance and auto exposure, with their feedback loops.

So again, this is the Bayer grid array — the grid of photoreceptors with filters on top — and we can ask the sensor to sample the raw grid to give us a specific ordering of the encoding, let's say a permutation of RGB, with different subsamplings. We can ask the sensor to perform manipulations on the image: cropping a part of the image; downsizing it, using binning or skipping lines, and giving us back an image with a lower resolution; we can zoom on the sensor, asking it to give us only a part of the image — that's a zoom. And, depending on the sensor capabilities, we can have more advanced features, like the 3A algorithms, mirroring, flipping and so on. So now we know how images are produced, and we want to know how they are transmitted.
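Before the buses, a quick note on the control side: the register programming just mentioned is, at its most basic, a handful of bytes written on the I2C bus. A sketch from user space, assuming a hypothetical sensor at address 0x3c with 16-bit register addresses and 8-bit values (real drivers do this from the kernel, and the register map itself comes from the vendor's datasheet):

```c
#include <fcntl.h>
#include <linux/i2c-dev.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Write one 8-bit value to a 16-bit register of an I2C device. */
static int sensor_write_reg(int fd, uint16_t reg, uint8_t val)
{
    uint8_t buf[3] = { reg >> 8, reg & 0xff, val };

    return write(fd, buf, 3) == 3 ? 0 : -1;
}

int main(void)
{
    int fd = open("/dev/i2c-1", O_RDWR);  /* bus number is board-specific */

    ioctl(fd, I2C_SLAVE, 0x3c);           /* hypothetical sensor address  */
    sensor_write_reg(fd, 0x0100, 0x01);   /* e.g. a "start streaming" bit */
    close(fd);
    return 0;
}
```

A real sensor bring-up is a long list of such writes, which is exactly the opaque procedure described later in the talk.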
Here we have two kinds of buses that carry two different signal families: parallel buses and MIPI CSI-2 buses. Parallel buses are the older ones; they're cheaper, and it's simple to poke into them with a scope and what have you — you can do that on your desk, you don't need a protocol analyzer for that. You would find them in hobby designs and older designs, because the performance is poor compared to what you can do with buses like MIPI CSI-2. MIPI CSI-2 buses are defined by the MIPI standards committee; they are buses with a much higher range, used in modern applications, smartphones and other devices where image quality matters. Everything is much more integrated, so it's very hard to probe them by hand — you basically need a protocol analyzer — but that's the kind of thing they are.

Parallel buses — people call them "BT", which is not really a standard name, because BT defines a lot of things, and it may also define a bus structure: BT.601, BT.656, different things — basically define the way data is transmitted using dedicated wires for the synchronization signals plus a parallel data bus. So we have the signals for synchronization: this one here tells you the frame is starting, the href tells you the line is starting and delimits where the line ends, and here we have the data lines, which are clocked by the pixel clock. That's kind of simple in principle — again, for older and low-bandwidth designs.

On the other side we have what is called MIPI CSI-2, which is much more than a physical protocol: MIPI CSI-2 defines a protocol for transmitting images, describing what you're sending on the bus, and it can run on two different physical layers, with a set of rules for each of them. So it's a much more comprehensive definition compared to the simple parallel bus. And again, there are two physical layers that MIPI CSI-2 can use: D-PHY, which has quite a high bandwidth per lane, one clock lane for the whole bus, and one to four data lanes; and C-PHY, which is very interesting because it uses trios — the clock is embedded in the data lines, so you have trios of clocked data lanes — and it has a much higher bandwidth. All of these are differential signals, which may be even harder to probe on an actual board.

Again, there is a protocol built on top of that, and it's a packet-oriented protocol. You have short packets for synchronization — there is a special packet code that says "here starts the frame", and another special packet code that says "here starts the line" — and you have long packets that transmit data. Data is described by the packet header, which can transport other information: the data type and the virtual channel. This is very interesting because it allows you to do interleaving of different image streams on the same physical bus. Data types are described in the MIPI CSI-2 specification and define which kind of data the packet transports. Virtual channels instead are just plain numbers, and they allow you to interleave different streams on the same physical bus. And of course you can have data type interleaving and virtual channel interleaving together.

Take a simple use case: think of a sensor that produces images and, along with them, produces metadata — let's say the exposure time used for the actual image, or another detail. The sensor interleaves them on the physical bus using data types, so you would have a short packet that says "frame start", long packets of image data, long packets of metadata, and the receiving side knows, by looking at the data type, how to split them and present them in different places to user space. If you put virtual channels on top of that, things can get complicated — I don't want to go too deep into that — but the basic idea is that you can have multiple streams from the same sensor, with different data types, different data, all multiplexed on the same CSI-2 bus, each one wrapped in its own frame start and frame end and identified by a virtual channel identifier. So the same sensor as before can produce what are called preview images — when you are using a mobile phone viewfinder it has to be snappy, it has to be reactive, so that's usually lower-resolution data — and capture images, which are what you get when you actually take a photo: very high resolution, a lot of data. All of that can be interleaved on the same bus, each stream with its own frame start and frame end.
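To give a feel for how cheap this bookkeeping is on the wire: a CSI-2 packet header on D-PHY is four bytes — a data identifier carrying the virtual channel (2 bits) and data type (6 bits), a 16-bit word count, and an ECC byte. A decoding sketch (the struct and names are mine; the field layout and the example data types come from the CSI-2 specification):

```c
#include <stdint.h>
#include <stdio.h>

/* Decoded 4-byte MIPI CSI-2 (D-PHY) packet header. */
struct csi2_header {
    uint8_t  vc;  /* virtual channel, bits 7:6 of the data identifier  */
    uint8_t  dt;  /* data type,       bits 5:0 of the data identifier  */
    uint16_t wc;  /* word count (long packets) or sync payload (short) */
};

static struct csi2_header csi2_decode(const uint8_t h[4])
{
    struct csi2_header p = {
        .vc = h[0] >> 6,
        .dt = h[0] & 0x3f,
        .wc = h[1] | (h[2] << 8),   /* h[3] is the ECC byte */
    };
    return p;
}

int main(void)
{
    /* Example: frame start (DT 0x00) on VC 1, then a RAW10 long packet
     * (DT 0x2b) carrying one line of 640 10-bit pixels = 800 bytes. */
    const uint8_t fs[4]    = { 0x40, 0x00, 0x00, 0x00 };
    const uint8_t raw10[4] = { 0x6b, 0x20, 0x03, 0x00 };
    struct csi2_header p = csi2_decode(raw10);

    printf("vc=%u dt=0x%02x wc=%u\n", p.vc, p.dt, p.wc);
    return csi2_decode(fs).dt == 0x00 ? 0 : 1;
}
```

The receiving side routes packets by exactly these two small fields: data type says "image line" versus "metadata", virtual channel says which stream it belongs to.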
Things can get very complicated, but the good thing here is that the receiving side can split the streams using the virtual channel: again, you can place the data in different memory locations and present them to users as different images. So it's messy, but not too messy.

So far we have been talking about images; we have not been talking about software, and we have not been talking about Linux. When you are dealing with image capture on Linux, the framework of choice is of course Video4Linux2, which defines a set of interfaces towards user space, plus driver-to-driver interfaces, for capturing images. This is the simplest possible architecture you are going to have with the Video4Linux infrastructure: starting from the hardware, we have what we had before — a single sensor and a receiving side — and that's the part we already know. On top of that you have the platform-specific driver for the receiving side, and the sensor driver that programs the sensor. On top of those you have the videobuf2 framework, which has helper functions you can use in your driver to do buffer handling, and finally the video device node, which is what user space opens to control the device and capture data.

So when you want to capture an image using the Video4Linux interfaces, the first thing is that you use the most basic interface, the video device node: you open it and you start issuing ioctls on it. Initially you would get the format, to know which kind of data the device produces and whether it works for what you are expecting, and you can set a specific format on the platform: you pick the pixel format, you pick the image size. And you can control the stream — things like the frame rate and the exposure time of the sensor — using the set of ioctls defined by V4L2.
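Those first steps are very little code; a minimal sketch of the negotiation just described (the device path, sizes and exposure value are examples):

```c
#include <fcntl.h>
#include <linux/videodev2.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/video0", O_RDWR);

    /* Ask the driver what it currently produces... */
    struct v4l2_format fmt = { .type = V4L2_BUF_TYPE_VIDEO_CAPTURE };
    ioctl(fd, VIDIOC_G_FMT, &fmt);

    /* ...then ask for what we want; the driver adjusts what it can't do. */
    fmt.fmt.pix.width       = 640;
    fmt.fmt.pix.height      = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    ioctl(fd, VIDIOC_S_FMT, &fmt);

    /* Stream parameters such as exposure use the control ioctls. */
    struct v4l2_control ctrl = { .id = V4L2_CID_EXPOSURE, .value = 1000 };
    ioctl(fd, VIDIOC_S_CTRL, &ctrl);

    close(fd);
    return 0;
}
```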
Then there is a big problem: we have to move a lot of data, so we need memory for doing that, and not just any memory. Usually you require memory that is DMA-capable, because you don't want the CPU to be involved in transferring data from the receiver's buffer to the main system memory. You would like that DMA-capable memory to be physically contiguous, because otherwise you need a DMA engine that supports scatter-gather, which is not that common. You would like memory that is accessible by both the CPU and the device, because one is writing while the other one is reading. And, if you are writing a high-performance application, you don't want user space involved in moving memory from one side to the other: you want to share memory between different subsystems.

So there are three possible memory allocation models. The first one is that user space allocates memory and passes pointers to the driver; this is relatively simple, but it makes very optimistic assumptions about the memory user space allocates. Or you can have the capture driver allocate memory, and user space maps it. Or, if you want a very performant application, you can use the dma-buf framework to share memory between different subsystems. In the driver-allocated case, you simply issue a request-buffers ioctl from user space; it goes down to the videobuf2 core, which allocates the buffers for your driver, and after that user space simply gets offsets that it can mmap to access that memory. Otherwise you can use dma-buf, and another subsystem — DRM, for example — acts as the memory exporter, which means it can export its memory as a dma-buf file descriptor; you take that descriptor and you pass it with the same request-buffers ioctl. In this case the videobuf2 core is not going to allocate memory for you: the whole point is that you reuse memory allocated by another subsystem.

After that, you do what is called streaming — capturing frames one after another — without user space touching the data. All platform drivers have a queue of available buffers, and you simply queue a buffer to your driver, which makes its DMA engine point to that memory location. You do that for all the available buffers, and then you actually start streaming. Starting the stream means that a message is sent to the sensor saying: OK, you can start producing data — and data then flows in this direction, of course. Once you have started streaming, frames get captured: the DMA engine on your receiver writes to the buffers we queued, while user space waits — using poll or select — for data to be available. After a while the first frame is captured, and an IRQ fires here on the receiver that says: hey, you have data available. User space dequeues the buffer, and since it has the memory — say, as a dma-buf — it can hand it to DRM to display it on the screen at the right size. You requeue the buffer and go on like that until you stop streaming. And you simply have to realize that you can capture images and display them without involving user space in the data path at all.
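Putting the whole queue/stream/dequeue dance in one place, here is a condensed sketch of the driver-allocated (MMAP) case — four buffers, a hundred frames, all error handling elided:

```c
#include <fcntl.h>
#include <linux/videodev2.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/video0", O_RDWR);
    enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

    /* Ask the driver (videobuf2, underneath) to allocate 4 buffers. */
    struct v4l2_requestbuffers req = {
        .count = 4, .type = type, .memory = V4L2_MEMORY_MMAP,
    };
    ioctl(fd, VIDIOC_REQBUFS, &req);

    void *mem[4];
    for (unsigned int i = 0; i < req.count; i++) {
        struct v4l2_buffer buf = { .type = type,
                                   .memory = V4L2_MEMORY_MMAP, .index = i };
        ioctl(fd, VIDIOC_QUERYBUF, &buf);
        mem[i] = mmap(NULL, buf.length, PROT_READ, MAP_SHARED,
                      fd, buf.m.offset);
        ioctl(fd, VIDIOC_QBUF, &buf);    /* hand the buffer to the driver */
    }

    ioctl(fd, VIDIOC_STREAMON, &type);   /* the sensor starts producing */

    for (int frame = 0; frame < 100; frame++) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        poll(&pfd, 1, -1);               /* sleep until the IRQ path wakes us */

        struct v4l2_buffer buf = { .type = type, .memory = V4L2_MEMORY_MMAP };
        ioctl(fd, VIDIOC_DQBUF, &buf);   /* mem[buf.index] holds the frame */
        ioctl(fd, VIDIOC_QBUF, &buf);    /* recycle it */
    }

    ioctl(fd, VIDIOC_STREAMOFF, &type);
    close(fd);
    return 0;
}
```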
OK, so, to recap: we have seen a platform driver that controls the receiving part of the SoC and exposes the user space interface. Its responsibilities are transferring images from the input interface to the system memory; we said it can also transform images — perform transformations on the images, if the hardware provides for that, and we'll see that later; it exposes the capture interface to user space, and it handles buffer management as well. Then we have the sensor driver, which sits behind it and controls the sensor directly; notice that user space does not directly control what happens on the sensor. And, in a sense, you never really know what happens on the sensor: you are sending long lists of register values, writing them over the bus, hoping that everything works, that each one lands in time and nothing breaks. It's kind of a painful procedure, but that's more or less the way it is.

Another interesting thing: since we are talking about embedded systems, we are usually talking about ARM processors, and so we are talking about device tree. You have seen in the previous slide that the environment gets a description of the hardware, and that's what the device tree gives you. The device tree is nothing more than a textual description of the system that gets compiled into a binary blob; at boot the kernel starts parsing it, instantiating a driver for each node in the tree that it encounters. Each node has parameters that are passed to the driver, and with those the driver configures the device — that's how you end up with devices you can drive described in the device tree.

Video devices have very specific device tree bindings, which are described in this directory: there is a video-interfaces.txt file that describes the general bindings for video interfaces and, most importantly, how they connect to each other. This is a very simple representation of a possible device tree for an image capture device. You would have a set of ports; each port effectively corresponds to a physical input of your receiver, where you are receiving the signals; and each port has an endpoint, which describes a specific input through a set of endpoint properties. The key abstraction is the remote-endpoint property, which points to a phandle — nothing but a reference to another device tree node — where, let's say, the sensor is described. In this way you are describing the physical connection between the video device and the sensor.

So, we said there are endpoint properties; they are usually bus-specific. You have a set of properties specific to parallel buses that tell you the polarity of the sync signals — because you have to know how they are asserted — how many data lines you have, the polarity and active edge of the pixel clock, things like that. And you have properties specific to CSI-2 that describe the frequency of the data lanes, how the data lanes are organized, and things like that. And the very interesting part is that there is a framework for handling those kinds of problems: the v4l2-async framework, together with v4l2-fwnode. This is a typical probe sequence of a video receiver driver: you do the usual things — you remap memory, you request IRQs, you deal with power management — and then there is the device tree parsing part. During the device tree parsing you basically walk your endpoints and collect the references — the phandles — that represent the sensors your SoC is connected to, and you use those phandles to register a notifier that waits for those sensors to be available, to be probed. It has to be asynchronous because there are no guarantees that your platform driver probes before the sensor driver — they could come up in any order — so you use that framework to register notifiers that wait for all of them to be available, and once they are, you can register the video device and actually use it.
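In kernel terms the probe flow just described looks roughly like this. The v4l2-async/v4l2-fwnode helper names have shifted across kernel versions, so take this as a shape rather than a drop-in — the driver struct is hypothetical, and the notifier match-list setup between the two steps is elided:

```c
#include <media/v4l2-async.h>
#include <media/v4l2-device.h>
#include <media/v4l2-fwnode.h>

struct rcv_device {                       /* hypothetical driver state */
	struct device *dev;
	struct v4l2_device v4l2_dev;
	struct v4l2_async_notifier notifier;
	struct fwnode_handle *sensor_fwnode;
};

/* DT-parsing part of a hypothetical receiver driver's probe(). */
static int rcv_parse_dt(struct rcv_device *rcv)
{
	struct v4l2_fwnode_endpoint vep = { .bus_type = V4L2_MBUS_CSI2_DPHY };
	struct fwnode_handle *ep;

	/* Walk our port's endpoint and parse bus properties (lanes, ...). */
	ep = fwnode_graph_get_next_endpoint(dev_fwnode(rcv->dev), NULL);
	if (!ep)
		return -ENODEV;
	v4l2_fwnode_endpoint_parse(ep, &vep);

	/* remote-endpoint leads to the sensor node; keep it for matching. */
	rcv->sensor_fwnode = fwnode_graph_get_remote_port_parent(ep);
	fwnode_handle_put(ep);

	/* ... add rcv->sensor_fwnode to the notifier's match list ... */

	/* The sensor may probe before or after us: wait asynchronously. */
	return v4l2_async_notifier_register(&rcv->v4l2_dev, &rcv->notifier);
}
```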
So this is the very simplest use case you can address: just a sensor and a receiving side, and nothing else — the receiver takes the data, writes it to memory, user space fetches it and does whatever it wants. This is not the case with today's modern SoCs, because a modern SoC has very specific blocks for performing image transformations, for performing image manipulations, in a way that is far more efficient than doing it in software. The video device node abstraction is not enough for that, because a single entry point from which user space is meant to control all the SoC components that can do image transformation is too limiting. So there is another set of APIs, called the media controller API, that allows you to do that: the media controller API describes the hardware as a set of entities, each one with specific properties and a set of pads, that can be connected in order to realize an image processing pipeline that happens on your SoC.

This is more like what a modern SoC looks like. The first part here is similar to what we had before — you have a sensor and you have a receiving side — but on your SoC you don't just have the receiving part: you may also have what could be a resizer, a pixel format converter, whatever your SoC provides, and these blocks can be chained. And of course you can have different data paths, because not all of them depend on the system memory: the receiver may feed the resizer directly, for instance. Of course this needs a user space abstraction, and that is the abstraction provided by the media controller API, where each component is described by an entity, and entities can be linked to realize an image transformation pipeline. Configuring the pipeline means enabling and disabling the links between entities, setting the order in which the blocks perform the image transformations while you capture. And you may have a very simple use case where you don't want to use all the blocks: the media controller API also allows you to send frames straight to the main memory while performing zero transformations on the images.

It's interesting, because this complexity here, in hardware, implies more complexity over there, in software, of course: now we have to control the formats, the sizes and the other parameters at each point of the pipeline. So this requires a completely new set of APIs, which is called the subdevice API, as opposed to the video device node API. It's an abstraction that is painful to work with but very powerful: it requires user space to be aware of each single component — which formats it supports, which operations it can perform on the image, which entities there are and how to link them — but you can do very powerful things. Let's say you receive data at this resolution and you want to pass it to the resizer because you want it smaller, and then you want to convert it, maybe from YUV to another encoding, or repack it in a specific way. To do that you use the subdevice API: each entity here has a device node representation, the v4l-subdev device, and on the subdev device you set the format on each source and sink pad, in a way that spells out the pixel format and the size — you are going to receive data in this format on this pad, and output that on the other. The same goes for the resizer: you are going to receive images on this side, and you are going to output images on that side. The end result is that you have configurable pipelines that exploit the hardware to the fullest to perform image transformations, and you get to do that without ever passing through user space.
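On the user space side, "set the format on each pad" is one ioctl per pad on the subdevice node. A sketch — the device path, pad numbers and sizes are examples, and in practice you would often script the same thing with media-ctl:

```c
#include <fcntl.h>
#include <linux/media-bus-format.h>
#include <linux/v4l2-subdev.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    /* e.g. the resizer entity, exposed as a v4l-subdev node */
    int fd = open("/dev/v4l-subdev2", O_RDWR);

    /* Sink pad 0: what the resizer will receive. */
    struct v4l2_subdev_format f = {
        .which = V4L2_SUBDEV_FORMAT_ACTIVE,
        .pad = 0,
        .format = { .width = 1920, .height = 1080,
                    .code = MEDIA_BUS_FMT_UYVY8_2X8 },
    };
    ioctl(fd, VIDIOC_SUBDEV_S_FMT, &f);

    /* Source pad 1: what it will output — same code, smaller size. */
    f.pad = 1;
    f.format.width  = 640;
    f.format.height = 480;
    ioctl(fd, VIDIOC_SUBDEV_S_FMT, &f);

    close(fd);
    return 0;
}
```

The point is that every hop in the pipeline gets configured explicitly, pad by pad, until the formats agree from the sensor down to the capture node.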
Of course, this buys you more complications, because the system usually boots with the pipeline unconfigured — a usable default pipeline is sometimes not even available — and you have to create it: you can't just boot the system and start capturing, you have to have a user space utility that configures the pipeline first (media-ctl is the usual one). And the media controller API and the V4L2 subdevice API share a lot of things, but not all of them, so it's kind of hard for user space to deal with them: you need to use the two together, and be careful doing that. And sometimes you actually do not care about pipeline configuration at all — let's say you have a single sensor and you just want to receive images — yet you still have to configure the complete pipeline, and all this complexity sometimes isn't justified. So, this is it, I think; we have plenty of time for questions now.

So the question is how this compares to OpenMAX. OpenMAX, if I'm not wrong, started in the Khronos working group; it's a lot of different things, and I'm not an expert on it at all. My understanding is that OpenMAX does something more than this, because it allows you to do more complex pipeline things, like streaming to the network — I cannot tell exactly, because I don't know it well. The media controller is a kernel-to-user-space API, while OpenMAX allows you to create much more complex pipelines in user space. On a standard Linux system, from my understanding — but I may be wrong — it does not receive that much support from the mainline community.

OK, so the question is: are those APIs — the media controller API — used on modern SoCs and Android systems? If you're asking about mainline Linux support for modern SoCs in general, yes, of course. If you're asking me specifically about Android: Android is a weird beast, because of the vendors that are making chips for Android phones. I think that there is a lot of value in doing things upstream, but they don't do that much in that space. You see vendors really inclined to move things from kernel space to user space: drivers that just take register blobs coming from user space and write them to the hardware, and you don't know what they're doing. And you see weird APIs that do not match exactly what mainline provides, so it's hard to reconcile the two. There is good work going on around upstream support, but yes — modern SoCs of course run Android and do these things, and I hope that in the future Android implementations from SoC vendors will be fully able to use these APIs.

So the question is: what about software that operates in user space, like OpenCV — what if I want to use the features of the SoC from it? I know very few things about OpenCV. My understanding is that OpenCV is a consumer of this kind of API: they may have some, let's say, media controller support, V4L2 device support. So I can see OpenCV using the media controller to exploit the features of the SoC — memory-to-memory transformations from user space, what is usually called hardware acceleration. I cannot tell what the status of that support is right now; let's hope they do it. A couple more questions.

Well, on the hardware side — the question is about the interleaved streams, right? Yes, of course, because we tend to link different streams with different... let me try to go back to the slides. Yes, quite a long way back. Here — maybe I'm not pointing at it exactly — but those streams here are completely different streams: one may have a lower resolution, and you use that for the viewfinder, while the actual capture is the big, high-quality picture. The interleaving itself is defined by the MIPI CSI-2 specification, but there is no connection between the kind of data you have here and the kind of data you have there: different resolutions, different image sizes, different data altogether — this is images, this is metadata, it's not even pixels — so no, you are not tied to specific image sizes or resolutions. And how does that reach user space? The data type marker is there exactly for that: the receiving side knows how to split the data using the data type, so you can have two different video device nodes — one gives you the images and the other gives you the metadata. Then there is a different kind of problem, which is how you associate the metadata with the image it belongs to, but that's another problem, a different thing. But yes, data types are there exactly for this: the streams are demultiplexed and presented to user space separately, from the user's perspective. Anyone else?

So that's an example with dma-buf again. I don't know that much about Mesa, so I think OpenGL, at least here, sits on top of this: dma-buf is the mainline framework for sharing buffers, and if your driver complies with the dma-buf framework, you can ask it to export memory as a dma-buf descriptor. I'm not sure OpenGL is involved in that directly — I think it's a layer on top of this — so I cannot really say.
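For the V4L2 side of that answer: a capture buffer allocated by the driver can be turned into a dma-buf file descriptor with a single ioctl — a minimal sketch (the buffer index and type are examples):

```c
#include <fcntl.h>
#include <linux/videodev2.h>
#include <sys/ioctl.h>

/* After VIDIOC_REQBUFS with V4L2_MEMORY_MMAP, any buffer can be
 * exported as a dma-buf file descriptor and handed to another
 * subsystem (DRM, for instance) with zero copies. */
int export_buffer(int video_fd, unsigned int index)
{
    struct v4l2_exportbuffer exp = {
        .type  = V4L2_BUF_TYPE_VIDEO_CAPTURE,
        .index = index,
        .flags = O_CLOEXEC,
    };

    if (ioctl(video_fd, VIDIOC_EXPBUF, &exp) < 0)
        return -1;
    return exp.fd;   /* a dma-buf fd, usable by any dma-buf consumer */
}
```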
As for platforms that support all of this — yes, there are many. Mainline has quite a few of them: some of them have ISPs driven this way, if I'm not wrong, and of course there are the i.MX platforms as an implementation of that. And again, the capture drivers in mainline — most of them, if not all — support the media controller; I cannot tell how complete each of them is, that's probably different for each of them. Someone asked about OpenMAX before: there is also support in Video4Linux for drivers where part of the work runs as a binary on a microcontroller — you need the binaries to run on the microcontroller, and you control them with a set of RPCs. I don't know much more than that, but yes, it's out there, people are doing it. Any other questions?

Then I would like to thank my colleagues — the people I have been working with over the last years — because I started out knowing nothing about this, and I was lucky to be around people who have been patient with me while I was trying to understand it. So thanks for that, and thank you all for being here.