So this is the reason why I prepared a little bit of the story in the background: how images are acquired and how images are read. I will also talk about iterative reconstruction, because this is very performance demanding. In the early days it was still possible to do it in MATLAB, and I can tell you what the performance of MATLAB was when I started twelve years ago. But when CT started, there were real challenges in general. Then I will also talk about challenges with the .NET Framework and with C#. For performance reasons we also use technologies like the Math Kernel Library and CUDA, and we are also investigating what Vulkan can do, because CUDA has a few problems in general.

So, historical background. I don't want to go through the whole table. It all started in 1895 with the discovery of X-rays. It was a pretty interesting time, because this very quickly led to technical inventions: for example, shops scanned your feet just to measure your shoe size. They got rid of that technology very early, because we are talking about X-ray radiation, and that of course causes trouble if you irradiate your legs or feet just to measure a shoe size.

Another discovery came from Johann Radon, an Austrian mathematician, who derived the Radon inversion formula. I don't know if you know about it; we will see it again in the mathematical background of back projection. In the end it is pretty simple: he showed that if you take rays from every angle and invert this process, you can recover the original image. So you scan the inside of the body with X-rays, everything along the rays is projected onto a single line, as you know from a chest X-ray at your doctor's, and if you invert this from many projections you can recover the image.

Then the 1960s. This was the time of colors, the hippies, and EMI and the Beatles. Obviously they earned so much money with the Beatles records that they were able to fund an electrical engineer, Sir Godfrey Hounsfield, to do his research on the EMIDEC, a simple computer at that time. He also did research on the first CT system. So this is the first CT system, with wood, a little bit of steel, an X-ray source and some detector. And this is a reconstructed image, a brain. Some of you can tell me if you can see anything in it and do your diagnosis on it. Everything is there, maybe, but only those who are used to these images know a little better what's inside them.

In 1974 came, of course from Siemens, the first publicly available CT scanner. It was only for the head, and the acquisition time for the head was seven minutes; I can show you later how far we are now. This is how it looks from the inside, and this is the X-ray source. This is how a system looks now; here you can put in a whole body. If you open up a system today, you have the X-ray source, of course, and you have a detector, and these are the most expensive components of the system — that's one of the biggest problems. We are talking about 50,000 to 80,000 euros for these two components.
Taking it from the other side, you need high voltage, a cooling system, of course electronics, and a power transmission. And from 1974 — you see these images here — to 1996 — you see this image — these are of course not the same patients, but you can directly see that the image quality has improved. And now we are in 2018 and still at roughly the same image quality. It is a bit better, but we stayed with a 512 by 512 matrix size, whereas back then it was an 80 by 80 matrix. It's a matter of memory consumption, because if you double the size it has to be stored somewhere, and we currently have real trouble storing all the images that are generated by an X-ray system.

In 1972 the acquisition time was roughly 300 seconds. The table also shows the data required for one rotation and for a full scan. You have to imagine what happens in a CT scan: you start a rotation and rotate around the patient for one image slice, and if you want a so-called spiral — if you want to scan from head to toe — you have to scan with a spiral scan. Here there is no number, because at that time no spiral scan was available; you had to rotate, then shift, then rotate, then shift, and so on from top to bottom. In 2015 we rotate in 0.25 seconds, we produce about 270 megabytes of data per rotation and about 1 gigabyte for a full scan. The largest file I have seen so far is about 30 gigabytes, and with newer scanners we are coming up to 100 gigabytes. This is something you have to process, and one of the biggest problems is that in CT you sometimes have emergency cases and you don't want to wait 20 hours until the images are finished. So the demand is something like 50 reconstructed images per second, often 100 images per second, out of that 30 gigabyte data set.

Here is one question: who knows why those 300 seconds could be reduced to about 0.25 seconds? I can show it. The first scanners were translate-rotate systems: you had one X-ray source with one pencil beam, you had to shift the pencil beam, then rotate, then shift again, and so on. This evolved into a so-called cone beam system, where you can rotate without having to shift and translate. The only problem was the cables: after one rotation you had to go back and rotate again. This is where the power transmission comes in, the so-called slip ring — in German it's called Schleifring. Brushes run on this metal ring, so you can rotate forever with continuous power transmission and you don't have to go back and forth. You rotate continuously, and this is also what enabled the spiral scan: you start your spiral and scan over the patient.

I don't want to go through all these numbers. There is of course the matrix size; in 2018 we shifted to a 1024 matrix size. If you go to regular radiography instead of CT — regular chest X-ray scans — they have a matrix size of 4096 by 4096, so much larger. The number of images we produce is around 2,000 per patient, something like that.

So, to the image acquisition. This is what I call the pencil beam: here is the X-ray source. I use this very primitive example because it's easier to explain than the other geometries.
You scan over the patient — inside here is the brain — and there is a so-called attenuation: every tissue, bone, liver, has a different attenuation property. This attenuation is measured on a single line, so for each angle, while you rotate around the patient, you project a two-dimensional plane onto a one-dimensional line. The result is the so-called sinogram. Here you see the process, a somewhat old animation, but in the end what you get is projections. There is a detector with around a thousand channels in one direction and, let's say, 64 to 128 rows. So this is one projection from one angle, then there is another projection, and if you stack them all up you get the sinogram; in the end it's an X-ray image built from many projections.

And now we come to the Radon inversion formula from that Austrian scientist: we need to turn this sinogram back into an image. There are two methods: either Fourier-based, which is normally not used, or back projection. In back projection, the process I have shown — taking a projection from one angle, then the next angle — is simply inverted mathematically. You sit on a pixel, look at the projections that contribute to this pixel, and just reverse the process. It's pure mathematical image reconstruction, and of course it's time consuming; I have some numbers, but let me just show this example.

This is how it works: this is the original object, just a cylinder; this is one projection, and this is how a projection looks in 1D. You do it many times — three projections — and finally you have some reconstructed image. And now you see that something is wrong: you get a cone, not a cylinder. This is where filtered back projection comes in. Something goes wrong in this plain acquisition-and-back-projection process, and you need to filter the one-dimensional projection of each angle. So there is a so-called filter kernel that you apply to this data; it looks like this, and there are many kernels. If you do this, it works out. For very primitive objects 128 projections are pretty easy, but for a very complex object like a body, 128 projections are not enough; we are talking about 4,000 projections per rotation.

There is one decision to make, though: do you use a sharp kernel or a smooth kernel? The kernel controls the image impression. With a sharp kernel you get very fine details; the only problem is very high noise. It always depends on the indication: do you want to see small fissures in bones, or do you want to see tumors? This is where you have to decide between the kernels. So that's what the kernel does: it controls sharpness, noise and the edges.
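Just to make the filtered back projection a bit more concrete, a minimal 2D parallel-beam sketch in C# could look like this. It is only an illustration: the spatial-domain ramp kernel, the nearest-neighbour lookup and all names are simplifications of what a real reconstruction does.

```csharp
using System;

static class FbpSketch
{
    // Minimal 2D parallel-beam filtered back projection, for illustration only.
    static float[,] FilteredBackProjection(float[][] sinogram, int imageSize)
    {
        int numAngles = sinogram.Length;
        int numChannels = sinogram[0].Length;

        // 1) Filter each 1D projection with a spatial-domain ramp kernel.
        //    A sharper or smoother kernel changes the detail/noise trade-off.
        var filtered = new float[numAngles][];
        for (int a = 0; a < numAngles; a++)
        {
            filtered[a] = new float[numChannels];
            for (int c = 0; c < numChannels; c++)
            {
                float sum = 0f;
                for (int k = 0; k < numChannels; k++)
                {
                    int d = c - k;
                    float h = d == 0 ? 0.25f
                            : (d % 2 != 0 ? -1f / (MathF.PI * MathF.PI * d * d) : 0f);
                    sum += h * sinogram[a][k];
                }
                filtered[a][c] = sum;
            }
        }

        // 2) Back-project: for every pixel, sum the filtered projection values
        //    of the rays that pass through it, over all angles.
        var image = new float[imageSize, imageSize];
        float center = (imageSize - 1) / 2f;
        for (int a = 0; a < numAngles; a++)
        {
            double theta = Math.PI * a / numAngles;
            float cos = (float)Math.Cos(theta), sin = (float)Math.Sin(theta);
            for (int y = 0; y < imageSize; y++)
            for (int x = 0; x < imageSize; x++)
            {
                // Detector coordinate of the ray through (x, y) at this angle.
                float t = (x - center) * cos + (y - center) * sin + numChannels / 2f;
                int c = (int)MathF.Round(t);
                if (c >= 0 && c < numChannels)
                    image[y, x] += filtered[a][c];
            }
        }
        return image;
    }
}
```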
These are medical images, so in our department we mostly use phantoms, because it's much easier to see the effects on simulated phantoms than on very complex human tissue. This is a regular head phantom: you have the bone here, a very complex bone with very fine structures, and you also see low-attenuating objects. This is where you can check whether your algorithms work well for soft tissue, for low-attenuating things and for high-attenuating things.

And this is one approach to how a regular reconstruction works, and you see that there are some problems in the reconstruction: there is a so-called beam hardening effect, and here you see some sampling effects. If you enlarge this, you can see in the volume stack, from many sides, what the result of the reconstruction was.

There is another approach, and this is iterative reconstruction. Iterative reconstruction is a process where you reverse the reconstruction again: you do a technical forward projection, which means you technically simulate the acquisition process. First you have a reconstruction, a 3D volume stack; you made some errors with your algorithms because they are not perfect. Then you do this so-called forward projection — you invert the reconstruction again, which is a bit complicated — and you get some sort of simulated projection data. So this is a loop where, over a few cycles, you try to recover image quality, and you also put prior knowledge into it: you know that at edges there is always beam hardening, so you put that prior knowledge in just to correct these beam hardening effects.

Of course you can imagine that all of this costs a lot of processing power. A regular system has around 700 gigabytes of main memory and about four Volta GPUs doing this process, and, as I said, 30 gigabytes of data and 100 images per second is a highly demanding task for the developers doing this. We are in research, that's our advantage — we don't need to be that fast — but there is still a high demand for getting images quickly.

I just want to show the effects of iterative reconstruction. This is a regular reconstruction, here with some filtering on it, here the first version of iterative reconstruction, and this is where the full iterative reconstruction was applied. As I said, we are talking about X-rays, and there is the so-called ALARA principle: as low as reasonably achievable. You have to apply X-rays to see something, but in the end you should only apply as much dose as you need, and if the algorithms are good enough you can reduce the dose further, or you keep the same dose and improve the image quality instead, for example to detect attenuating regions here in the bladder. This is a thorax and abdominal scan, a pretty large area, roughly from here to here. You can also see that there is some noise reduction in Hounsfield units — by the way, named after the inventor of the system; it's a unitless quantity, just a value relative to water.

Here are also some improvements, because you want to see very fine details: this is what the regular weighted filtered back projection shows, and this is an IR process where you can see the line pairs and the small lines of the object much better. And here is a coronary stent: if you have an occlusion in your heart vessels, metal stents are placed to open up the vessel again. It's a sort of grid or cage inserted into the heart, and it's important to see what is inside, to see the so-called stent struts, because it's possible that the vessel clots again, that it closes up again, and for that it's necessary to see inside the stent.
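The iterative loop itself can be sketched in a few lines. This is only a structural sketch, not the real algorithm: the forward and back projectors are passed in as delegates because the real operators are far more involved, and the relaxation and prior-knowledge handling are simplified away.

```csharp
using System;

static class IterativeSketch
{
    // Sketch of the correction loop described above (SIRT-style update).
    static float[,] IterativeReconstruction(
        float[][] measuredSinogram,                 // what the scanner measured
        float[,] initialImage,                      // e.g. a filtered back projection
        Func<float[,], float[][]> forwardProject,   // simulate the acquisition
        Func<float[][], float[,]> backProject,      // smear an error sinogram back
        int iterations,
        float relaxation = 0.1f)
    {
        var image = (float[,])initialImage.Clone();

        for (int i = 0; i < iterations; i++)
        {
            // 1) Technical forward projection of the current estimate.
            float[][] simulated = forwardProject(image);

            // 2) Error sinogram: measured minus simulated.
            var residual = new float[measuredSinogram.Length][];
            for (int a = 0; a < residual.Length; a++)
            {
                residual[a] = new float[measuredSinogram[a].Length];
                for (int c = 0; c < residual[a].Length; c++)
                    residual[a][c] = measuredSinogram[a][c] - simulated[a][c];
            }

            // 3) Back-project the error and correct the image estimate.
            //    Prior knowledge (e.g. about beam hardening at edges) would be
            //    applied here as an additional regularization step.
            float[,] correction = backProject(residual);
            for (int y = 0; y < image.GetLength(0); y++)
                for (int x = 0; x < image.GetLength(1); x++)
                    image[y, x] += relaxation * correction[y, x];
        }
        return image;
    }
}
```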
Now to the techniques behind this and our experience with the .NET Framework and with C#. I can only pick out pieces; it's a pretty large area, so I can only show a few details.

In general we are talking about volumes. Our volumes are about two thousand slices of, let's say, 512 by 512, stored as float or integer depending on what's inside, and you need to store that somehow. This is a small diagram: we have a class which is a volume, and we have many types of volumes, because we also want to support, for example, MATLAB volumes, so we need wrappers around them. This is, let's say, the data collection that manages the slices, the volume and the image slices. For the 30 gigabytes of projection data it's more or less the same: it's also sorted into slices and you also need to store it.

The problem is that we don't have the 700 gigabytes of memory I mentioned for the production system; in our research systems we only have 32. So we had to invent our own memory management on top of the regular garbage collector, because the garbage collector only cleans up, but in our case we need to access the data again and again, so it shouldn't simply get disposed, and we access it sometimes sequentially and sometimes randomly, depending on the task. So each of these slices and projections is derived from a so-called mappable object, and a background thread is continuously checking each object: there is a reference counter and a time counter tracking how often the object was used and when it was used last. Based on that, this automatic process checks whether enough memory is free and dumps data that has not been used for a long time to the hard disk.

When we started, memory-mapped files were not available, and that was a problem. We could use memory-mapped files now — we have tested them over the last two years — but they are currently very slow. There are two ways you can access the memory. Has anyone of you used memory-mapped files? OK, just to explain: with memory-mapped files, the virtual memory manager of Windows takes over and says, you don't need this anymore, I'll just page it out. Of course it sounds stupid to do this ourselves, but the Windows memory manager was, first of all, a bit slow because of this line here. The other thing is that when we access the data, it is normally stored in a compressed way, so it's not only reading, it's also unzipping, doing decompression and data formatting on it. Simply writing and loading is not good enough for us. So in the end there were two reasons we couldn't use memory-mapped files: first, because of this line, it's very slow, and obviously no one has ever invested much more in it. That is one thing.
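Very roughly, the idea behind these mappable objects looks like the following sketch. The class and member names are made up for illustration, and the real implementation handles compression, thread safety and access patterns much more carefully.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Every slice or projection block can be paged out to disk and loaded back on demand.
abstract class MappableObject
{
    public long SizeInBytes { get; protected set; }
    public DateTime LastAccess { get; private set; } = DateTime.UtcNow;
    public bool IsInMemory { get; protected set; }

    public void Touch() => LastAccess = DateTime.UtcNow;

    public abstract void DumpToDisk();    // write (compressed) data, free the array
    public abstract void LoadFromDisk();  // read it back in when accessed again
}

// Background component that keeps the total in-memory size under a budget by
// dumping the least recently used objects first.
class MemoryManager
{
    private readonly List<MappableObject> _objects = new List<MappableObject>();
    private readonly long _budgetBytes;

    public MemoryManager(long budgetBytes) => _budgetBytes = budgetBytes;

    public void Register(MappableObject obj) { lock (_objects) _objects.Add(obj); }

    // Called periodically from a background thread.
    public void Sweep()
    {
        lock (_objects)
        {
            long inMemory = _objects.Where(o => o.IsInMemory).Sum(o => o.SizeInBytes);
            foreach (var victim in _objects.Where(o => o.IsInMemory)
                                           .OrderBy(o => o.LastAccess))
            {
                if (inMemory <= _budgetBytes) break;
                victim.DumpToDisk();
                inMemory -= victim.SizeInBytes;
            }
        }
    }
}
```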
And then there comes another thing, which is a slightly different topic: how many of you are using large arrays in your environment? Do you use a lot of arrays, or do you just do it once and then it's over? That's the interesting part, because we were talking about BenchmarkDotNet. I did benchmarks — since I started with this about twelve years ago I have done a lot of benchmarks — and this is the regular addition of two float arrays; I just want to add two floats. Done conventionally, taking a.Length in the loop condition, I get this number for float32. If I extract a.Length into an external variable, I get this number. I always thought that shouldn't matter, because I thought the JIT would optimize it, but I was very surprised. If I go to pointer arithmetic — just make it unsafe and use pointers — I come down to this range here, and the external length variable doesn't really matter anymore. I was also very surprised by that. Of course I also tried Span, and I know that Span is not as fast here as on .NET Core; the performance dropped. So of course I was hoping this would improve on .NET Core. I also used SIMD, and that was the best performance in this case. When I ran this on .NET Core I was very surprised — and I hope someone proves me wrong and that something is off here — because all the values were much lower than on this side, even though, as we heard, .NET Core is fast. I also read these two blog posts, and I was very surprised to get these numbers.

There's another thing: what if I want to do this with generics? Have you ever tried? It does not work — C# does not support mathematical operations on generics the way C++ does. A lot of my colleagues came and asked why this is not possible. There are many cases where we had to write a very large math library, because when you do array math you sometimes want to add a constant to an array, and of course you have commutative operations on arrays; it's not always just one function. I think for all these mathematical operations, like cosine, sine and so on, we have around 500 functions. One problem is that generic constraints are too weak. So what we did is write our own IL code: IL is type agnostic and is able to add two values, and if you do it correctly you get the same speed as a regular addition written in regular source code.

That was one thing. Now look at this mathematical expression. These are volumes of 512 by 512 by two thousand, and this is one operation you use for iterative reconstruction, for example: 0.3·a + b and so on. Here you see what happens: for the a + b you allocate one gigabyte of memory, then another gigabyte, because all these intermediate expressions add up, and in the end, for this simple expression, I allocated 5 gigabytes more than I needed. What we did is resolve the expression. I was hoping that with Roslyn we could do a bit more — just write the expression, because when you have these expressions you want to see the real mathematical expression and not the long chain of function calls; that's helpful for debugging and also for researchers when they look at the code. So what we do is: you pass in a string, a parser goes into it, recompiles your code, builds only one for loop around these thousand slices, and turns the math back into a single expression call. You get a factor of four in performance and only one gigabyte of allocation if you do this. There is one thing I'm not quite happy with in the current Roslyn support: what would be really helpful is if you could write expressions and, before the real compilation, a parser that we write goes in, extracts this line and inserts the generated code directly. That's currently not really possible.
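Going back to the array-addition benchmark, the variants compared there look roughly like this. It's a simplified sketch, not the exact benchmark code; the real measurements were done with BenchmarkDotNet.

```csharp
using System;
using System.Numerics;

static class AddBenchmarks
{
    // 1) Conventional loop; how the JIT treats a.Length in the loop condition
    //    decides whether the bounds checks can be elided.
    public static void AddSimple(float[] a, float[] b, float[] r)
    {
        for (int i = 0; i < a.Length; i++)
            r[i] = a[i] + b[i];
    }

    // 2) Unsafe pointer arithmetic, no bounds checks at all
    //    (requires compiling with AllowUnsafeBlocks).
    public static unsafe void AddPointers(float[] a, float[] b, float[] r)
    {
        fixed (float* pa = a, pb = b, pr = r)
        {
            for (int i = 0; i < a.Length; i++)
                pr[i] = pa[i] + pb[i];
        }
    }

    // 3) Span version; performance differs between .NET Framework and .NET Core.
    public static void AddSpan(ReadOnlySpan<float> a, ReadOnlySpan<float> b, Span<float> r)
    {
        for (int i = 0; i < a.Length; i++)
            r[i] = a[i] + b[i];
    }

    // 4) Explicit SIMD via System.Numerics.Vector<float>.
    public static void AddSimd(float[] a, float[] b, float[] r)
    {
        int i = 0;
        int w = Vector<float>.Count;
        for (; i <= a.Length - w; i += w)
            (new Vector<float>(a, i) + new Vector<float>(b, i)).CopyTo(r, i);
        for (; i < a.Length; i++)          // scalar tail
            r[i] = a[i] + b[i];
    }
}
```

The IL trick for generic math can be sketched like this. The class name and caching are illustrative; the point is only that the IL add instruction does not care about the concrete numeric type.

```csharp
using System;
using System.Reflection.Emit;

static class GenericMath<T>
{
    // Emitted once per T and cached as a delegate.
    public static readonly Func<T, T, T> Add = CreateAdd();

    private static Func<T, T, T> CreateAdd()
    {
        var method = new DynamicMethod("Add", typeof(T),
                                       new[] { typeof(T), typeof(T) });
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);   // push first argument
        il.Emit(OpCodes.Ldarg_1);   // push second argument
        il.Emit(OpCodes.Add);       // the same opcode works for int, float, double, ...
        il.Emit(OpCodes.Ret);
        return (Func<T, T, T>)method.CreateDelegate(typeof(Func<T, T, T>));
    }
}

// Usage: GenericMath<float>.Add(1.5f, 2.5f) or GenericMath<int>.Add(1, 2).
```

And the effect of resolving an expression like 0.3·a + b into one fused loop per slice, instead of chaining overloaded operators, can be sketched like this. The names are made up; in our case a parser generates this kind of loop from the expression string.

```csharp
static class ExpressionSketch
{
    // With overloaded operators, "0.3f * a" would first materialize a complete
    // temporary volume before "+ b" allocates the next one. The fused version
    // makes one pass over the data with no temporaries.
    static void EvaluateFused(float[][] a, float[][] b, float[][] result, float factor)
    {
        for (int s = 0; s < a.Length; s++)      // one loop over all slices
        {
            float[] sa = a[s], sb = b[s], sr = result[s];
            for (int i = 0; i < sa.Length; i++)
                sr[i] = factor * sa[i] + sb[i]; // the whole expression in one go
        }
    }
}
```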
There are also casts on arrays, which are a bit of a problem in the same way, because there is no numeric type we can apply to the data, but I don't want to go into this. I'll just show the CUDA things. Two minutes? All right.

Has anyone of you used CUDA, in C# or in C++? The problem is that, unlike in C++, there is no real access from the C# environment: you have the CUDA driver API, which you can access with Windows API calls. And for the back projection I showed, just to give you a few numbers, we have around 134 tera-operations per image. So what we did is use CUDA from .NET: you write regular CUDA code, in C or C++ style. This is how the CUDA code looks; it is inserted into the solution, and when you simply save the code, a generator produces the PTX code — NVCC goes over the code, and PTX is the low-level code for CUDA. Then there is designer code that translates the code via Clang — we use Clang for the AST — and generates a wrapper for C#. This is the wrapper; it's very simple, it just accesses a three-dimensional texture. And this is the code you would normally have to write — the code you normally don't see when you use CUDA from C++, because the C++ toolchain writes all these wrappers around your code for you. And then you can execute CUDA code simply from C#.

So that's it. I know it was a lot of things and only a few pieces, and I would also be interested in your .NET Core performance tests, in case someone has better experiences than me. Thank you.