Okay, so welcome to my talk. Unfortunately we had some hiccups with the equipment, so apparently the slides will not be on the live stream now, but you can download them on the FOSDEM website. I'm sorry for that. Welcome to my presentation. It's called "An optimized GFDM software implementation for low latency".
First, a few words on who I am and where I work. My name is Johannes Demel and I work at the Department of Communications Engineering at the University of Bremen in Bremen, Germany. The German abbreviation is actually ANT. There I work with Carsten Bockelmann, and the professor there is Armin Dekorsy. Our research group is mainly focused on wireless communications, so physical layer or MAC layer, and we do some signal processing in general too.
So why do we work on GFDM? If you look at state-of-the-art technologies for wireless communications, they're usually optimized for high data rates, like LTE, but they also have quite high latency and low reliability. If you think of LTE again, it usually has a target frame error rate of 10 to the power of minus 1, so 10% packet loss. Also, it's not deterministic communication, and mostly you have FPGAs, or everything is implemented in an ASIC and pretty much fixed.
We work in a realm where we are looking at Industry 4.0 applications. There we have a whole different setup if you think of communications. We usually require a latency of less than 1 millisecond. We have a very high reliability requirement. It's debatable if this figure is really useful, but in any case we need to have a more reliable communication system, and we also need to know that this thing works in a deterministic way. So if you look at this example, those slides here, they just move around and they periodically send updates on where they are, how fast they move, et cetera. They are all connected to the system and they all expect an update every so often.
And if they don't get an update every so often, the whole system will shut down. That translates to your whole production shutting down, and that would be bad for your company. So you want to have a reliable communication system. In the future we also want to have more flexibility, and we want to have a software implementation that is parameterizable. Why do we want that? Well, this is just one example, but systems might look totally different for different applications. So we want to have as much flexibility as we can.
What we're working on now is a new wireless communication system, in a project called HiFlecs. It's supported by the Federal Ministry of Education and Research in Germany, and it's part of this whole conglomerate of industrial radio projects. That's where I work.
A little outline of my presentation. I'll first start with a more concise introduction of what I do. Then I will introduce Generalized Frequency Division Multiplexing, or GFDM, as you've already heard in the title. Then I will talk about our low-latency SDR implementation, and finally there will be a conclusion.
So let's start with the introduction. If we have a look at the system model for a communication system on the physical layer, we usually have a source, we have some kind of forward error correction, and then the waveform, which is the part I'm going to focus on today. Our requirements here are that we want it to be more flexible than before, that we want it to be robust, and that it needs to have favorable coexistence properties.
A little bit further: I already said that we want a latency of less than 1 millisecond. So if we look at this system now, we have a device that needs to send out its status and receive an update within a millisecond, and in between there's a communication system. So we'll have a channel here. We can control its properties by setting certain filter properties etc.
and the frequency band, but in the end, well, we have to deal with it. But what we can completely control is our signal processing here, here and here. And this is the part we want to have as an SDR implementation, and then we want to measure its latency so we can get a first feeling of how well it will perform and what we have to expect from that implementation. And then finally we want to answer the question: can we achieve low latency?
So now, to dive deeper into GFDM, I will introduce it. Back in the day everyone used single-carrier transmission, so you had a certain bandwidth and you would just divide it over time into time slots. That's what is depicted here. You would transmit N symbols, complex symbols in digital communications mostly, and you would send them out one after the other. As we increased data rates, that involved a lot of problems like multipath propagation, and we needed more and more complex equalizers. So along came OFDM, which helps us solve a lot of problems here, and we also got a lot more flexibility. Now instead of dividing our resources in time we just divide them in frequency. So we have low-bandwidth subcarriers, and still we can transmit N complex symbols at a time. GFDM wants to be more flexible here. So we can have a flexible scheme where we divide our resources in time and frequency, into subcarriers here and time slots in that direction. Still we have N complex data symbols that we can transmit, but now it's more flexible.
So how does GFDM work? Just a quick introduction. First we assume we have complex symbols, these d's, and we want to map them to certain resources. So we go here to our resources and we just map our d's to these resources, and we save that in a vector of size N, of course. Every single one of these elements in the vector now corresponds to one resource element here in our resource grid.
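The mapping step just described can be sketched in a few lines of NumPy. This is only an illustration, not the gr-gfdm mapper: the function name `map_to_grid`, the choice of active subcarriers, and the ordering of the vector (index `m * K + k` for time slot `m` and subcarrier `k`) are assumptions made for this example.

```python
import numpy as np


def map_to_grid(symbols, n_subcarriers, n_timeslots, active_subcarriers):
    """Place data symbols on the active subcarriers of a resource grid.

    Returns the flattened vector d of length N = n_subcarriers * n_timeslots.
    Resource elements on inactive subcarriers stay zero.
    """
    active = np.asarray(active_subcarriers, dtype=int)
    symbols = np.asarray(symbols, dtype=complex)
    if symbols.size != active.size * n_timeslots:
        raise ValueError("need exactly one symbol per active resource element")
    # Grid layout: one row per subcarrier, one column per time slot.
    grid = np.zeros((n_subcarriers, n_timeslots), dtype=complex)
    grid[active, :] = symbols.reshape(active.size, n_timeslots)
    # Flatten so that element m * K + k belongs to time slot m, subcarrier k
    # (this ordering is an assumption of the sketch).
    return grid.T.reshape(-1)


K, M = 4, 3  # toy sizes: 4 subcarriers, 3 time slots
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
data = qpsk[[0, 1, 2, 3, 0, 1]]  # six symbols for two active subcarriers
d = map_to_grid(data, K, M, active_subcarriers=[1, 2])
print(d.reshape(M, K))  # rows are time slots, columns are subcarriers
```

In the printed grid, the columns for subcarriers 0 and 3 are all zero, which is what leaving a resource unused looks like in this representation.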
If for some reason you don't want to use all the resources, you can just set them to zero, so it would look like we just don't transmit on that frequency.
The next step is modulation. If you read through the papers you'll see a modulation matrix A, and you just multiply it by d and, bam, you have your transmit signal. So what does A look like? For every resource it contains a circularly shifted and modulated replica of a prototype filter. The prototype filter actually shapes our signal, just like we would do with, for example, a single-carrier transmission where we have a root-raised-cosine filter, and now we just shift it to the frequency and the time where we want to transmit a certain symbol. That's what every column of A contains, for every symbol we want to transmit in such a GFDM frame.
I mentioned OFDM earlier. In contrast to OFDM we do not use rectangular filters in time anymore. If you're familiar with OFDM, basically you have a rectangular filter over time, and that translates into a sinc in frequency. That's good, but we want to do better and tackle the problems that come with that.
The last thing is that we can still use a cyclic prefix, but we only use one per frame instead of one for every OFDM symbol as in OFDM. So we might be more efficient there. Why do we want to have a cyclic prefix? Well, this all translates to a circular frame property, and that in turn translates to simpler equalization. So we have simpler equalization if we use a cyclic prefix, hopefully, so we want to have that, and it's great that we can have that with GFDM too.
Okay, so there's one challenge. I told you A is just a matrix, and this matrix might be quite large, so this multiplication is very inefficient and we want to improve on that. Just to give you a quick overview: we want to do the modulation again, but now in the frequency domain. We choose our subcarrier filters such that they are shaped like this.
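To make the matrix description concrete, here is a minimal NumPy sketch of building A from a prototype filter and computing the transmit signal as A times d, followed by a single cyclic prefix for the whole frame. The function names, the column ordering (`m * K + k`) and the toy sizes are assumptions for illustration. As a sanity check the sketch uses a rectangular prototype filter, for which each time slot reduces to an ordinary OFDM symbol; an actual GFDM system would use a smoother filter.

```python
import numpy as np


def modulation_matrix(prototype, n_subcarriers, n_timeslots):
    """Build the N x N matrix A whose columns are circularly shifted
    and modulated copies of the prototype filter (N = K * M)."""
    K, M = n_subcarriers, n_timeslots
    N = K * M
    g = np.asarray(prototype, dtype=complex)
    if g.size != N:
        raise ValueError("prototype filter must have K * M taps")
    n = np.arange(N)
    A = np.empty((N, N), dtype=complex)
    for m in range(M):
        shifted = np.roll(g, m * K)  # circular shift to time slot m
        for k in range(K):
            carrier = np.exp(2j * np.pi * k * n / K)  # move to subcarrier k
            A[:, m * K + k] = shifted * carrier
    return A


def add_cyclic_prefix(frame, cp_len):
    """Prepend the last cp_len samples once per frame."""
    return np.concatenate([frame[-cp_len:], frame])


K, M = 4, 3
N = K * M
rng = np.random.default_rng(0)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
d = qpsk[rng.integers(0, 4, N)]

# Rectangular prototype: nonzero only during the first time slot.
rect = np.zeros(N)
rect[:K] = 1 / np.sqrt(K)

A = modulation_matrix(rect, K, M)
x = A @ d                      # transmit frame without prefix
tx = add_cyclic_prefix(x, 2)   # one prefix for the whole frame

# With the rectangular filter, time slot 0 is just a scaled IDFT of d[0:K].
print(np.allclose(x[:K], np.sqrt(K) * np.fft.ifft(d[:K])))
```

The matrix product costs on the order of N squared operations per frame, which is the inefficiency mentioned in the talk.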
So we only have non-zero elements for our subcarrier in some area around it, and now we can implement that with a fast Fourier transform. That also involves a few more things. The first thing is that we will always have interference with neighboring subcarriers. You can see they overlap, but we can manage this interference, because with our filter design we are now in charge of it, and later on at the receiver we can cancel it out. This design then translates to a more robust implementation if you think of imperfections like frequency offsets and timing offsets. So a system where we have a more localized filter for our subcarriers will be more robust against these kinds of imperfections.
Okay, so let's move on to our low-latency SDR implementation. I didn't have to start from scratch. When I started to look at GFDM I looked around to see if I could find any previous work, and I did. Thanks, André. He had already started gr-gfdm, and I took that up, implemented all the things I needed and worked on code optimization. I added quite a few tests so we could improve the code. Now the system has a few components: we can do modulation, we can do demodulation, we also have a synchronization part, and we have a mapper and demapper.
Okay, a little bit on our software structure. If you work in academia you often look at single frames and you do quite a bit of offline processing for your simulations, so you want to have an interface to use with Python, with NumPy and SciPy, or some other unnamed software. This is also quite nice to play around with and just get a feeling for how things work. Here we want to have a simple interface so we can just do that.
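In the spirit of that kind of offline, single-frame experiment, the following NumPy sketch checks that modulation in the frequency domain gives the same frame as the direct time-domain sum. It is not the gr-gfdm code. The function names and the Hann-shaped prototype filter are assumptions, and for simplicity the sketch multiplies by the full filter spectrum instead of only the few non-zero bins around each subcarrier that an optimized version would touch.

```python
import numpy as np


def modulate_time(d_grid, g):
    """Reference: sum of circularly shifted and modulated prototype filters."""
    K, M = d_grid.shape
    n = np.arange(K * M)
    x = np.zeros(K * M, dtype=complex)
    for k in range(K):
        carrier = np.exp(2j * np.pi * k * n / K)
        for m in range(M):
            x += d_grid[k, m] * np.roll(g, m * K) * carrier
    return x


def modulate_freq(d_grid, g):
    """Same frame, computed per subcarrier in the frequency domain."""
    K, M = d_grid.shape
    G = np.fft.fft(g)  # filter spectrum, length N = K * M
    X = np.zeros(K * M, dtype=complex)
    for k in range(K):
        # M-point FFT of the subcarrier's symbols, repeated K times:
        # this is the spectrum of the symbols upsampled by K.
        D = np.tile(np.fft.fft(d_grid[k]), K)
        # Filter, then shift by k * M bins to reach subcarrier k.
        X += np.roll(G * D, k * M)
    return np.fft.ifft(X)


K, M = 8, 5
rng = np.random.default_rng(1)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
d_grid = qpsk[rng.integers(0, 4, (K, M))]

# Illustrative prototype filter spanning two time slots (an assumption).
g = np.zeros(K * M)
g[:2 * K] = np.hanning(2 * K)
g /= np.linalg.norm(g)

x_time = modulate_time(d_grid, g)
x_freq = modulate_freq(d_grid, g)
print(np.allclose(x_time, x_freq))
```

The frequency-domain version replaces the large matrix product with one small FFT per subcarrier and one inverse FFT per frame, which is why the transform sizes matter so much for latency.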
Then we want to have our implementation, which is very modular and optimized, and which again should have a simple C++ interface where you just push around your pointers. On the other hand, you can just plug that into GNU Radio: you plug it into your block structure, call the appropriate functions, pass around the pointers and use all the benefits of GNU Radio. It takes care that everything runs fast, that it runs multi-threaded, and that you have your hardware interfaces. You don't have to worry about that. I mean, no one wants to rewrite any drivers or things like that. So that's the general idea of how we split our software into different components.
In the last part I want to show you some benchmarks of the different parts of our system. First off, for this benchmark setup I made a few assumptions. We always look at a perfect channel, so there's no noise, or actually no channel at all. That wouldn't make sense for performance evaluations, but for benchmarks it works fine. Also, for the kind of communication systems I'm looking at, we always have small packets. It's debatable if 1024 bits is still a small packet, but that's up to all the people involved in every project. We also always look at small constellations like QPSK; we won't go up to something like 64-QAM.
In order to understand the simulations a bit better we need to know about two parameters: N, the block size of our GFDM frame, and J, the number of interference cancellation iterations. Earlier I mentioned that we will have managed interference between different subcarriers. We allowed that in the first place, so now we need to cancel it out in a smart way. Then we want to identify which parts of our system are actually taking up the most time, and which parameters need to be chosen carefully so that the system does not exhibit bad performance, where the latency goes up tremendously or the throughput decreases to the point where it's not usable anymore
for us.
Okay, so let's first look at the transmit side. We have three parts: a mapper, a modulator and the cyclic prefix addition. As you can see, resource mapping and cyclic prefix addition just don't play a role here; they take less than one or two microseconds for every frame. So the interesting part is actually the modulator. You can see it for two parameter settings: I chose 128 or 64 subcarriers here, and then you just vary the number of time slots. You might wonder why we don't just choose powers of two for our modulation. Well, if you look at the literature again, you'll see that you can't actually use GFDM with powers of two in time and frequency at the same time, so we end up with these kinds of weird values here. That also brings us to the reason why this is not just linearly increasing. If we have a look at the FFTW implementation that does the Fourier transform for us: if we have small prime factors, for example at this value, that's the dot here, our Fourier transform can be implemented quite efficiently and it's quite fast. If we have a look at this value here, that's that dot, we have quite large prime factors. 19 is already large here, so our implementation will increase in terms of latency. So we need to choose our parameters very carefully.
Okay, so now we know that. On the other side, the receiver side, we first start with synchronization. Since we have a system where we expect quite high SNRs, like 10 dB or something, we always start with energy detection, which is quite fast, and then just search for the correct symbol start or frame start in a certain window. Here you can see that depending on the number of subcarriers our implementation increases in terms of latency. So for one frame we just want to synchronize, find the correct position of our frame, and depending on the number of subcarriers this might take like 18 microseconds or over 40.
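The remark about prime factors in the modulator benchmark can be checked quickly for candidate frame sizes. The sketch below factorizes N = K * M for 64 subcarriers and a few odd numbers of time slots. It assumes, for illustration only, that the transform length equals the frame size N; the actual implementation may use several transforms of different lengths. The helper name `prime_factors` and the list of candidate time slot counts are hypothetical.

```python
def prime_factors(n):
    """Return the prime factors of n in ascending order, with repetition."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)  # whatever is left is itself prime
    return factors


n_subcarriers = 64
largest = {}
for n_timeslots in (15, 19, 21):
    n = n_subcarriers * n_timeslots
    factors = prime_factors(n)
    largest[n_timeslots] = max(factors)
    print(n_timeslots, n, factors)
```

A frame with 21 time slots only contains the factors 2, 3 and 7, while 19 time slots bring in the factor 19, which is the kind of size the talk describes as slower.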
Okay, next up, the most important part: demodulation. Here you can see that again we have this behavior due to prime factors in our Fourier transform, so we should again choose our values wisely. We can also see that if we need to do more interference cancellation, of course that will add to our latency as well, but actually not too much. We can afford a few iterations of interference cancellation without going over our latency budget.
So that's already almost it. I want to come to my conclusion. Here I have a look at the overall latency budget of a system that we designed for two different numbers of subcarriers, and I just brought a little example. If you think back, we have 64 subcarriers and 21 time slots, so that translates to 1344 complex symbols we could possibly squeeze into one GFDM frame. We run at 20 megasamples per second, so we occupy about 20 megahertz of bandwidth, and we have an airtime of roughly 74 microseconds. We then add another one-way processing delay of 92 microseconds, so if we just go from transmitter to receiver, latency will be roughly 166 microseconds. In the end, if we think back to the circle in the beginning, our round-trip time will be somewhere in the range of 330 microseconds. So that's what we need to expect from such a system, and that in turn proves the point that we can actually do a low-latency SDR implementation with GNU Radio. Thanks for your attention.
Just maybe time for one question or so. Yeah?
So, do we also improve reliability? That was the question. In this work we only looked at latency, but if you think back to how we talked about imperfections of our channel, for example if we have offsets that we might not be able to compensate for, then GFDM would also show quite favorable figures in terms of reliability. Yeah, that's a good question.
Okay, keep it quick. Why would I want to have an SDR in an Industry 4.0 application? Well, we want to have it as flexible
as possible, so we can just play around with our parameters and everything, and doing everything in software is just the easier way to go. It's more flexible, it's faster to implement and everything. If you start to implement it on FPGAs you open up so many more questions that you have to answer besides all the other algorithms. So that's the reason for it.
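As a footnote to the latency budget in the conclusion, the numbers can be re-derived with a few lines of arithmetic. The frame size, sample rate, airtime and totals come from the talk. The processing delay of 92 microseconds is an assumption: it is the value that makes the stated one-way total of roughly 166 microseconds and the round trip of roughly 330 microseconds add up. The gap between the bare frame duration and the 74 microseconds of airtime is presumably cyclic prefix and other overhead, which is also an assumption.

```python
n_subcarriers = 64
n_timeslots = 21
sample_rate_hz = 20e6

# One complex sample per resource element in the frame.
symbols_per_frame = n_subcarriers * n_timeslots
frame_duration_us = symbols_per_frame / sample_rate_hz * 1e6

airtime_us = 74.0      # from the talk, includes overhead beyond the bare frame
processing_us = 92.0   # assumed, chosen to match the stated totals

one_way_us = airtime_us + processing_us
round_trip_us = 2 * one_way_us

print(symbols_per_frame, frame_duration_us, one_way_us, round_trip_us)
```

Under these assumptions the round trip stays at about a third of the 1 millisecond target mentioned at the start of the talk.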