Alright, we'll resume. We've got a miniconf that he himself is actually helping to run, so he's stepping out of it to give us a talk. Thank you. Hi. Yeah, my name is David Rowe. Just rushed across from the open radio miniconf. We're building radios. Why aren't you all there? So, today I'd like to talk about some recent work I've been doing with Codec 2, an open source speech codec I've been working on for a couple of years. I'll very briefly introduce the codec and the application I've been working on, which is digital voice over HF radio, and why that's special: the mission I'm trying to achieve with this work, the goals, and why it's pretty hard to get digital voice, or voice at all, over HF radio channels. Then I'll compare how existing legacy analog radios do it and see what lessons we can learn from that, and finally a demo.

Okay, so Codec 2, for those of you who aren't familiar with it, is an open source low bit rate codec. It fills the gap beneath 5,000 bits per second in the open source codec landscape. Until recently it was running between 1,200 and 3,200 bits per second, and recently I've been playing with some lower bit rates, in particular for HF radio channels. It's designed for speech only; this is not the sort of thing you use for MP3s. It's communications quality speech, which means noticeable distortion in the coded speech, but it's quite intelligible and you can usually recognise who's doing the talking. So the main application is digital radio. I'm particularly interested in HF radio, otherwise known as shortwave radio. There are also other possibilities, like VHF hand-held push-to-talk radio. Digital radio is important in that it can operate, in particular over HF radio, without infrastructure. So without cell phone towers, networks, or electricity, you can send voice information over thousands of kilometres. Very useful if you're in the developing world or if all the lights go out during a disaster.
And several other applications as well. So this is a block diagram of what goes on in a digital voice radio system. We take a microphone signal, like the one I'm speaking into today, and sample it with an analogue-to-digital converter. Then we use the codec to compress it to a low bit rate, in this case in the range of 1,200 to 3,200 bits per second and beneath. Usually we then add some forward error correction. The idea here is that we add some redundancy to those coded bits such that if there is an error in the signal we can correct it at the other end. And these sorts of channels are not like the internet; there usually are plenty of errors. You can be dealing with error rates where up to 10% of your bits arrive in error. After forward error correction it goes into a modulator: we convert the bits into modem tones that can be sent over an analogue channel, in this case an HF or VHF radio channel. The demodulator is at the other end, once the signal has come off the air, and that's got the job of picking the modem tones out of the noise and trying to reconstruct a sequence of bits. We then error correct those using the FEC decoder, the forward error correction decoder, and pass them on to the speech decoder, which reconstructs the analogue speech via the D/A converter, and we hear what went in at the other end. So they're the blocks we're playing with for a digital voice over HF radio system.

Okay, as I said, one of the great things about HF radio is no infrastructure, with range over thousands of kilometres. So that's got some particularly useful applications for humanitarian work, the developing world, and people who are travelling long distances. My mission is to make digital voice work better than analogue. HF radio is one of the few areas of electronic communications where the incumbent analogue communication methods, such as single sideband, work better than digital. Digital has made inroads into just about everything else.
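The coded-bits → FEC → channel → FEC-decode part of the chain described above can be sketched in a few lines. This is a toy illustration only: it uses a 3× repetition code as a stand-in for the real forward error correction (actual systems use much stronger codes), and random bits in place of codec output.

```python
import random

def fec_encode(bits):
    # Toy forward error correction: repeat every bit 3 times.
    # The principle is the same as real FEC: add redundancy
    # before the channel so errors can be corrected afterwards.
    return [b for b in bits for _ in range(3)]

def channel(bits, ber, rng):
    # Noisy channel: flip each bit with probability `ber`.
    # HF channels can run at bit error rates up to ~10%.
    return [b ^ 1 if rng.random() < ber else b for b in bits]

def fec_decode(bits):
    # Majority vote over each group of 3 received bits.
    return [1 if sum(bits[i:i + 3]) >= 2 else 0
            for i in range(0, len(bits), 3)]

rng = random.Random(42)
codec_bits = [rng.randint(0, 1) for _ in range(1300)]  # ~1 s at 1300 bit/s
received = fec_decode(channel(fec_encode(codec_bits), ber=0.05, rng=rng))
errors = sum(a != b for a, b in zip(codec_bits, received))
print(f"residual errors after FEC: {errors} of {len(codec_bits)}")
```

With a raw bit error rate of 5%, the majority vote only fails when two of the three copies are hit, so the residual error rate drops well below the raw rate, at the cost of tripling the bits sent.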
Our mobile phones are digital, TV and broadcast radio have gone digital, but what's left is HF radio, and that's because it's really hard and the existing analogue methods actually work really well given the channel. I'm looking at the area of what's called negative channel signal-to-noise ratios. The signal-to-noise ratio is a measure of how strong your signal is compared to the noise the receiver is getting. So we're looking into the negative region of channel signal-to-noise ratios, which is really hard to do, but open source makes it easier. One of the reasons open source makes it easier is that we now have control over the various layers in the communication stack. Until recently you had to go and buy a closed-source codec from one vendor, a modem from another, and then maybe a protocol from a third, and try to hook these black boxes together. It was all closed; you had to take what they gave you, with no choice or ability to experiment. But now with open source we've got control: the modem's open source, the codec's open source, we can make our own protocols. We're not stuck with what a vendor forces us to use. So we actually have some advantages over the incumbent systems, in that we control all levels of the stack.

Okay, this is called a spectrogram. It's like a 3D graph. Along the bottom is time, so that's 10 seconds. It's actually a speech signal, a guy talking over HF radio, an analog speech signal going over HF radio. Up the left-hand side is frequency, 0 to 4,000 Hz. For this sort of application we're only interested in communications quality bandwidth, so typically 2,500 Hz of speech information will be sent over the channel. Now, this is analog information. Is my mouse working here? No, is there a pointer? Oh, there we go. Okay, so what you can see is, as this person's talking, you can see areas of light. The lighter it is, the higher the energy.
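The spectrogram being described is just a short-time Fourier transform of the signal. A minimal sketch, with a synthetic 500 Hz tone standing in for voiced speech since the off-air recording isn't available here:

```python
import numpy as np

def spectrogram(x, fs, n_fft=256):
    # Short-time Fourier transform: slice the signal into overlapping
    # windows, FFT each one, and keep the magnitude. Time runs along
    # one axis, frequency (0 .. fs/2) along the other -- brighter
    # cells mean more energy, as in the plot described above.
    hop = n_fft // 2
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft, hop)]
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 8000                        # 8 kHz sampling, typical for comms-quality speech
t = np.arange(fs) / fs           # one second of signal
x = np.sin(2 * np.pi * 500 * t)  # 500 Hz tone, roughly where voiced energy sits
S = spectrogram(x, fs)
peak_bin = int(S.mean(axis=0).argmax())
print(f"energy peaks near {peak_bin * fs / 256:.0f} Hz")
```

On real speech the bright regions move around in both time and frequency, which is exactly the energy concentration the talk goes on to describe.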
So as we talk, we put a bit of energy in this part of speech. Then there's silence in between syllables. Then there's some energy down here in a low frequency area around 500 Hz. A little bit later, we put some energy in the mid-range here. We just do this automatically as we're talking: the energy from our speech is distributed over frequency and over time as we speak. In between sentences there's silence. These three bars are actually clipping; we overloaded the transmitter and you get broadband noise, so they're sort of spurious but interesting. And here are some other parts of speech, once again separated by silence. Even inside the speech signal, at a particular point in time, you can see these gaps. We're only putting speech energy into certain parts of the waveform. So what we're doing, and what evolution has basically done with our voice, is concentrate the energy we have from our lungs going through our larynx and our vocal tract into certain bands of frequency and time. And that gives it really good punching power in a noisy environment. It's sort of an automatic allocation of energy across frequency and time to best transmit the message. Now, this might have evolved for yelling in a noisy cave or screaming across a valley, but it also works quite well on HF radio. Given a fixed amount of transmit power, it automatically pumps energy into the regions that are most important for us perceiving the speech.

This, on the other hand, is a digital modem signal going over a radio channel. So what we have here is once again time; that's 10 seconds of modem tones. I'll play you in a moment what it sounds like, but imagine a fax machine or an analog modem. And over here we have a bunch of modem signals, in what's called parallel tones. So there's one here, another one here; you can see the bright bands. We have multiple modem tones going across this channel. And the channel's wiping out bits of it.
That's the nature of HF radio. It tends to blot out bits of the signal for a little while; then it comes back, gets stronger, gets weaker. It's due to what's called multi-path fading. Exactly the same thing happens with Wi-Fi, but there it happens at 2.4 gigahertz rather than in the high frequency radio bands, which are in the 3 to 30 megahertz range. But you get this multi-path fading and things drop out. Now, the difference between this and the analog speech is that the modem power is continuous. We're allocating the same amount of power all the time to the speech signal, whether it's silence or not. So it's a less efficient way of allocating power, and something we have to overcome for digital speech systems.

So, just comparing analog and digital voice. For a start, analog has a lot of redundancy. If part of what I'm saying gets wiped out for a second or so by the channel, you're still pretty much going to work out what I'm saying. If you miss one word but get the next one, you're going to work out what I'm saying. So there's a fair bit of redundancy in human speech to start off with. Digital, on the other hand: if you wipe out one bit in the codec, it might sound like rubbish; the whole frequency spectrum might sound bad for a little while. So it's perhaps a little less redundant. In a digital system, different bits have different importance, a little bit like analog. Some bits, if they have an error, you won't be able to tell; other bits will make a really big difference. There's also typically memory, or error propagation, in digital systems. If there's a bit error at second one, at second two you might still be hearing the effect, because the effect of that bit error will propagate forward in time due to memory in the system. In an analog system, what happens is you get this gradual decrease in quality with decreasing signal-to-noise ratio.
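The "memory in the system" point can be illustrated with a toy decoder. This is not how Codec 2 specifically codes its parameters; it just assumes, hypothetically, a parameter sent as a delta from the previous frame, so the decoder carries state forward and a single channel error keeps biting:

```python
# Encoder sends the change in a parameter (e.g. frame energy)
# rather than its absolute value.
def encode(params):
    deltas, prev = [], 0
    for p in params:
        deltas.append(p - prev)
        prev = p
    return deltas

# Decoder integrates the deltas, so its output depends on all
# previously received values -- that dependence is the "memory".
def decode(deltas):
    out, prev = [], 0
    for d in deltas:
        prev += d
        out.append(prev)
    return out

params = [10, 12, 11, 13, 12, 14, 13, 15]
deltas = encode(params)
deltas[2] += 5        # a single channel error at frame 2...
decoded = decode(deltas)
# ...offsets every frame from then on: the error propagates forward.
print(decoded)
```

An analog system has no such integrator: a burst of noise degrades the audio while it lasts and is then gone.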
So as the signal gets weaker at the receiver, you start having trouble hearing it, it gets noisier, but if you concentrate you can still hear it, and gradually the intelligibility drops down into the noise and you lose it entirely. Digital systems tend to fall over. It'll sound really good until you get down to a certain receive signal-to-noise ratio; then you'll get too many bit errors and the whole thing will just fall over. So it's less of a gradual decline and more of a knee in performance. As we indicated before, analog applies power fairly intelligently: we've got so many watts to distribute and we tend to punch them in where they have the most effect for human perception, whereas digital just applies the same power all the time, so it's a bit more wasteful.

Okay. So my approach was to look at why analog speech works so well over these bad HF channels. One thing that people tend to do using HF radio is they start yelling into the radio, they repeat themselves, and they use things like the phonetic alphabet. Down the bottom here is an example: if you're trying to send these letters, that's my call sign, you'll say Victor, Kilo, 5, Delta, Golf, Romeo rather than VK5DGR. What I'm doing there is spreading out that information in time. I'm allocating more energy to each letter, if you like, slowing down the bit rate of what I'm saying. Getting some of the message through is better than none. So it's a little bit like variable bit rate coding, but in an analog sense: I'm slowing down the information rate. And at the other end, we have this human forward error correction. If you know who I am and you've heard the first two letters, you can probably guess what the rest is going to be, if you've been talking to me for 10 minutes. If you hear the start of a word and it's the English language, you're pretty sure what the end's going to be.
So we have this human forward error correction. One of the reasons that analog SSB hangs on at these low signal-to-noise ratios is that, as well as being fairly robust in noise, we can also adapt the coding rate, if you like. Okay, so the digital equivalent of that is lowering the codec bit rate. I said to myself: SSB and the analog methods sound pretty bad at low signal-to-noise ratios, so let's make a codec that sounds really bad too but has a much lower bit rate and a better chance of punching that message through. So I've tried to lower the speech quality right to the edge of intelligibility, the point where I can't understand what's being said anymore. I might have to start repeating myself or using the phonetic alphabet, but I'll still get that message through. It's okay to repeat ourselves, it's okay if the speech quality is low, but the benefit of a lower bit rate is that there's more power in each bit punching through, so a better chance of getting that signal decoded at the other end.

Let's demonstrate what this actually sounds like. These are some real off-air recordings that we made recently. Okay, the first thing I'd like to show is the effect of the different coding rates. I'll play the original speech before coding, then at 1,300 bits per second, and then at 450 bits per second. So first the original: "W5ABC, here is Victor Echo 9 Quebec Romeo Papa. My name is Bruce: Bravo Romeo Uniform Charlie Echo." Okay, now at 1,300 bits per second: "W5ABC, here is Victor Echo 9 Quebec Romeo Papa. My name is Bruce: Bravo Romeo Uniform Charlie Echo." And 450: "W5ABC, here is the Quebec Romeo Papa. My name is Bruce: Bravo Romeo Uniform Charlie Echo." Okay, so now we'll have a listen to some of the off-air signals. These are the modem signals we're actually taking off the air from a radio and then trying to decode. Here's the modem signal; these are at a fairly low 5 dB.
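The "more power in each bit" argument can be made concrete with a small worked calculation. At fixed transmit power P, the energy per bit is Eb = P / R, so dropping the bit rate raises the energy per bit by the ratio of the rates:

```python
import math

def ebn0_gain_db(rate_high, rate_low):
    # At fixed transmit power, Eb = P / bit_rate, so lowering the
    # rate from rate_high to rate_low multiplies the energy in each
    # bit by rate_high / rate_low; express that ratio in dB.
    return 10 * math.log10(rate_high / rate_low)

gain = ebn0_gain_db(1300, 450)
print(f"going from 1300 to 450 bit/s buys ~{gain:.1f} dB per bit")
```

Roughly 4.6 dB more energy per bit, which is why the 450 bit per second mode keeps decoding on channels where the 1,300 bit per second mode, or analog SSB, has already dropped below intelligibility.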
There's a modem signal in there somewhere, some funny Donald Duck stuff at the top, but it's pretty much noise to the human ear. Still, the modem can pick a reasonable signal out of that. The next one is what analog speech sounds like over that same channel. I've listened to that quite a few times and all I can really get is the phonetic alphabet stuff at the beginning; the rest is sort of lost in the noise. Now this is what the digital signal sounds like for that same channel, once we take those modem signals we couldn't quite hear and decode them. Okay, so that's an example of the digital cliff: it was going perfectly, then lost it, and then came back again. But while it was working it was noise free and nailed the analog sample in terms of quality.

So I'll show you what's actually gone on there. This is the software we use to listen to these signals off air. That's the spectrum of the signal coming in: a graph with amplitude up the left and frequency along the bottom. You can see the signal is essentially not there; it's beneath the noise floor. Down the bottom is another plot, time over 30 seconds versus frequency, where the intensity of the colour is how strong the signal is. You can see it's being wiped out completely in this area; that was the bit where we lost it entirely. So the modem signal really did disappear under the noise, which is why we couldn't hear it, and then suddenly it comes back a bit stronger and we hear it for a while. Here is an area where it was pretty bad but we still managed to decode it okay. So that's a time domain view of what we're listening to. The modem software listening to these signals off air can also estimate and plot the signal-to-noise ratio: when we could hear the signal it was up here, then we lost it completely, and then it came back again. That's sort of typical of your HF channels, so that's why we couldn't hear it in the middle.
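The SNR estimate the modem software plots can be sketched roughly like this: measure the power in the spectral bins around the expected tone and compare it with the power everywhere else. This is a toy single-tone version of the idea, with assumed numbers; the real modem tracks many parallel tones and updates the estimate continuously:

```python
import numpy as np

rng = np.random.default_rng(1)
fs, n = 8000, 8000
t = np.arange(n) / fs
signal = np.sin(2 * np.pi * 1000 * t)   # stand-in for one modem tone
noise = rng.normal(0, 1.0, n)           # channel noise
x = signal + noise

# Estimate SNR from the spectrum: power in the bins around the
# tone versus power in the rest of the band.
X = np.abs(np.fft.rfft(x)) ** 2
tone_bin = int(1000 * n / fs)
sig_power = X[tone_bin - 2:tone_bin + 3].sum()
noise_power = X.sum() - sig_power
snr_db = 10 * np.log10(sig_power / noise_power)
print(f"estimated SNR: {snr_db:.1f} dB")
```

With these numbers the tone sits about 3 dB below the full-band noise power, i.e. a negative SNR, yet it is still clearly identifiable in the spectrum, which is the regime the talk is targeting.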
Okay, that's the end of the talk; happy to take questions. Any questions? No questions? Well, one question or two. "Does the codec or the forward error correction, or both of them, take account of the characteristics of the multi-path dropouts by spreading the bits over time?" Yeah, the FEC code spreads things out over several hundred milliseconds at this point. Unfortunately, HF fading can last seconds, and the tradeoff there is latency: you can't delay things too many seconds or the guy at the other end will notice; you'd have to push the button and wait three seconds to hear something, for example. There's a little bit of what's called interleaving, a little bit of time diversity in there. "I guess you're targeting lower bit rates than what already exists, because there are only a few speech codecs, like AMR, which has some proprietary aspects, and Opus, which is open." Yes, they're all in the above 5,000 bits per second range, which is a little bit too fast for digital radio, because the bandwidth you take up would be too much and you'd get poorer bit error rates as a consequence. "Really, do you ever get lower than 450?" Yeah, there's always a way you can go a bit further down, and it's a tradeoff with things like delay as well. "You mentioned that normally Codec 2 only goes down to 1,200, and you've managed lower." I just had a quick two-hour hack session and stripped off some extra information. What I intend to do in the release version of this is have some side information: with a good signal you then pick up the additional 800 bits per second and get the 1,300 bits per second top quality out of the codec. So it sounds good when the channel is good, but you still get your message through when the channel is poor. "In terms of processing, encoder and decoder, how much do you need? It doesn't take much compute power to handle the codec?" No, not very much.
The smallest chip I've got it running on is a little microcontroller, using somewhere around 10 to 50% of the CPU of a 160 MHz part, and that's all C; it doesn't have optimized assembler or anything. "Is that like a Cortex-M?" A Cortex-M4. "Just before, to be able to hand off from the 450 bit rate up to the higher bit rate, is that a common practice, variable bit rate sort of things, or is that a new approach?" I don't know of it on HF. A lot of that, as I said, is because of the black boxes: you use one vendor's codec and it tends to stay at the same rate. It's certainly common in higher bit rate codecs, but I'm not quite sure about HF, because there's not much other open stuff going on down there. "So the thing that I didn't get from the talk, that I was kind of expecting from the beginning: do you take into consideration, in your codec encoding, the way the modem is going to convert it and send it over HF?" Yes, several times I've gone back and iterated the codec to suit the channel and the modem. "Okay, because it looked like it was still pretty uniform on the plots, and I was expecting it to be more kind of broken up." Yeah, there is another version where I've been varying the analog transmit power of the modem based on the power of the input speech signal, so if you look at a scope plot it looks just like a variable power thing. I'm playing with various different schemes at the moment. Alright, time for one more question. "I guess with HF radio the communication is still real time, as in five seconds of speech takes five seconds to send. Is non-real-time communication something you'd expect might be worth looking into?" Possibly; there are applications for messaging and things like that. Alright. So please join me in thanking David for coming over and giving us this talk.