 Thank you for the intro. Since you're all here, I presume you're all interested in audio at some level, but I have no idea what kind of background everybody has. So before I start, I just want to see: first off, how many of you are musicians? Do any of you play musical instruments? Okay, a good number of you. How many of you... can you all hear me? Without a microphone? Okay, I won't use a microphone then. How many of you have opened up an audio editor and looked at a waveform? Yeah, okay, good. How many of you have written code to generate or process audio, at the sample level? Okay, just a few. All right, so I'm going to talk about some audio DSP basics, really, really basic stuff. The few of you who've done some audio processing might find it a little boring, but hopefully the rest of you will find it interesting, and I'll provide you all with the source code that I'm going to show here. It's mostly going to be code examples, all in C++. A bit about my background, very briefly: I have an electrical engineering degree from Queen's University in Canada and a master's in electroacoustic music from Dartmouth. Most of my recent work has been in iOS, so I have a few apps that I've shipped for various companies. Before that, I did speech recognition for Apple and video editing software for muvee, and I used to do hardware design for Nortel Networks, so I've done a whole bunch of stuff. I'm probably older than I look, at least I hope I am. So I'm going to show you a bunch of examples of C++ code, all standard C++, so you'll be able to run it on whatever platform you want: synthesizing a basic tone programmatically, reading and writing a WAV file, mixing multiple audio files together, changing the playback speed of an audio file, some very basic filtering, and how to apply pan and balance if you've got a stereo file.
It's all really, really simple stuff to do in any audio editing app, but here I'm going to do it at the low level. There's a link down there for downloading the code if you want. It's probably not there yet, because I'm uploading via my phone right now; if it doesn't work, I have it all on a thumb drive here. Okay, so without further ado, we'll go straight into some code. I have a very simple console application with a bunch of functions down here at the bottom for the various examples. I start with just creating a tone. If you've ever looked at a waveform and zoomed in enough, you'll see that it goes up and down. So I'm going to synthesize a waveform here. Briefly, a code overview: I create a buffer for my output with a certain number of sample frames, then I run a loop, and in the loop I generate a sine wave at a certain frequency. The sample rate is 44.1 kHz in this case. So I fill up the buffer with the sine tone and write it into a WAV file. I'll just run that... let me go back to the Finder, and there's my tone. There we go, it works. I wish there were a volume control on here. Okay, now if any of you are musicians, you probably know that instrument tones are not just sine waves. Typically you have overtones, and the relative levels of those overtones, or harmonics, change the sound quality. So now I'm going to generate another tone, but I'm going to add in some harmonics: the second, third, fourth, and fifth harmonics at different amplitudes. I'll run this and get another tone, which will probably sound about the same because the speakers are distorting so much. But we'll see; I'll turn down the volume this time. Okay. A slightly different sound; it's a slightly richer sound because it's got some harmonics in it.
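A minimal sketch of what that harmonic-tone loop might look like in standard C++. The function name and parameters here are my own, not the talk's actual code, and the WAV-writing step is left out:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Fill a buffer with a tone built from a fundamental plus weighted
// harmonics: a sum of sines at f, 2f, 3f, ... with given amplitudes.
// A pure sine tone is just the special case of a single amplitude {1.0}.
std::vector<float> makeHarmonicTone(double freq, double sampleRate,
                                    std::size_t numFrames,
                                    const std::vector<double>& harmonicAmps)
{
    const double twoPi = 6.283185307179586;
    std::vector<float> buffer(numFrames);
    for (std::size_t i = 0; i < numFrames; ++i) {
        double t = i / sampleRate;   // time of this sample in seconds
        double sample = 0.0;
        for (std::size_t h = 0; h < harmonicAmps.size(); ++h) {
            // harmonic h+1 sits at (h+1) * freq
            sample += harmonicAmps[h] * std::sin(twoPi * (h + 1) * freq * t);
        }
        buffer[i] = static_cast<float>(sample);
    }
    return buffer;
}
```

Scaling the amplitudes so their sum stays at or below 1.0 keeps the result from clipping when it's written out as audio.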
Now, obviously those aren't very interesting sounds, but if you do real synthesis of interesting sounds, down at the very, very low level, this is often one way it's done: just adding up harmonics to generate the sound, then applying time envelopes to get a smooth attack and a smooth decay on the notes. So that just demonstrates generating some audio programmatically and writing it to a WAV file. Okay, so I'm not going to do any more audio synthesis; the rest of the examples are just going to be processing audio in various ways. The first example is really trivial: it just shows how you can read in an audio file and write it to another audio file. You probably recognize it... yeah, I should have made it more of a surprise. This one I must turn down, otherwise it'll blast your ears out. He's back in the charts; he's got a number one hit in the UK right now, amazing, 20 or 30 years after he first hit the top of the charts. Good for him. So next I'll show mixing audio files. Again, apologies to anyone for whom this is just obvious and basic, but if you've never done audio programming before, some of this stuff is not necessarily obvious. Here I read in four different audio files, each into its own buffer. (It's funny, my mouse doesn't work properly here.) Then, in a loop, I add in the sample-by-sample contribution from each of the four parts: a bass part, guitar, keyboards, and drum kit. I write that to an output buffer, and finally write it to an output WAV file. And it'll work better if I run it. So I'll just run that until it's done, and show you what a couple of the source files sound like.
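The mixing loop just described could be sketched like this in standard C++ (my own names, not the talk's code). It also takes per-track gains, which the talk uses in the next step to make the bass softer and the drums louder:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Mix several mono buffers into one by summing sample-by-sample
// contributions, each scaled by a per-track gain (1.0 = unchanged).
// In the talk the four parts are bass, guitar, keyboards, and drums.
std::vector<float> mixBuffers(const std::vector<std::vector<float>>& tracks,
                              const std::vector<float>& gains)
{
    // The output is as long as the longest input track.
    std::size_t longest = 0;
    for (const auto& t : tracks)
        longest = std::max(longest, t.size());

    std::vector<float> out(longest, 0.0f);
    for (std::size_t t = 0; t < tracks.size(); ++t)
        for (std::size_t i = 0; i < tracks[t].size(); ++i)
            out[i] += gains[t] * tracks[t][i];   // mixing is just addition
    return out;
}
```

In the most basic sense that's the whole mixer: addition for mixing, multiplication for gain.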
There's the bass track, the drum kit, and this is the mix that was generated. So that's basically all there is to audio mixing: in the most basic sense, audio mixing is just adding up the signals. And if you want to mix things with different gains, that's just a multiplication. If I want the bass to be softer and the drums to be louder, I just multiply the sample values by some gain. So now I've reduced the gain on the bass and increased it on the drum kit, and we'll hear louder drums and softer bass. Pretty straightforward. And again, after the presentation you can download or get a copy of all this code so you can play around with this stuff yourself. Okay, the next one is a little more fun: I'm going to change the playback speed of an audio file. This is a very, very simple playback speed change. Basically, if you have sampled audio and you want to play it back more slowly, you just have to read through the samples more slowly. But since you only have samples at specific points, in order to make it sound good you need to interpolate between the sample points. So suppose I'm advancing by, say, two thirds of a sample from this point to this point, and I want the value two thirds of the way between the two actual values: I just draw a straight line between them and linearly interpolate to get a new sample value. That's the most common way of doing the interpolation. There are better ways of interpolating, but they're substantially more complicated. So I'm going to use that Rick Astley file again, but I'm going to make him sound a little lower pitched: I'll play through at, say, 75% of normal speed. In the loop here where I'm generating the output samples, I have a source position; this is where I want to grab a sample from the original samples. I take that position and split it into an integer part and a fractional part.
So that's figuring out where I am partway between two samples: this sample value and this sample value, and how far I am between the two of them. The integer part is the index of one sample, and the fractional part is what percentage of the way I am between those two samples. That's what's happening in these two lines, where I get intPos and frac. Then I get the previous sample and the next sample from the source buffer, do linear interpolation between those two samples, and increment the position according to how fast I'm playing through it. So I'll just run this... we're done, we have an output file. This is Rick Astley played more slowly. Notice that it's not just slower; it's also lower pitched. So his already gravelly voice becomes even more gravelly. And that's what happens if you change playback speed in this very basic way. Many of you have probably seen applications that let you slow down audio without changing the pitch. I've written one myself, called Audio Stretch, for the iPhone; it's possibly the world's best. I'm not going to demonstrate how to do that, because it's actually very, very complicated. It requires some very advanced signal processing, including Fourier transforms: you need to take the signal from what's called the time domain into the frequency domain, play around with phases, then go back to the time domain. It's very involved, with a lot of black magic in it. But just changing the speed like this is actually kind of fun. It's equivalent to changing the speed on a tape player, or changing the playback speed on a turntable. Have any of you seen those big old black things? Vinyl, yeah, that's it. I call them bulky discs, as opposed to the compact ones. So if you change the speed on a turntable, that's equivalent to doing this very simple speed change. Okay, next.
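The interpolation loop just described might look like this (a sketch with my own names; intPos and frac play the roles of the integer and fractional parts from the talk):

```cpp
#include <cstddef>
#include <vector>

// Resample a mono buffer at a given speed (0.75 => slower and lower
// in pitch), linearly interpolating between adjacent source samples.
std::vector<float> changeSpeed(const std::vector<float>& src, double speed)
{
    std::vector<float> out;
    double pos = 0.0;   // read position in the source, in samples
    // Stop one sample early so pos + 1 stays inside the buffer.
    while (pos < static_cast<double>(src.size()) - 1.0) {
        std::size_t intPos = static_cast<std::size_t>(pos); // sample before pos
        double frac = pos - intPos;           // how far toward the next sample
        float a = src[intPos];
        float b = src[intPos + 1];
        out.push_back(static_cast<float>(a + frac * (b - a))); // linear interpolation
        pos += speed;   // advance more slowly when speed < 1
    }
    return out;
}
```

At 75% speed the output is a third longer than the input, and the pitch drops along with the speed, exactly as with a tape player or turntable.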
Okay, so that's changing speed. Next I'll show applying some basic filtering, again using Rick Astley. It's good to be consistent, because then you get used to what the sound is, right? Okay, so first I'm going to do a low-pass filter. I have a class here called BiQuad. A biquad filter is a certain kind of filter topology; you can read all about it in the comments. There's a fair bit of funky math involved, but the way I've packaged it up, it's really pretty easy to use. Yep, can I make the font bigger? Okay. Great. Nice, now I can almost read it without my glasses. I've been programming for too many years. Okay, so for the biquad filter, I just initialize it and specify the sample rate, and here I initialize it as a low-pass filter with a cutoff frequency of 400 Hz. Now, what a low-pass filter does is pass any frequencies that are below the cutoff. So in this case anything below 400 Hz gets through, and anything above 400 Hz is gone, more or less. What that gives you is a really muffled sound. So here's Rick Astley filtered: a muffled version of Rick Astley. Conversely, if I do a high-pass filter, in this case you'll only have frequencies above 1000 Hz. So you basically lose all the bass, and it'll sound really, really tinny, sort of like what you get from your iPhone speaker. Now, some of you might notice, if you're musically inclined, that even though there are no frequencies below 1000 Hz, you're still hearing notes that are below 1000 Hz. The reason for that is that instrument tones, as I was explaining in the very first example, have multiple harmonics. Even if the fundamental frequency, the actual component at the note itself, is missing, your ear can still figure out the note based on the harmonics. One of the magical things that the ear does.
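Here's a sketch of a low-pass biquad in the spirit of the BiQuad class described above. This is not the talk's actual class; it uses the standard "Audio EQ Cookbook" low-pass coefficient formulas, which is one common way such a class is implemented:

```cpp
#include <cmath>

// A minimal second-order (biquad) low-pass filter, processing one
// sample at a time and keeping its own delay-line state.
class LowPassBiquad {
public:
    LowPassBiquad(double sampleRate, double cutoffHz, double q = 0.7071)
    {
        double omega = 2.0 * 3.141592653589793 * cutoffHz / sampleRate;
        double alpha = std::sin(omega) / (2.0 * q);
        double cosw  = std::cos(omega);
        double a0    = 1.0 + alpha;
        // Feed-forward (b) and feedback (a) coefficients, normalized by a0.
        b0_ = (1.0 - cosw) / 2.0 / a0;
        b1_ = (1.0 - cosw) / a0;
        b2_ = b0_;
        a1_ = -2.0 * cosw / a0;
        a2_ = (1.0 - alpha) / a0;
    }

    float process(float x)
    {
        double y = b0_ * x + b1_ * x1_ + b2_ * x2_ - a1_ * y1_ - a2_ * y2_;
        x2_ = x1_; x1_ = x;   // shift the input delay line
        y2_ = y1_; y1_ = y;   // shift the output delay line
        return static_cast<float>(y);
    }

private:
    double b0_ = 0, b1_ = 0, b2_ = 0, a1_ = 0, a2_ = 0;
    double x1_ = 0, x2_ = 0, y1_ = 0, y2_ = 0;
};
```

Changing the b/a coefficient formulas turns the same structure into a high-pass, band-pass, or notch filter, which is why one class can cover all of them.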
Is the component at this frequency completely missing, or is it just decreased? In this case the cutoff frequency is at 1 kHz and it's rolling off; it's about minus 6 dB per octave. So by the time you're down at middle C, which is about 260 Hz, you're down by 12 dB, which is quite a lot. But even if you had a steeper cutoff, you'd still be able to hear the notes. You could have a piano playing middle C, eliminate everything below, say, 500 Hz, and you'd still hear it perfectly clearly as middle C. And in fact, telephones pass almost nothing below 300 Hz or so, 400 Hz, but the pitch of the human voice, the male voice, averages around 100 Hz. The reason you can still hear it, and hear it at the correct pitch, is all of the harmonics. In fact, with the voice, all the different vowel sounds are just different harmonic balances; mouth movement is just shaping the harmonics. That was kind of embarrassing. Are these filters used to get rid of background noise? You can use them for that. A simple filter like this is good for eliminating some kinds of background noise, like a hum at 60 Hz or 50 Hz that you want to get rid of. Is that what they do for cancelling server noise? That's more or less what they do, yeah, but it's not great. Most noise tends to be spread across many frequencies, so there are some much, much more sophisticated ways of eliminating noise. On a basic mixer, though, this is more or less the kind of filter you would have: basically high-pass, low-pass, and band-pass filters. This BiQuad class can be initialized to do many different kinds of filters. I've just shown high-pass and low-pass here, but there are also band-pass and notch filters. Is that how lossy compression works, cutting out frequencies like that? No, lossy compression is quite a different thing.
With lossy compression, what they do is analyze the sound in short frames, maybe 20 or 50 milliseconds, something like that, transform them into the frequency domain, and then figure out which frequencies you actually won't be able to hear because they're masked by other frequencies. Masking is when you have one very loud tone at a particular frequency, say a very, very loud tone at 440 Hz, and then another one that's quite soft, say at 500 Hz; you probably won't be able to hear the soft one at all. So you can throw away that information, and that's how you get compression. There are other psychoacoustic properties that are used, but the main idea is just masking: the fact that there are sounds you just don't hear, so you can throw them out. Okay, one final example. Again, really trivial for people who've done this kind of thing before, but if you've never seen it, it might be kind of interesting. Everything I've done so far has been in mono; this time around I'm going to use a stereo audio file. So when I read it in, I actually get two channels (numChannels is two here), and I read through the audio frames into two buffers, one for the left channel and one for the right channel. Then I apply a balance function to it, and what that does is change the gain on the two channels. If you pan something fully to the left, that means keeping the left channel at full volume but completely cutting out the right channel. Similarly, if you pan fully to the right, you keep the right channel and turn off everything on the left. With the balance at zero, which means center in this case, we get the original sound. Not much volume in stereo here; this is one reason why I did everything else in mono tonight, because I thought this might happen.
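The balance function just described might be sketched like this (my own hypothetical names; I've assumed a simple linear law that only attenuates the side you're turning away from, which is one common way to implement balance):

```cpp
#include <cstddef>
#include <vector>

// Apply a balance control to a stereo pair of buffers, in place.
// balance runs from -1.0 (full left: right channel silenced) to
// +1.0 (full right: left channel silenced); 0.0 leaves both untouched.
void applyBalance(std::vector<float>& left, std::vector<float>& right,
                  float balance)
{
    // Turning toward one side only reduces the gain on the other side.
    float leftGain  = (balance > 0.0f) ? 1.0f - balance : 1.0f;
    float rightGain = (balance < 0.0f) ? 1.0f + balance : 1.0f;
    for (std::size_t i = 0; i < left.size(); ++i)  left[i]  *= leftGain;
    for (std::size_t i = 0; i < right.size(); ++i) right[i] *= rightGain;
}
```

Panning a mono source into a stereo field works the same way in reverse: one input buffer, with a gain per output channel.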
But you can take my word for it: in Stairway to Heaven at that point, the guitar is fully on the left and the recorders are fully on the right. So if I adjust the balance so it's more to the right (I won't go fully to the right, but most of the way), I'll have pretty much all recorder and very little guitar. I think it was the other way around. So now let me switch it fully the other way; I'll go full left. It could be a bug in my code too; I wrote this about an hour ago. That's all guitar; you can hear the amplitude change. Okay, let me do it fully the other way. If you're trying to transcribe music, it's sometimes very useful to be able to just set the balance like this. Okay, so now we have full right, which according to our Led Zeppelin aficionado here should be all recorders. Yeah, it's just very soft. I think the sound system here is actually almost only picking up one channel. Okay, so there you go. That's pretty much it. Let me go back to the presentation and bring up the slideshow. So there you go: there's the download link for all the code in these examples. Let me just check whether that's actually there... never mind. If you want the code, just ask me and I can pass you the thumb drive. Any questions? If there's anything you want to ask me, feel free to ask me afterwards too. There's a lot of stuff I know beyond this; I just kept it very, very basic. Do you have an interesting example from an iOS app of something you've had to do to manipulate sound? I know there's the Audio Stretch app, but usually I think of it as just playing this clip here, playing that clip there. Have you actually seen people changing the sound on the fly?
Well, I mean, that's what my Audio Stretch app does, a lot of that. Also, if you look at BandLab, another Singapore company: I wrote the audio engine for that app, which lets you record multiple tracks, play them, and edit them in various ways. For them I'm also writing a sample-based synth, which incorporates pitch shifting and envelopes, all kinds of stuff, and I'm writing a reverb for them. You have an app on the store? Yes, Audio Stretch. I have several different apps: Audio Stretch, Smart Scales. Smart Scales is an app for people studying for ABRSM music exams who want to practice scales and arpeggios with some interesting background music. BandLab is a collaborative cloud-based audio production tool: you can, say, record a guitar part and share it with people, and they can add a drum track or a vocal track. It's a very cool company. This next app I released a couple of days ago, for yet another company I'm working for; these are all for different companies, actually. It's a vocal transformation app: you can record selfie videos and then change the voice. So if you want to make yourself sound like a giant, or you feel like changing your voice into a woman's voice for some reason, you can do it. Is that available on Android? No, Apple only. Okay, sorry, a little bit of an off-topic question, but in your perspective, what kind of education should an audio engineer pursue? Well, it depends what you mean by an audio engineer; the term can mean different things. Do you mean an audio engineer as in someone who records and mixes records? Then my guess is that you should take one of these music tech programs in junior college and get as much studio experience as you can.
As far as I can tell, people who do well in that business mostly learn on the job. If you want to do audio signal processing, low-level code, then you want an engineering degree: there's lots and lots of math, and that's what you need. But they're very, very different disciplines. People who do audio engineering in studios generally have no idea how the tools really work at the mathematical level, and they don't need to. With virtual reality becoming more mainstream, how is that impacting audio, with people using a lot of these devices, like the Google Cardboard type stuff? Yeah, it's funny you should ask that. I was recently contacted by Microsoft; they're doing their HoloLens stuff, and audio is becoming a part of that. They want to do 3D audio, because if people are looking at a scene in three dimensions and there are audio sources attached to things in the world that are moving around, of course the audio has to move around as well. People have been doing that for many, many years; spatial audio is not anything new, it's been used in games for decades. Now it can be done much, much better because there's a lot more processing power. If the processing is fast enough, is a really good spatial audio experience technically easy to do, or even possible, if you're relying on audio alone to navigate? Yeah, the most straightforward way to do it is using something like head-related transfer functions, HRTFs. Basically you figure out how a sound coming from a certain direction gets transformed on its way to the two ears: there's a time delay, there's a bit of filtering. There are HRTFs for every possible position, elevation, and direction around you. Then you can take any source, process it through the HRTFs, and interpolate between them.
So you basically make the sound seem like it's coming from anywhere. If you want to do it really well, you should also take into account the speed of sound, so that a sound that's coming towards you is higher in pitch, and as it's moving away, it drops in pitch. The Doppler effect, exactly. With enough memory and CPU, you can do all those things. Go ahead. Now that you're working a lot in mobile, do you find yourself having to spend much effort compensating for hardware that sucks? This is why I work on iOS. Nice. I mean, Android was of course not really designed for real-time processing. It's getting much, much better, but historically it's been terrible. Windows has been pretty bad, but again, it's gotten much, much better. iOS is still, as far as I know, the best. So if you want to do something where real-time response is important, like touching the screen and hearing something right away, iOS seems to be the only option right now. It's the only one that's good enough, say, for a professional musician. But the others are catching up fast. What's the reason for that? Can you elaborate why? It's not really a hardware issue; it's more an operating system issue. Basically, to do audio, you need to be able to switch to your audio thread extremely fast, then go back to your UI thread, and your UI code should never take priority over the audio thread. In Apple's operating systems, both on Mac and on iOS, they managed to make that scheduling very fast and efficient, so you can use very, very short buffers when you're generating audio on the fly. Okay. I was wondering, is there any research still to do? It looks like everything is done; what is the real challenge now? We know how to do speech processing, speech recognition. Yeah, there are still things that are very hard to do.
Just one example of a problem that's still pretty hard: automatic transcription. You have a recording of someone playing the piano and you want to find all the notes. It's really easy to do badly, and damn near impossible to do perfectly. And once you get a really complicated mix of instruments, it can't be done yet. If you look at the published literature, even just for, say, solo piano, if they get 80% that's fantastic, you know? But from the point of view of a musician, if you get 20% of the notes wrong, that's awful. So that's really hard. Source separation is another big one: suppose you have several people speaking at the same time and you want to isolate them. That's really hard to do. So there are still plenty of open problems in audio signal processing. Anyway, you can ask me lots of questions later. We should probably wrap up. Thank you. Thank you.