Thank you. All right. Hello, EuroPython. I've got a lot of content, so I'm probably going to talk pretty fast. Apologies for that. As I was introduced, my name is Peter Sobot, and I'm a staff machine learning engineer at Spotify. I work on a team called our Audio Intelligence Lab. Now, that doesn't really mean too much, but what we actually do is teach computers to listen to music. We use machine learning to build systems that can listen to music automatically and tell us interesting information about that music.

Some of our work is stuff that you might have seen on the internet or in open source. We publish things like this API here, which lets you get the audio analysis of any song on Spotify. So if you've ever listened to a piece of music and wondered what key it's in, or how fast it is, or things like that, this API can tell you that for any track on the Spotify platform. But we don't just create APIs. We also open source things, like this thing we open sourced a couple of weeks ago called Basic Pitch. Basic Pitch is a machine learning model that'll take any piece of music and transcribe it into the notes that were used to play it. So it takes audio and turns it into digital sheet music, known as MIDI. It's a fully open source machine learning model built in Python, available on GitHub there. Or, if you want, check out basicpitch.spotify.com.
It's available there in your browser as well. What I'm here to talk about today is neither of these things, but rather a different project that we've built called Pedalboard. Pedalboard is an audio processing library in Python. Pedalboard lets you load audio, manipulate audio, edit audio, and do all sorts of cool stuff with audio directly in Python, in a very Pythonic way, and that's what I'm going to be focusing on today.

So we're going to talk about five different top-level topics today. We'll talk first about what audio is, because this is EuroPython, not EuroAudio or something like that. We'll talk about how digital audio works and give you a brief overview of how audio works in a computer. We'll talk about reading and editing audio in Python. We'll go on to talk about problems with audio in Python, and then finish off with audio effects with Pedalboard, which is really what it's good at, and we'll have some really cool examples in there as well.

So let's start off with the absolute basics. Again, this is EuroPython. I don't expect any of you to have any knowledge of what audio is. Hopefully you know what Python is. Let's answer this first question: what is audio? What do I mean when I say audio? Well, if you ask the dictionary, it'll tell you audio is sound, especially when recorded, transmitted, or reproduced. But that's still kind of vague. What is sound? How does sound work at a really basic level?
Well, sound is really just any vibration, any sort of vibration through the air, that isn't too slow or too fast for our ears to pick up. Our ears are sensitive to a range of different vibration speeds, from about 20 vibrations per second all the way through to 20,000 vibrations per second, which sounds like a lot, but it's not that much. In this range, there's a whole bunch of different sounds that you've definitely heard before. In fact, everything you've heard is within this range. Towards the lower end of the range, you'll have sounds like bass guitars and rumbles and car engines and things like that, and those have fewer vibrations per second. In the middle of the range, you'll have sounds like the human voice and higher-pitched sounds, maybe musical instruments, a lot of stuff like that. And towards the top of the range, you have sounds like birds chirping and other high-pitched sounds. So higher pitches mean more vibrations per second, or technically, higher frequencies.

These frequencies are notated in a unit called hertz, often written as Hz, and they're called hertz because higher frequencies hurt your ears. No, it's actually named after this guy, the German physicist Heinrich Rudolf Hertz. But nonetheless, that's how sound works at a basic level. So if that's how sound works in the real world, how does digital audio work? How do we get that into a computer, measure it, and play with it in our code?
Well, just like everything else in a computer, we need to measure it in the real world to get it into a computer and start working with it. So let's do some of that measurement here. Let's measure this by putting a graph on the screen, and of course we're gonna need to put some axes on this graph. So we'll put time on the bottom, and then something called amplitude on the y-axis here on the left-hand side. I'm not gonna get too much into what amplitude means, but you can think of it as the position of the sound wave, or whether the sound wave is pushing towards you or pulling away from you. That's the amplitude, and we'll see that on the screen in a second here.

So I'm gonna play a sound, and I really hope this works. As you hear the sound, you'll see some points show up on the screen. That'll be me clicking a button on my laptop and recording the value of the speaker, or the microphone here, at each point. So let's try this. It worked. Okay. So we've got a bunch of points on the screen here. There aren't too many of them. They're not very detailed, you might say. But we've recorded the position of the microphone that was used to record this audio at various points in time. I was pushing the button roughly once every 0.4 seconds or so. If we take the inverse of that, or the reciprocal of that, we get a value in hertz again, that unit for frequency, and that value of 2.5 hertz here is what we call our sample rate. This is how often we sample the microphone and actually figure out where the microphone is at that point in time. Now, there are other things we can look at here.
You'll notice that this graph is a little bit odd. It's got points at the top and the bottom, and that's because silence is not at the bottom of the graph. It's actually in the middle, and the top and bottom of the graph here are, very confusingly, both maximum loudness. We have maximum loudness in the positive direction at the top and in the negative direction at the bottom. And again, this is because sound is a wave. Sound is vibration, and it goes back and forth. So if it goes back and forth a very small amount, it's going to stay towards the middle of the graph, and if it goes back and forth a lot, it's going to hit the extremes. So maximum loudness, again, is the top and the bottom of the graph, very strangely.

And then we can measure each point here. So we've got this point on the left, which might have 60% amplitude, so we could represent that as 0.6 in code. And we've got this point here, which might be 90% amplitude but in the opposite direction, so we'll look at that as negative 0.9. So with this, we kind of have enough points to try to reconstruct what that sound was, and reconstruct the entire waveform. But unfortunately, we really don't have enough detail here. I didn't push the button often enough to get enough information to reproduce what we just heard and play it back, essentially.

So now, instead of 2.5 hertz, let's re-record this at a much higher sample rate, and you're going to see a lot more data on the screen here. So let's get rid of that and try this again. Okay, so there are more points there now. There are a lot more points. In fact, you can't even see the individual points. We sampled at 44,100 points per second, or 44,100 hertz, and that's just a huge amount of data. So we can't do much with what we're seeing right now. We're gonna have to zoom in to actually see what's on the inside here. So let's do that.
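The relationship between "one button press every 0.4 seconds" and "2.5 hertz" can be sketched in a few lines of Python. This is only an illustration: the function names and the two-second clip length are made up for the example, and only the 0.4-second interval and the 44,100 hertz rate come from the talk.

```python
def sample_rate_hz(seconds_between_samples: float) -> float:
    """The sample rate is the reciprocal of the time between two samples."""
    return 1.0 / seconds_between_samples


def samples_needed(duration_seconds: float, rate_hz: float) -> int:
    """Number of points recorded for a clip of the given length."""
    return int(round(duration_seconds * rate_hz))


# One button press roughly every 0.4 seconds, as in the demo.
slow_rate = sample_rate_hz(0.4)

# Hypothetical two-second clip, recorded at both sample rates.
clip_seconds = 2.0
slow_count = samples_needed(clip_seconds, slow_rate)
fast_count = samples_needed(clip_seconds, 44_100)

print(f"{slow_rate} Hz gives {slow_count} points")
print(f"44100 Hz gives {fast_count} points")
```

At the slow rate there are only a handful of numbers to work with, which is why the waveform could not be reconstructed from the first recording.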
Let's zoom all the way in from the second scale down to the millisecond scale here. As we get closer and closer, you can actually see the individual sound waves that came out of that saxophone. They went into your ears, and this is how your ears were vibrating, back and forth in this pattern. Now here, if we measure the distance between these two peaks, not these points, but these two peaks here, we can see there's a duration in there in milliseconds. That's 5.7 milliseconds between peaks, and if we do a little bit of math on that again, one cycle divided by that, we can even find out the frequency of the note that the saxophone was playing. So just from measuring that waveform over and over, we're able to find out the note, and I think that's actually an F. So we can find out what musical note was being played there just by doing some math.

But let's keep going even deeper. We're still looking at the millisecond level. Let's go from milliseconds down to microseconds, where we'll be able to see the actual samples. So now that we're super zoomed in, the actual samples look like little points along this graph, and we can measure again every single point at the microsecond level. This gives us a representation of the audio, and here's the real secret that I want to tell you today: this is digital audio. This is all it is. Digital audio is really just streams of numbers. Nothing too complicated, although you can do complicated things with it. This is how you represent digital audio.

So if this is how digital audio works, let's take a look at how we might store this in different ways. Let's zoom out a little bit again here. What we're looking at here is called uncompressed floating-point data. It's 32-bit or 64-bit floats, and this is pretty heavy.
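The "little bit of math" on the 5.7 millisecond peak-to-peak distance can be written out as a short sketch. The conversion from frequency to a note name below assumes standard tuning (A4 = 440 Hz) and equal temperament, which the talk does not state; the helper names are made up for illustration.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def period_to_frequency(period_seconds: float) -> float:
    """One cycle divided by the time that cycle takes."""
    return 1.0 / period_seconds


def nearest_note(frequency_hz: float, a4_hz: float = 440.0) -> str:
    """Nearest equal-tempered note, assuming A4 = 440 Hz (MIDI note 69)."""
    midi_number = round(69 + 12 * math.log2(frequency_hz / a4_hz))
    octave = midi_number // 12 - 1
    return f"{NOTE_NAMES[midi_number % 12]}{octave}"


# 5.7 milliseconds between peaks, as measured on the slide.
frequency = period_to_frequency(5.7e-3)
note = nearest_note(frequency)

print(f"{frequency:.1f} Hz is closest to {note}")
```

Under those assumptions the result lands on an F, which agrees with the note named in the talk.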
That uncompressed floating-point data is about 21 megabytes per minute of audio. That's a lot of data. If we had to store all that for every single song that you listen to, that would be, well, really expensive. It would take a long time to download, and so on. So people store audio in different ways. Sometimes we can store audio as uncompressed fixed-point data, as 16-bit or 24-bit or 32-bit signed integers, and on the lower end of that we get 10 megabytes per minute instead of 21 megabytes per minute. But that's still a lot. So back in the 90s, people invented some very, very clever compression codecs. These compressed floating-point codecs like MP3 and Ogg and so on compress all the way down to one megabyte per minute of audio, which is much, much easier to deal with. However, as you can see there, we're no longer actually looking at the numbers. We're not looking at these floating-point values. We're just looking at encoded bytes. So to do anything with this compressed data, we're gonna have to use a library or some other system to decode it, and then use the numeric information we get after decoding.

So that brings me smoothly to my next topic here: reading and editing audio in Python. If that's how audio works in a computer, and that's how we might store it on disk, how do we get that into our Python code?
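As a quick check on the storage figures quoted above, here is the arithmetic as a small sketch. It assumes 44,100 hertz stereo audio, 32-bit floats for the floating-point case, decimal megabytes, and a 128 kbit/s bitrate for the compressed case; none of those assumptions are stated in the talk, they are simply the values that reproduce its rounded numbers.

```python
def megabytes_per_minute(sample_rate: int, channels: int, bytes_per_sample: int) -> float:
    """Uncompressed size of one minute of audio, in decimal megabytes."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return bytes_per_second * 60 / 1_000_000


# Assumed format: 44.1 kHz, two channels.
float32_mb = megabytes_per_minute(44_100, 2, 4)  # 32-bit floats
int16_mb = megabytes_per_minute(44_100, 2, 2)    # 16-bit signed integers

# Assumed bitrate for a compressed file: 128 kilobits per second.
compressed_mb = 128_000 / 8 * 60 / 1_000_000

print(f"float32: {float32_mb:.1f} MB/min")
print(f"int16:   {int16_mb:.1f} MB/min")
print(f"128 kbit/s codec: {compressed_mb:.2f} MB/min")
```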
Well, here's where I shill just a little bit for the library that I've been working on for quite a while, called Pedalboard. And we're gonna start here by actually diving into the code. We're gonna start by importing Pedalboard by doing from pedalboard.io import AudioFile. Now, in Python, if you want to open up a regular file, you use the open function. But here we're gonna use the AudioFile constructor, or the AudioFile, let's call it a function, and it's going to do the exact same thing. So we can do with AudioFile("my_favorite_song.mp3") as f, and that'll open a context manager for us, open this file up, and allow us to play with what's inside this file. It'll automatically decompress the audio for us and just give us data that we can deal with.

So let's look at some properties here. We can do f.samplerate to give us the sampling rate that we talked about earlier. We can do f.num_channels to tell us the number of channels that are in this audio file. Oftentimes audio files have two channels, one for the left ear and one for the right ear. And then we can actually just read the data out of it. We can do f.read here, and we can ask for three samples, just the first three samples of the file. Now, Pedalboard is going to give us the first three samples in each channel, so we'll get back an array with a certain shape. The shape here is (2, 3), for two channels, left and right, and then three frames of audio. Now, if you're not familiar with multi-dimensional arrays, they're not too complicated. You can kind of think of them as arrays of arrays, but in this case each sub-array has a fixed shape, so we can just use that shape parameter to see what shape the overall array has. But okay, if that's how we can open a file, what can we do with the audio once we pull the data out of that file? Let's give that a try.
Oops. Just to illustrate here, the left channel of the audio is the first array, and then the right channel is the second array, so we can actually separate these out very easily like that. But going back to the start here, let's take a look at opening up another audio file, and instead of just reading in three frames, let's read in all of them. Let's do audio = f.read(f.frames). Now, f.frames here is just a property that tells us how many frames we have in the entire file. And here we get back an array with a certain shape again: two channels, so that's left and right as the first dimension, and then 1.3 million samples as our second dimension. Channels first, and then samples afterwards.

From there we can go on to actually decompose this audio a bit. We can do audio[0] to give us the left channel and audio[1] to give us the right channel, and you can see they're arrays with totally different contents, so they don't have to be correlated with each other. And then, using Python's array slicing syntax, we can start to chop up this audio and edit it, trim it as you wish. So we can do audio and then pass in an empty slice for the first dimension, and then a slice up to the first 100 samples for the second dimension, and that'll give us a stereo chunk of audio, or a stereo array, that contains only the first 100 samples. But 100 samples doesn't really mean too much to us. It just tells us that's 100 points. What does that mean in terms of seconds, or in terms of duration? Well, to get that we can do some math using f.samplerate here, the sampling rate we talked about before. We can ask for the first 10 seconds by using just regular Python array syntax and doing f.samplerate times 10. We can also ask for the last 10 seconds with the exact same array slicing syntax, negative f.samplerate times 10 there. Okay, so now we know how to read audio and play around with it as an array and slice it up. What can we do with it, though?
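The slicing described above can be tried without any audio file at all. The sketch below assumes `numpy` is installed and uses random noise as a stand-in for decoded audio; the 30-second length and the variable names are made up for illustration. Note that the sample rate has to be an integer to be usable as a slice index.

```python
import numpy as np

# Stand-in for decoded audio: 30 seconds of stereo noise, channels first.
sample_rate = 44100
duration_seconds = 30
rng = np.random.default_rng(0)
audio = rng.uniform(-1.0, 1.0, size=(2, sample_rate * duration_seconds)).astype(np.float32)

# Separate the two channels.
left = audio[0]
right = audio[1]

# Keep every channel (empty slice), but only the first 100 samples.
first_100_samples = audio[:, :100]

# Convert seconds to samples with the sample rate, then slice.
first_10_seconds = audio[:, : sample_rate * 10]
last_10_seconds = audio[:, -sample_rate * 10 :]

print(audio.shape, first_100_samples.shape)
print(first_10_seconds.shape, last_10_seconds.shape)
```

Basic slices like these are views onto the original array, so trimming this way does not copy the audio.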
We've really just opened up an audio file. How can we make it maybe sound different, or, you know, look different, something like that? So let's go all the way back up to the top and read in our MP3 file again, and this time let's whittle it down to one channel, just get the mono audio, one channel, from the left channel here. Now let's try to change how this sounds instead of just playing with a regular audio file. Let's add some delay to this. Let's add some echo. This is the same effect that you might have if you're in a big space and you hear a slapback echo of your voice coming back from the back of a room, which is actually what I'm hearing right now. But let's add that to this audio file with some code, and let's see how few lines of code we can write in Python to make this happen.

So let's set some parameters first. Let's set delay_seconds equal to 0.2, or one-fifth of a second, so let's make our audio delayed by 0.2 seconds. And then let's convert that into samples, so that we can deal with this as array offsets rather than floating-point numbers. And then we should also set another parameter here, which is how loud our echo, or how loud our delay, is going to be. So let's set that to volume = 0.75, which is 75 percent of the original volume. And now we're going to do some math. Don't get scared. It's actually fairly simple, but we're going to do a little bit of math here. So we'll iterate through the original array.
We'll do for i in range(len(mono)), so just iterate through all the samples in the original array. And then, if we have enough room to add echo afterwards, so if i + delay_samples is less than the end of our array, then let's add some delay. Let's do mono[i + delay_samples] += mono[i] * volume. So that's a little confusing to think about, but let me give you a quick visual example to see what this means. Let's play with a smaller array, maybe something where our delay is only one sample, and let's say our signal looks like this. We have six samples: 0.0, 0.1, and so on. What we're essentially doing here is taking a copy of this array, multiplying it by volume, and then shifting it forward by the number of samples we want to delay. So we'll shift it forward like that, add the results together, and then the result is essentially the original signal plus a delayed copy that's a little bit quieter.

But I'm still just showing you a bunch of numbers on the screen and a bunch of math. What if we could hear this? The great thing about working with audio code is that you can often hear what you're doing and what it sounds like. So let's listen to a signal before and after it's gone through these seven lines of code. No delay: sounds normal, sounds like a little piano. And now, after going through this code. Kind of cool, right? With only this code that you see on the screen here, we're able to take the actual audio signal you just heard and alter it in a way that our ears can make sense of, right? We've done something kind of neat there.

But let's not stop there. Let's do some even more extreme effects by writing some Python code. So let's get rid of that, and now let's load up a new file. Let's load up something here called cool_guitar.wav, which is a cool little guitar sample. Again, let's read in the audio file. With f.read we'll read all the frames in the file and just take the first channel. Then let's make this guitar sound a little more extreme.
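The "copy, scale, shift, add" picture from the six-sample example can be sketched in plain Python. The sample values after 0.0 and 0.1 are made up, since the talk only reads out the first two. One deliberate difference from the loop as spoken: this sketch writes into a separate output list, so each echo is taken from the untouched original. The in-place version reads samples that may already contain earlier echoes, so it produces repeating echoes that fade out rather than a single delayed copy.

```python
import math


def add_echo(samples, delay_samples, volume):
    """Return the signal plus one quieter copy shifted by delay_samples."""
    output = list(samples)  # start from a copy of the original signal
    for i in range(len(samples)):
        # Only add the echo if it still lands inside the array.
        if i + delay_samples < len(samples):
            output[i + delay_samples] += samples[i] * volume
    return output


# Hypothetical six-sample signal, delayed by a single sample.
signal = [0.0, 0.1, 0.2, 0.3, 0.2, 0.1]
echoed = add_echo(signal, delay_samples=1, volume=0.75)

# For real audio, the delay in seconds is converted to samples first.
sample_rate = 44100
delay_seconds = 0.2
delay_samples = int(delay_seconds * sample_rate)

print(echoed)
print(delay_samples)
```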
Let's add some distortion to it. So we'll set some parameters here, like the amount of gain that we want to add. What's the unit on gain? To be honest, I actually don't know. This is just a multiplier we're going to multiply our signal by, and the higher this number, the more extreme it's going to sound. Now, we'll also set an amount of volume that we want to multiply this by, so if we have a gain level of 200, we're going to knock this down by 90% and take only 10% of the original signal's volume, because when we distort this, it's going to sound pretty loud. And then, just like before, we're going to iterate sample by sample over the entire array here and change the value of each sample. So here, instead of doing any sort of delay or array offsets, we're just going to modify every single sample in place and say mono[i] is equal to math.tanh of mono[i] times gain, times volume. Now, you might be asking, what is math.tanh? That's for a different talk. I'm not going to get into the math of this, but you'll be able to hear it in just a second.

So let's listen to cool_guitar.wav before our processing. Sounds like a guitar. Now, this is going to be a tiny bit louder, as you can kind of see on the screen there, but don't cover your ears. I don't think it should be too loud. So that sounded a lot more metal, a lot more high energy, you might say, and all we had to do to make that sound is the code that you see on the screen there. In fact, this is even less code than what we needed to implement delay. So with very small amounts of code in Python, you can actually alter sounds and edit sounds like this very easily. Okay, so I've talked about a couple of cool things you can do here. Let's talk about problems
you might run into when working with audio in Python. Python is great, working with audio code is great, but sometimes there are some rough edges that make it really difficult to work with audio code in Python. So let's start with, well, a similar example to what we had before. Let's load up my_favorite_song.mp3, and let's say we want to do something different with this file. We want to load it up, make it louder, and then save it back to disk again. So we'll do with AudioFile here. We'll do audio = f.read(f.frames), and then once we've read in the audio, we'll make the entire thing louder. We'll make it louder by multiplying it by 2, which takes each sample value and multiplies it by 2. And then we'll save the result back out to disk again. We can use Pedalboard for this by doing with AudioFile("out.mp3"), passing the "w" flag as the second parameter, and then passing in the sample rate, because we need to know the sample rate in order to save. And then we just do o.write and, like any other file, we've written that to disk.

But there's a problem here. This code will probably run just fine on your laptop, and it'll run just fine on all the test MP3s that you have, or the small audio files that you have. But what happens if the input is not something you control? What if that audio file at the top is not actually a small song or a small MP3 file you have, but something that a user gives to you, or something that comes in from a web service, or, you know, you're running this on a large catalog of audio? What if that is two hours long?
Well, if that MP3 file is two hours long, you might not really be able to tell that until your code crashes, because once you read all the frames in, it ends up being 2.3 gigabytes of audio. And this is something kind of deceptive about audio in general: because of audio compression, and because of how effective audio compression is, you can actually have very small files that uncompress to massive, massive amounts of audio in memory. There are very simple techniques we can use to get around this, and one way to do this is by chunking the audio and processing it in chunks instead. So let's try that here. Let's set a chunk size of 500,000. Let's open up our input audio file, and let's open up our output audio file as well. We're kind of nesting these context managers really just so they'll fit on the screen. I know you can put them all on one line. And then, as long as we have audio to read in, so while f.tell() is less than f.frames, so as long as there's more audio coming in, let's read in a single chunk of audio, process just that chunk, make it louder, and then save just that chunk to our output file. And now with this, we no longer use two gigabytes of memory for a two-hour-long file. We only ever use up to four megabytes of memory here. This input file could be two hours long or 20 hours long, or it could be a stream that never ends, and our code is still going to work, because we're processing in chunks instead.

So what I really want you to walk away with here is not chunking, or that exact line of code that you saw there, but that you should think of audio as a stream. Audio is really a stream. It's streaming in time, and time keeps going. Audio could keep going for as long as you want. So if you write your code such that audio is a stream, your code will be more resistant to bugs and will probably not crash in production, or at least not as often. Okay, so let's look at one other common problem with audio in Python here. Let's go back to the example we had with distortion, that really
cool guitar example. So here's the code that we had before. I'm not going to type it all out again, but we read in our entire audio file and then we do math.tanh at the bottom here. Now, I said I wasn't going to get into what tanh is, and I'm not. It's a mathematical function. But we're looping over every single sample here in Python, and this is fairly fast, or at least it's fast enough. If I run this on my laptop, it takes about eight seconds per minute of audio to run this code. That's a good way to measure audio code: you take how long it took to run and divide that by the amount of audio that you processed through that code. And here we find out that this code runs at what's called about 7.5 times real time, so 7.5 times faster than it would be to just play back the audio itself. However, that still seems a little bit slow. If I have to run this on a minute of audio, I have to wait eight seconds, and I'm very impatient. So is there a way to do this any faster?

Well, luckily, Python has a lot of libraries and a very rich ecosystem. So I found that you can use NumPy, the numerical computation library, to do this exact same operation. Instead of doing math.tanh there, which only takes in a single sample at a time, we could do numpy.tanh, which takes in an entire array at a time and then processes that through the same function. The results are exactly the same. In fact, byte for byte, they're identical; the sample values match perfectly. But this code at the bottom runs a little bit faster, and when I say a little bit faster, I mean it takes only about 23 milliseconds per minute of audio. That's a great gasp.
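The two versions of the distortion can be put side by side in a short sketch. It assumes `numpy` is installed and uses a synthetic one-second tone in place of the guitar sample. The spoken code is ambiguous about the order of operations, so the reading used here, `tanh(sample * gain) * volume`, is an assumption, as are the function names. The comparison uses a small tolerance rather than demanding byte-identical output, because rounding in the two code paths can differ slightly depending on the array's data type.

```python
import math

import numpy as np

gain = 200.0   # multiplier applied before the tanh curve
volume = 0.1   # keep only 10% of the level afterwards

# Stand-in for the guitar: one second of a 220 Hz tone, mono, float32.
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
mono = (0.5 * np.sin(2 * np.pi * 220 * t)).astype(np.float32)


def distort_loop(samples, gain, volume):
    """Pure-Python version: one math.tanh call per sample."""
    out = samples.copy()
    for i in range(len(out)):
        out[i] = math.tanh(out[i] * gain) * volume
    return out


def distort_vectorized(samples, gain, volume):
    """NumPy version: one np.tanh call for the whole array."""
    return np.tanh(samples * gain) * volume


looped = distort_loop(mono, gain, volume)
vectorized = distort_vectorized(mono, gain, volume)

print(np.allclose(looped, vectorized, atol=1e-6))
```

To reproduce the "times real time" measurement, wrap each call in `time.perf_counter()` and divide the seconds of audio processed by the seconds elapsed; the exact figures will depend on the machine.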
I'm glad that happened. Yeah, 23 milliseconds per minute of audio. So that's a little bit faster than the old code was. In fact, if we do a bit of math here and divide these two, it's about 338 times faster to use a library here instead of manually iterating over the samples. In our case we used NumPy, and NumPy made this 338 times faster, but you could use Pedalboard, or you could use really any other library that optimizes this away. And the last thing I want to talk about here is really that, for audio, pure Python is slow. I hope I'm not offending anybody with that statement, but pure Python iterating through samples and doing direct math can be very, very slow compared to native, optimized C code, essentially. So if you can, use third-party libraries instead.

Okay, and that brings me to my final section here, which is talking about audio effects in Pedalboard. We've talked about how to do some of these effects ourselves and how to make this sound really good in pure Python. But what if we use this cool library that I've been talking about to do some of these effects for us? I'm just gonna jump directly into the code here. Just like we had before, we're gonna read in an audio file, and even though I just told you not to read in the whole file at once, I'm gonna do it here, because otherwise it wouldn't fit on the slide. And I'm going to use Pedalboard to import some audio effects. So I'm gonna import an effect here called Reverb. I'll do from pedalboard import Reverb, and then we'll create an instance of this reverb effect. We can do that with reverb = Reverb, with a capital R, and set the room size to a certain amount. So room_size = 0.75, whatever that number is.
It's just between zero and one. I don't think there's actually a unit on that, but it's good enough. Then we can take our input signal here and pass it through our reverb plugin. So effected = reverb, pass in the audio and pass in the sample rate, and all of a sudden we've applied an effect to the audio and it now sounds different. But you don't have to trust me on that. You can actually listen to it. So let's listen to cool_guitar.wav before and after once again. And now with the reverb applied. So musically that's the same, but you can tell that something has changed about it. It sounds like it was played back in a large room, or kind of a bit more boomy, a bit bigger. And that's a common effect that musicians will apply to their audio.

So let's keep going there. Let's get rid of the reverb example and chain multiple effects together. Instead of just applying one effect, let's make an entire pedalboard that contains multiple effects. So here we'll do board = Pedalboard, which is really just a list or a container for multiple effects, and let's add a distortion effect first, followed by a delay effect, followed by some reverb on the end. Now, this is actually very similar to what we did before in pure Python, but now we're just chaining these plugins into an object that's much easier to deal with. Then, just like before, we'll take our audio signal and pass it into the board. So we'll do effected = board(audio, f.samplerate), and again we have a before and after comparison here. It's not the same sample. This one's a little bit more tuned for the effects that we have here, but here's the before. And then when we put this through some distortion, delay, and reverb. Sounds very different. In fact, it sounds kind of musically interesting as well. It's not just affecting the tone of the sound, it's changing the musical content too. Okay, I've talked about some guitar examples and stuff like that. What about other use cases?
What if you're not using Pedalboard with music, or you're not affecting your music here? What if you have a podcast that you want to process? So let's read in a podcast voice sample here. We can make a pedalboard that works on podcasts just like the guitar pedalboards we've already done. So let's add a noise gate to get rid of some background noise. Let's add a compressor to make quieter parts of the sound louder. Let's add a low shelf filter, which is really a kind of equalizer, or EQ, and then we'll add some gain on the end to make the entire thing a little bit louder. Now, with these four effects in series, we'll take a voice that's not really radio ready and turn it into a voice that sounds much more professional, basically. So we'll pass the audio in here, and then again I'll give you a before and after. Apologies, it's gonna be my voice, which you've heard already throughout this talk, but yeah, here it is. "Welcome back to the Podcast Podcast, where I talk about podcasts." That's me recording my really cool podcast. And then once it's gone through this pedalboard here: "Welcome back to the Podcast Podcast, where I talk about podcasts." Much more full, much more boomy, arguably, but it really sounds more radio ready compared to what I've got there.

And then my last example here, my last two minutes, is how you can use Pedalboard to load other plugins as well. I've shown you a bunch of effects, reverbs, distortions, things like that. But if anybody in this room produces digital music or uses a computer to produce music, you've probably heard of plugins, or audio plugins: VSTs, Audio Units, and things like that. Pedalboard lets you take all of those and put them directly in your code and call them from Python. So again, let's read in a guitar sample here and read the entire audio file into an array. Then let's load a third-party plugin. I'm going to do plugin = load_plugin, and I'll pass a path to a VST here. So ChowCentaur.vst3, whoops, that's going
too fast. Okay. We'll load a third-party plugin here, and that third-party plugin will have code in it that we didn't write, and in fact it's not even written in Python. It's code that you might have downloaded from the internet or purchased from somewhere else, because VSTs are things that you can buy anywhere. So we'll load that plugin, and then we'll also do plugin.show_editor. show_editor here will actually open up the plugin's UI, and directly in Python you can play with the buttons and change the sliders and do all the sorts of stuff you might do in your digital audio workstation. Then, just like before, we can run audio through it. We'll do print(plugin.gain), and to change certain parameters we'll set plugin.gain to 1.0 for maximum rock. Then the audio goes through, and here's the before and after one last time. And afterwards. We couldn't have written that in Python. We're just gonna use someone else's code for it, and it sounds pretty good.

So with that, I'm out of time. If you're interested in using Pedalboard, you can find it on GitHub at github.com/spotify/pedalboard, or just on PyPI. It's available on PyPI there: pip install pedalboard. Thanks so much for listening. Find me in the hallway for any questions you might have, or for cool stickers of the nice Pedalboard logo we've got there. Thanks, EuroPython.

Nice timing, Peter, on the dot. So everyone, as Peter mentioned, we do not have time for Q&A, but I'm sure there are a lot of questions in your mind, so Peter will have his Q&A in the hallway. Line up if you have any questions for him. Can we have another round of applause for Peter? And thank you for your talk.