I have the pleasure to introduce Werner Van Belle. He studied computer science at the University of Brussels, and he got his PhD there. He has been working on this software, BpmDj, for the last 16 years, and he will now introduce us to the wonders of this package. The audience is yours, Werner. Give him a big warm applause! Yeah. Thank you all for being here. I've just been running back and forth, so I still need to catch my breath. So before explaining the time stretcher in detail, I will talk you through it, if I have enough air, in about three minutes. Then I will let you hear the time stretcher. And no, I will let you hear the time stretcher now, and then I'll talk you through it. So time stretching starts from a question: how can we modify the speed of a sound without altering its pitch? And this is the result of what I will be explaining. I hope it's not too loud. The time stretcher itself is based on the sliding-window Fourier transform, which means that it will take overlapping segments from the input sound, analyze them, re-synthesize them, and place them in the output stream, which is at the bottom. The amount of time stretching depends on the spacing between the input segments and the output segments. For instance, in this case, you have a smaller input stride than output stride, which means you go slowly through the input. In this case, on the other hand, you go quickly through the input and generate the segments in the output at a faster pace, so you actually go faster than real time. The magic, of course, happens between taking the input segment and re-synthesizing it, and this is done first by converting one segment to a spectrum by means of a Fourier transform. This is such a spectrum. Horizontally, you have all the frequencies. Vertically, you see how much energy is present. In such a spectrum, we will look at all the individual peaks and convert them and copy them to a small vector. This is one such peak from the analyzed frame.
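The segment layout just described can be sketched in a few lines. The names below (`segment_positions`, the strides) are illustrative, not from BpmDj itself; the point is that the stretch factor is the ratio of output stride to input stride.

```python
# Sketch of the sliding-window layout: overlapping input frames are read at
# one stride and written to the output at another stride. A smaller input
# stride than output stride walks slowly through the input (stretching);
# the reverse goes faster than real time.

def segment_positions(n_input, frame_size, input_stride, output_stride):
    """Return (input_offset, output_offset) pairs for each overlapping frame."""
    positions = []
    i = o = 0
    while i + frame_size <= n_input:
        positions.append((i, o))
        i += input_stride
        o += output_stride
    return positions

# Stretching by 2x: input stride 256, output stride 512.
pos = segment_positions(n_input=2048, frame_size=1024,
                        input_stride=256, output_stride=512)
```

Each frame then lands twice as far apart in the output as it was read from the input, which is exactly the 2x slowdown.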
Each peak is then multiplied with a time-stretch matrix C, and when we do that, we create a new peak, the synthesized peak, which is the blue one. Now, with time stretching, the problem is that if you generate a new wave, you want to be sure that it aligns properly with the previous wave you synthesized. To do that, you need to go back to the previous frame, find the nearest peak, and determine the phase correction term. First, find the nearest peak in the previous frame. We now have the analyzed peak of the previous frame, the analyzed peak of the current frame, the synthesized peak of the previous frame, and the synthesized peak of the current frame. Between those four, we calculate the proper phase correction term to make sure that they are in alignment. Once we know the phase correction, we apply it to the synthesized peak and then add this peak to the synthesized frame. Once you have a full synthesized frame, you convert it back to the time domain. You fade in and fade out the volume again, and that's basically it. And this is everything a time stretcher does. Normally, I would show you the time stretcher now. You already heard it. Now we will restart this entire process from the beginning. This entire time stretcher was actually born from the idea of sinusoidal modeling: that you can create a model of the sound which is a summation of sine waves. These are here. Each sine wave has a carrier frequency, which is the omega. It is allowed to change its volume over time, which is the gain g. And you allow the frequency of the carrier wave to change over time, which is the phi-v; I don't know how to say it. The reason this model looked attractive is because if you look at the wave like this, we can't really time-stretch it. If, on the other hand, we look at the wave like this, we can time-stretch it. In this case, we see a carrier wave. We see that the volume changes over time, which is the amplitude envelope, the blue line.
And at the bottom, you have the frequency envelope, which shows you that the spacing of the zero crossings changes over time. So a frequency envelope is also a phase envelope, if you want. Well, for practical purposes. Now, if you consider a sound like this, you can actually time-stretch it by re-sampling the envelopes. The blue one is the re-sampled version of the red one. At the top, you get the amplitude envelope, which you time-stretch. At the bottom, the frequency envelope, which you time-stretch. And underneath these new time-stretched envelopes, you create the old carrier wave. And if you do that, well, that was the idea, then you create a good time stretcher. So the top is the time-stretched version. The bottom is the original version. You see it lasts longer. The envelope is preserved. And if you look per time unit, you have more or less the same amount of oscillations. So this talk is fairly complex, so I will go through a series of things you need to know to understand it. The first thing is complex numbers. Most people, if they think about audio, they think, let's represent it as a sine wave or a cosine or so. But that's not really a good representation, because most oscillations arise from rotational things, more or less. And it's much easier to write an oscillation as an explicit rotation than as a sine or a cosine. For instance, if I tell you the sample value is 15, then you do not know what the angle is, nor do you know what the amplitude of the wave was. If, on the other hand, I say the point is (15, 15), then you know that the angle was 45 degrees and that the amplitude was the square root of 2 multiplied by 15. So that is the reason we use complex numbers. Everything I will tell you is always in the complex domain; we don't talk about real numbers. A second good thing is Euler's relation, which states that e to the power of an angle multiplied with i is a point on the unit circle under that angle, phi in this case.
So it's written literally as the cosine of phi plus i times the sine of phi, which is on the vertical axis. With this relation, you can express a frequency. You can easily say: I would like to have so many rotations a second, and you fill in the omega. And as t increases, omega t will increase as well. And g specifies the amplitude of the oscillation. So this is essentially easier than writing a sine or cosine. A fact we will use in this presentation is that the multiplication of two complex numbers is the same as multiplying their amplitudes but adding their angles. This is quite easy to see from the exponential notation: e to the power omega-1 i multiplied with e to the power omega-2 i is e to the power of omega-1 plus omega-2, times i. With this in mind, we now want to create a complex rotational model instead of a sinusoidal model. So we would like to express x(t), the sound we would like to model, as a summation of complex rotations. This means we just replace the sine with e to the power of the angle we originally had. Now, there are two things you can do. The first is you can define a carrier wave, which is written as s, and an envelope, which is written as u. s starts from angle 0. At a certain frequency omega, you run around a circle once, or two times, or four or five or whatever you want. And the envelope u tells you how you modify this carrier wave over time. It contains both the gain modifications as well as the phase modifications. And if you define the envelope and the carrier like this, you can state: I would like to have a model of the sound which is a summation of complex rotations, each consisting of an envelope u multiplied with a carrier wave s. The angle of u is how you modify the frequency, and the amplitude of u is how loud you want this carrier wave. s itself has a certain carrier frequency, which is not explicit except here. So with the preliminaries in place, we can now discuss how we analyze the sound.
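The two complex-number facts used throughout the talk, Euler's relation and the angles-add rule for multiplication, can be checked in a few lines:

```python
import cmath

# Euler's relation: e^(i*phi) is the point on the unit circle at angle phi,
# literally cos(phi) + i*sin(phi).
phi = cmath.pi / 4
z = cmath.exp(1j * phi)

# Multiplying two complex rotations multiplies the amplitudes and adds the
# angles: (2 e^(0.3i)) * (1.5 e^(0.5i)) = 3 e^(0.8i).
a = 2.0 * cmath.exp(0.3j)
b = 1.5 * cmath.exp(0.5j)
product = a * b
amplitude, angle = abs(product), cmath.phase(product)
```

This is why a complex rotation carries both the amplitude and the angle that a single real sample value cannot.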
After the analysis, we will synthesize the sound again. This is a voice print. From left to right, you get time; it's about 1.3 seconds. From bottom to top, you have 0 to 2,000 hertz; I cut it there. And you see that a bright color means a lot of energy; if it's yellow, it has little energy. The idea is now that we would like to find a complex track, a sinusoidal track, in this voice print. This consists of multiple stages. First, we would like to know how we measure the spectrum. Secondly, how we find the peaks in the spectrum. And thirdly, how we track each peak through the spectrum, through the voice print, to create such a sinusoidal track. But let's start at the beginning. This is a voice print. We would like to know how we calculate the spectrum at a certain position in time. And this is nothing more than spectral sampling. You have a signal x which you would like to sample, and you simply test the signal against a rotation of one complex rotation, or two rotations, or three rotations. This is the signal to be sampled. This is the sampler, currently one rotation. Note that the sampler has unity gain. So the result of this is basically the gain of x, and the angle of the result is the difference between the angle of the sampled signal and the sampler. Basically, we measure continuously the phase difference between the sampler and the sampled signal. And in a sense, we average this, and this gives basically how much energy we have for this one rotation, and under which angle we need to rotate the sampler to match the sampled signal. And this is something which can be repeated for many different rotations: for instance, one rotation, two rotations, three, and so on. This is basically Fourier analysis.
And if you write it out in complete form, you get that the Fourier analysis of a signal x is the spectrum, written with a capital X. The index specifies how many rotations you tested, and how you measure it is this expression, which is just what I explained. Fourier analysis is linear. If I take the Fourier analysis of a sound at a certain gain, added to another sound, it is the same as applying that gain to the first sound's spectrum and adding the second sound's spectrum to it. This is useful; we will need it. Fourier analysis can be inverted. The inverse Fourier transform of my spectrum will eventually give back my original sound. The only reason this works is because you have an even spacing here. If you don't have even spectral spacing, you cannot directly do that. Something most people think: the Fourier analysis gives you something of value. If you look at a sound, you do not really see the frequencies; the Fourier analysis gives you those frequencies. So many people think, if I take the Fourier transform of the Fourier transform, I'll get even more value. That's not true. If you take the Fourier transform of the Fourier transform, you get the original signal, but in reverse. You basically reverse the signal. So, this is an oddly cropped image. Well, in any case, this is a spectrum. The sound we put into the spectrum had 2.5 rotations and 13.1 rotations; so we go around the circle 2.5 times and 13.1 times. And this is the signal you get at 2.5, which is here. You see some leakage through the side lobes, so to speak. And at 13.1, which is here, you also see a signal near the position where you have your frequency. And this is something you have a grip on. In Fourier analysis, you can actually create a proper sampling of the spectrum that will keep most of the spectrum local to where the signal actually is. And at this point, I have a nice video, I thought. I hope. So this is what happens with normal sampling.
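Both claims, invertibility under even bin spacing and the double transform reversing the signal, can be demonstrated with a naive O(n^2) version of the spectral sampling described above:

```python
import numpy as np

# "Spectral sampling": test the signal against k complex rotations per frame.
# This is the naive O(n^2) Fourier analysis, matching numpy's convention.
def dft(x):
    n = len(x)
    t = np.arange(n)
    return np.array([np.sum(x * np.exp(-2j * np.pi * k * t / n))
                     for k in range(n)])

rng = np.random.default_rng(0)
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
X = dft(x)

# Inverting the spectrum gives back the original sound.
roundtrip = np.fft.ifft(X)

# Applying the forward transform twice gives the reversed signal (times n).
double = dft(dft(x))
```

The double-transform fact returns later in the proof of the spectral convolution theorem.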
In the back is the wave we sampled, and from top to bottom, 0 to minus 120 decibels, you see the spectrum you measure, at least the amplitude. And you see that there is a lot of leakage. This is where the central peak is, and this is where, at minus 70 decibels, the noise floor, in a sense, is. And this is bad for two reasons. If I have a signal here, it will affect the rest of the spectrum, but also the other way around: if there is a lot of noise in the other parts of the spectrum, it will affect the local measurement. So this is something we really don't want. A solution is to fade in and fade out the sound appropriately. Like this. In this case, it's again the same scale. This is the wave we actually sample. And from top to bottom, you get 0 to minus 120 decibels. And you see that the leakage is limited to a local area around where the peak actually is. This entire process is called windowing. There is nothing new about this; it's just something you need to take care of if you create a time stretcher. So now we know how to sample the spectrum at a certain point in time. The next question is: how can we find the peaks, and what do they mean? Is there a deeper relation to these peaks? Yes. Oh, I hate that it opens up so much. Well, this triangle was meant to be located here, between the peaks. So the question now is: the peaks in the spectrum are easily found. You just find the local maxima, and that's all there is to it. The question is, how do you extract them? For instance, this position, m1, is between two peaks; how do we assign this value to either the left or the right side? And it's easy. You just create a weight curve that is the complement of the weight curve of the previous or next peak. In this case, we just draw a line from 0 to 1, and from there to there a line from 1 to 0. And the other peak will do the complementary thing.
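The windowing effect just described can be reproduced numerically: an off-bin tone measured with and without a fade-in/fade-out window (a sketch, not the talk's actual demo):

```python
import numpy as np

# An off-bin tone (13.1 rotations per frame) measured with and without a
# window. Without windowing, leakage spreads across the whole spectrum;
# a Hann window confines it to the area around the peak.
n = 1024
t = np.arange(n)
x = np.exp(2j * np.pi * 13.1 * t / n)

spectrum_rect = np.abs(np.fft.fft(x))                  # no window
spectrum_hann = np.abs(np.fft.fft(x * np.hanning(n)))  # faded in and out

# Far away from the peak (bin 300) the windowed measurement is far quieter.
far_rect = spectrum_rect[300]
far_hann = spectrum_hann[300]
```

This is exactly the "noise floor" difference visible in the two spectrum plots: distant bins stop contaminating, and being contaminated by, the local measurement.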
And this basically assures us that if we extract all peaks, do no time stretching, and put them back together, we have exactly the same spectrum as before. And you can draw a lot of things. You can draw this weird curve, a parabola through the minima, if you want; this works well. Or you could try this, and this doesn't work. It's something you might think: I really truncate my spectrum from here to there, this is my peak, and all the rest belongs to the other peaks. And that doesn't work. It's really, really something that sounds bad. So you need a soft transition in the spectrum, in a sense. And it's not a problem to create that, of course. Now, if you extract such a peak from the spectrum, you would like to know what it means. And to explain that, I need to explain convolution. Convolution is the idea that you have two signals, a red signal and a blue signal, a green signal, apparently. To create the convolution of the two signals, you move the green signal to each position in the red signal. You multiply it with the value you find at that position in the red signal. And all those time-shifted signals are added together, which gives rise to this signal. A normal implementation, or rather a simple implementation, will take about n squared operations. Fourier analysis helps us in that sense. The Fourier convolution theorem states that the spectrum of the first signal, multiplied with the spectrum of the second signal, is the spectrum of the convolution of the two signals. So, how do we apply that? Well, we have the first signal, x. We go to the spectrum, which is this thing at the bottom; this is F(x). You take the second signal, you go to the spectrum, which is the spectrum of y, at the bottom again. You multiply the two spectra element-wise with each other. And remember, these are all complex multiplications; I'm no longer talking about real domains.
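The convolution theorem just stated can be checked numerically, comparing the simple n-squared convolution against the route through the spectra:

```python
import numpy as np

# Convolution theorem: multiplying the spectra element-wise equals the
# spectrum of the (circular) convolution of the two signals.
rng = np.random.default_rng(1)
x = rng.standard_normal(16)
y = rng.standard_normal(16)
n = len(x)

# Simple O(n^2) circular convolution: shift y to each position, scale by the
# value of x there, and add everything together.
direct = np.array([sum(x[m] * y[(k - m) % n] for m in range(n))
                   for k in range(n)])

# Same result via the spectra: transform, multiply element-wise (these are
# complex multiplications), transform back.
via_spectra = np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)).real
```
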
And this is the thing you do the reverse transform of, which gives you the original signal, or at least the convolution of x and y. So, this was meant to be a gradual explanation of the things I'll talk you through. The thing which was a breakthrough for me, and it's also common knowledge apparently, I didn't know that, is that you can also do spectral convolution. You can say that if I have the spectrum of one sound and the spectrum of a second sound and I convolve them, well, it is the same as taking the spectrum of the multiplication of the original sounds with each other. I'm now pointing to this area here. So, the convolution of two spectra is the spectrum of the underlying signals multiplied with each other. To prove this, there it is. First of all, you take the Fourier transform of both sides. So, this gives the Fourier transform of the left-hand side here, and the Fourier transform of this thing. This is the Fourier transform of a convolution, and the convolution theorem states that the Fourier transform of a convolution is the Fourier transform of the first element multiplied with the Fourier transform of the second element. So, we get the Fourier transform of the Fourier transform, multiplied with another double Fourier transform. And here we already had it. In the beginning, I said something about double Fourier transforms being the reverse of the underlying signal. This is what we see here as well. This is the reverse of x multiplied with y, and this is the multiplication of x reversed with y reversed, and these are obviously the same. So, that's a very easy proof that this is actually true. How do we apply that? This is a spectrum, the red one. We have a green spectrum. The red spectrum will first be converted to the time domain, which gives you this curve. And let's call this the carrier wave. We take the green signal, which is also a spectrum. We go back to the time domain.
So, we take the inverse Fourier transform, which gives us y, and let's call this the envelope. And now we multiply the envelope with the carrier wave, or essentially we modulate them with each other, which gives us the amplitude-modulated wave. And then we go back to the spectrum, which gives us the convolution of the two peaks. So, in a sense, well, not in a sense, quite exactly actually, the envelope and the carrier wave are together convolved into one peak. It basically means that both the envelope and the carrier wave are local phenomena in the spectrum. And it also works for phase modulation. At the top, you have a phase plot, time versus phase, from minus pi to pi. At the bottom, you have its spectrum. The green one has six oscillations; it goes six times around the circle. The red one has half an oscillation, from zero to pi. And remember, multiplication of complex numbers means adding their angles. So, if we multiply the red and the green signal, we will end up with 6.5 rotations. If we look at their spectra, we find the peak for the six oscillations at bin six; that's here. The half oscillation is between zero and one, with a lot of leakage. And the convolution of the two is the blue signal. So, what does it mean if we look at the spectrum? This is a spectrum of a piece of sound, shown in red at the bottom. The top is the spectrum; the bottom is the time domain. If we extract the black peak, this is actually the convolution of the spectrum of the envelope with the spectrum of the carrier wave. If we go to the time domain by simply taking the inverse Fourier transform, you get the modulation of the envelope with the carrier wave. And you see that the black wave follows the red one quite exactly. It has an amplitude modulation: it starts at zero, goes up, and goes to zero again. And here you see that the phase slows down and picks up speed again.
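The dual theorem, convolving two spectra to get the spectrum of the multiplied signals, can be verified the same way (the factor n comes from numpy's unnormalized forward transform):

```python
import numpy as np

# Spectral convolution theorem: the circular convolution of two spectra is
# the spectrum of the underlying signals multiplied with each other, up to
# a factor n from numpy's transform convention.
rng = np.random.default_rng(2)
n = 16
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

X, Y = np.fft.fft(x), np.fft.fft(y)
conv_of_spectra = np.array(
    [sum(X[m] * Y[(k - m) % n] for m in range(n)) for k in range(n)]
)
spectrum_of_product = np.fft.fft(x * y)
```

This is the fact that makes a single extracted peak meaningful: it is the envelope spectrum convolved with the carrier spectrum, i.e. the spectrum of their time-domain modulation.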
So everything we want is actually in the spectrum, at a very local scale. Now, back to the model we would like to create. It's clear we will use a spectrum, so we take the Fourier transform of the sound. That means that we should also take the Fourier transform of the model we are creating, which is this one. Linearity allows us to move the Fourier transform to the inside, which gives you this equation, a sum of Fourier transforms. And each term is the Fourier transform of three modulated signals, which is the same as convolving the spectra of the individual signals. We know that the carrier wave and the windowing function together span at most eight bins; you don't go further than that. And we also know that the envelope spectrum is local. It is band-limited. It is smaller than 20 hertz, which is more or less two or three bins in a normal spectrum. The question is, why do I say that the envelope is band-limited? And I hope this works. This is a demonstration. The sound you will now hear is 200 or 400 hertz or so, and I will modulate the sound first once a second, then two times, three times, and I will keep increasing the modulation. And as the modulation increases, at a certain point, somewhere between 10 and 20 hertz, you will start to hear two sounds, one going up and the other going down. This basically means that if you hear two sounds, we should treat them as two separate sounds and no longer as one sound. And that is the reason we say it's band-limited. This is one hertz modulation. So the sound's volume goes up and down once a second. So this is an audible demonstration of why the envelope spectrum is something local. We do not want an envelope that is modulated 50 times a second; we would hear two tones instead. So at this point, we know the voice print, we know how to analyze one slice of the voice print, we know where a peak is, and we actually know what it means. The next question is, how can we follow this peak from frame to frame? And this is very simple.
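The two-tones effect behind the band-limit demonstration can be shown numerically (a sketch, not the talk's actual audio demo): a fast amplitude modulation puts all the energy in two side bands.

```python
import numpy as np

# Amplitude-modulating a carrier at bin 200 with a 20-cycle envelope puts
# the energy at bins 180 and 220: the "two sounds" you start to hear when
# the modulation gets fast enough.
n = 1024
t = np.arange(n)
carrier = np.exp(2j * np.pi * 200 * t / n)
envelope = np.cos(2 * np.pi * 20 * t / n)

X = np.abs(np.fft.fft(carrier * envelope))
loudest_two = set(int(b) for b in np.argsort(X)[-2:])
```

Once the side bands are far enough apart to be heard separately, the model should treat them as two tracks, which is why the envelope of a single track is kept band-limited.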
We just go to the next frame and find the nearest peak. That's all there is to it. Well, there is something more to it; you can make more advanced approaches, but this is the thing that worked for us, and it works quite well. If we do that, we can convert it to the time domain, which is at the bottom, and actually stitch together the entire complex rotation for this particular track. And this is what you get. The bottom is all the segments pieced together. The top one is the actual complex track, which means that we have the amplitude envelope as well as the frequency envelope. And the idea was, of course, at this point, to stretch the envelopes but retain the carrier wave. I will now let you hear what such a track sounds like. I hope it works. This is from our demo sound, the sound you heard in the beginning. Five tracks; only three tracks are shown because I cut the display at 2.5 kHz. This is 1% of the tracks. It looks like that. This is 5% of the tracks. And this was the original track, and it is exactly the same as the original because we designed the peak extraction to be a one-to-one relation if we don't time-stretch. So, we now know how to analyze sound. We know how to find the complex tracks. We know where the envelope is. We know what we want to do now. The thing is, we need to do this, of course. The overview again. We have four peaks, or four and a half peaks, here. They represent small segments of one complex rotational track. And if we stitch them all together, you get the actual single track through the sound. The idea is to time-stretch the envelopes, the green and the blue one, simply by re-sampling them, and generate the original carrier wave underneath the re-sampled envelopes. Of course, if we would do this at this level, you would have around 1000 tracks in a normal piece of sound. And at 44.1 kHz, that is way too much computation to perform in an explicit way. Therefore, we will stick to...
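The nearest-peak rule from the start of this passage is tiny in code. Function name and the peak data below are illustrative, not from the actual implementation:

```python
# Follow a peak from frame to frame by taking the nearest peak in the next
# frame, as described above.
def track(peaks_per_frame, start_bin):
    """Build one track by repeated nearest-bin matching."""
    path = [start_bin]
    for peaks in peaks_per_frame[1:]:
        path.append(min(peaks, key=lambda p: abs(p - path[-1])))
    return path

# Two tracks wandering through four frames of detected peak bins.
frames = [[10, 40], [11, 39], [13, 38], [12, 37]]
low_track = track(frames, 10)
high_track = track(frames, 40)
```
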
We will go as close as we can to the original peak. So we will instead look at this small fragment, this small piece, and try to time-stretch that individually. We will do it as follows. We will determine the envelope of each small segment and then time-stretch it by pushing it outwards or pulling it inwards. How this is done is as follows. You start out with a peak, which is the convolution of the envelope spectrum with the carrier wave spectrum; that is this thing. First, we demodulate it. We remove the carrier wave, which gives you the spectrum of the envelope. You convert it to the time domain by an inverse Fourier transform, which gives you the time-domain representation of the envelope. We re-sample it, stretch it or shrink it. That gives us the synthesized envelope. Then we go forward: generate the synthesized peak, the spectrum of the synthesized envelope, and then remodulate it by adding the original carrier wave back to it. Demodulation is simple. You have a peak at 9 oscillations, and you say: let's assume the carrier wave is at 9 oscillations. The original frame size might be 4096; it doesn't really matter. If we move the peak to position 0, we get rid of every oscillation that's in there. Well, most oscillations that are in there. Essentially, by shifting it to position 0, you remove the carrier wave. It is not exact, it's not perfect. It is possible that the carrier wave is actually between two bins, and there are ways to deal with this by actually estimating the frequency; it's no problem to add this to the time stretcher. In order to explain it, I will however assume that the peak is at a bin-accurate position. So now we have the spectrum of the envelope. We go to the time domain by performing an inverse Fourier transform, which gives us the time-domain representation of the envelope of this small segment of this individual track.
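Demodulation by shifting the peak to bin 0 can be sketched numerically, assuming, as the talk does for the explanation, a bin-accurate carrier:

```python
import numpy as np

# A slowly varying envelope on a carrier at exactly bin 9. Circularly
# shifting the spectrum so the peak lands on bin 0 removes the carrier and
# leaves the spectrum of the envelope.
n = 64
t = np.arange(n)
envelope = np.hanning(n)
carrier = np.exp(2j * np.pi * 9 * t / n)

X = np.fft.fft(envelope * carrier)
X_envelope = np.roll(X, -9)          # move the peak from bin 9 to bin 0
recovered = np.fft.ifft(X_envelope)  # time-domain envelope, carrier removed
```

With a carrier between two bins the shift is only approximate, which is where the frequency-estimation refinement mentioned above would come in.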
We can show the amplitude, which is this one, which is both the window combined with the actual volume change of the envelope in the wave, as well as its angle, which drifts forward throughout the entire frame, meaning we didn't really estimate the frequency that correctly. Now we will re-sample this so that we have a higher resolution, which we need in order to push the envelope outward or pull it inward. If you push it outward, you take the middle of the frame and you push all the samples outward, which gives you this one. The red one is the amplitude envelope; the blue one is the phase envelope. If you shrink time, you take the middle line and you pull everything toward the middle, which gives you this curve. Again, the window is now smaller; that's the red line. And the blue line is again the phase envelope, which is also smaller. The equation to do this, I'm not going to explain in full, but generally the synthesized envelope at position t is the analyzed envelope u at a source position that depends on the target position we are synthesizing, and this mapping from target position to source position is given by this equation. Important here is that t is an integer and s(t) is not an integer, which is the reason we had to re-sample the envelope to a higher resolution. Once we have the synthesized envelope u', we perform a forward Fourier transform, which gives us the synthesized peak U'. Then we remodulate it with the original carrier wave, which was basically at position 9, so we shift the peak back to position 9, which gives you the convolution of the spectrum of the synthesized envelope with the spectrum of the original carrier wave. And if we do that, we get the following time stretcher.
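The push-outward, pull-inward resampling can be sketched as follows. The mapping below, centered on the middle of the frame and using linear interpolation, is my reading of the talk's s(t), not the exact equation from the slides:

```python
import numpy as np

# Stretch or shrink an envelope around the middle of the frame: the
# synthesized envelope at integer position t reads the analyzed envelope at
# fractional source position s(t).
def stretch_envelope(env, factor):
    n = len(env)
    mid = (n - 1) / 2
    t = np.arange(n)
    s = mid + (t - mid) / factor     # factor > 1 pushes samples outward
    return np.interp(s, np.arange(n), env)

env = np.hanning(33)
pushed = stretch_envelope(env, 2.0)  # time-stretched: wider, flatter flanks
```

Because s(t) is fractional while t is an integer, some interpolation scheme is unavoidable; the talk's fast path later evaluates the same mapping directly from the spectrum.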
This is the time stretcher. You see the segments. So, that didn't sound too well, and the reason for that is that you have to correct the phases. This is a very tedious part of this presentation, and I really apologize for that, but I thought: this is a technical audience, they want to know the details, and here you are going to get them. The problem is this. These are the segments we just generated: we have a red segment, a yellow segment, a greenish segment, and a blue segment. And if you do not correct the phases, you see that the yellow one goes in the opposite direction of the red one. So the phases are not in alignment; they are actually entirely the opposite, and this means that the sound will cancel itself out at certain frequencies. Why does this happen? The red one is the previous frame, the previous segment of one individual track. The blue one is the current segment of one individual track. At the top, you see the amplitude. Obviously, the blue one picks up where the red one leaves off: this goes down, the blue one follows that and just goes a bit further. If you look at the phase track, you find that the phases of the red one, the previous segment, are in alignment with the phases of the blue segment. This one picks up nicely where the red one leaves off. It has nine rotations, one, two, three, four, up to nine there, and the blue one also has around nine rotations, and they are compatible in a sense. If we now take the blue segment and move it forward in time, this is what you get. The amplitude envelope at the top is still more or less compatible. You might have some modulation, but this is something you can deal with easily. The problem is this one: the phases are now almost 90 degrees out of phase, and this leads to the change in volume. The idea now is, in order to correct this, we would like to take the phase difference we found in the original signal and restore it in the synthesized signal. That, of course, means that we must be able to talk about the same position in the analyzed signal, and this is
done by creating two positions in each frame: the entry position and the exit position. They are designed in such a way that, as we move from the previous frame to the current frame, the current frame's entry position refers to exactly the same input position as the previous frame's exit position. It's very fun to talk through this, but this is what you see. This is the entry position of the previous frame, and the exit position of the previous frame. If we look at the current frame, which is the blue one, this is the entry position and this is the exit position. And the previous exit and current entry positions overlap there, so at that point we know we are talking about exactly the same input sample. And of course, at that position you can go to the phase track and measure the phase difference. Now, restoring this phase difference in the synthesized track is not as straightforward as you might wish, because the exit and entry positions no longer overlap; they no longer refer to the same position in the output stream. So the best you can do is to find a middle ground and define two new positions that are as close as possible to the original analysis positions. It basically means we also have a synthesis entry and a synthesis exit position. The colors are totally off; I have no idea why. So we have an analysis entry position, an analysis exit position, a synthesis entry position, and a synthesis exit position. And to make talking about this easier, I will no longer refer to entry or exit positions explicitly: if I talk about the previous frame, I always mean its exit position; if I talk about the current frame, I always mean its entry position. Essentially, I always talk about the same absolute position in either the analysis or the synthesis stream. What is this slide? Well, okay. Now, the phase correction. The top is the analysis; the bottom is the synthesis.
The first thing we do is measure the phase, basically the angle of the wave we are currently synthesizing, and we remove it. We don't want to have that. So the phase correction starts out with the removal of the current synthesized wave's angle; that's this synthesis term in the correction. Then we go to the previous frame and start out from the angle we had in the previous frame at that position. So we add the previous synthesized wave's angle. And to that, we add the phase difference between the analyzed positions, which is basically the current analyzed wave's angle minus the previous analyzed wave's angle. Essentially the phase difference. Now, of course, the problem is that we do not know the angle of the current synthesized wave immediately. We just don't have it, because we demodulated the peak. If we would look at the modulated wave, then you would get a phase plot like this. Horizontally you have time; vertically you have the phase. You see how they pick up on each other. If we demodulate this wave, you get the track at the bottom. You only get the envelope's angle, and that's nowhere near the same as the combined angles. The reason for that is straightforward. This is one of these things about complex numbers: if you multiply them, you actually add their angles. The multiplication of u with s, the envelope with the carrier wave: that angle is the same as the angle of the envelope added to the angle of the carrier wave. And knowing that, we can, in a straightforward way, split each of those terms into extra terms. Essentially, you get the full phase correction, and I'm not going to read this, no worries. Let's just take the current synthesized wave's angle, which is this one. This is split into the current synthesized envelope's angle added to the current synthesized carrier wave's angle. And this is done for all of the terms in the phase correction. Let's investigate this now.
The carrier wave's angle is something we can actually compute. Yeah. We will now be dealing with this particular section of the equation; this part, the right part, is the thing we now discuss. The angle of the carrier wave is how many rotations you have, multiplied by the relative position in the frame. If I'm at the beginning, I always have an angle of zero; if I'm at the end of the frame and I had nine rotations, I have nine rotations at the end. If we apply that, the entire carrier-wave phase correction expands to this thing. You see the entry and exit positions mentioned: that is the relative position in the frame. The two pi states that we want full rotations, and the peak positions are present as well. If we use the fact that the exit position is one minus the entry position, the whole rotations can be removed, and if we reorder the terms you get something like this: two pi, multiplied by the difference in spacing between the input and the output segments, and by the frequency we had in both the previous and the current peak. We add it to the phase correction term, which is at the bottom. That leaves us with the problem of the envelopes and their angles. These are things we can measure: we know the time-domain version of the envelope, so we can simply measure the phase in it. The good thing about this technique is that you don't need to do that. So this is again the phase plot, analysis and synthesis. The previous frame is red, the green one is the current frame. For the analysis phase, if we do not time stretch the envelope, we have the following: the previous one is mapped straight down to the synthesis, like that, and the envelope of the green one, the current frame, is mapped somewhat further down in time, which leads to that position. Obviously, at this point, you cannot say that the synthesis envelope is the same as the analysis envelope. They just don't match; they don't go to the same position.
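Here is my reconstruction of that carrier term as a sketch (naming is mine, and reading "the frequency we had in both the previous and the current peak" as the mean of the two peak frequencies is an assumption), checked against the direct form written with explicit entry and exit positions:

```python
import math

def rel_positions(N, D):
    """Relative entry/exit positions for frames of length N, stride D."""
    entry = (N - D) / (2 * N)
    return entry, 1.0 - entry

def carrier_correction(k_prev, k_cur, N, D_in, D_out):
    """2*pi times the difference in segment spacing times the mean of
    the previous and current peak frequencies, expressed in cycles per
    sample (bin / window size)."""
    f_mean = (k_prev + k_cur) / (2.0 * N)
    return 2 * math.pi * (D_out - D_in) * f_mean

# Sanity check: expanding the correction with explicit entry/exit
# positions gives the same number.
N, D_in, D_out, k_prev, k_cur = 4096, 1024, 1536, 100, 101
e_a, x_a = rel_positions(N, D_in)
e_s, x_s = rel_positions(N, D_out)
direct = 2 * math.pi * (k_prev * (x_s - x_a) + k_cur * (e_a - e_s))
assert math.isclose(direct, carrier_correction(k_prev, k_cur, N, D_in, D_out))
```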
This is different if we time stretch the envelope. In that case, the red one goes to those positions, and the green one goes to exactly the same positions. This means, essentially, that we can say the synthesis envelope's angle is the same as the analysis envelope's angle, so we can remove those terms from the equation entirely. There is no longer a need to measure the envelope phases at all. Of course, once you apply a phase correction to the peak you have just synthesized, it is something which carries forward in time: if I apply a phase correction to this peak, then in the next frame I have to apply the same phase correction, so you want to push it forward. This is done by taking the previous phase correction term and adding it to the current one: the current phase correction is the previous phase correction, added to the correction term necessary for the carrier waves. This is how it sounds if you do it more or less correctly. I played it this slowly because then you can actually see that the segments are in alignment. At this point I would like to start talking about the results, how well it performs, but before I do so, I need to explain how you can actually compute all this fast. The idea is that we don't want to go to the time-domain version of the envelope, and this is something we can avoid. We know that the synthesized envelope is the analyzed envelope at a certain source position. We know that t is an integer; we also know that s·t is not an integer. That was the reason we had to resample the envelope. This resampling is something you can do straight from the spectrum of the signal: basically, you take the inverse Fourier transform and you put a non-integer index in there. If you do that for each t, you get a series of coefficients associated with each of the input bins. Basically, you have a matrix C, which you multiply with the spectrum of the input peak.
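A toy version of that matrix C (my naming, an O(N^2) sketch rather than the sparse implementation): evaluate the inverse DFT at the non-integer source positions s*t, so that multiplying C with the peak's spectrum resamples the envelope without ever reconstructing it at the original rate.

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / N)
                for t in range(N)) for k in range(N)]

def stretch_matrix(N, s):
    """C[t][k]: the inverse DFT evaluated at the non-integer source
    position s*t (s = input stride / output stride, hypothetically)."""
    return [[cmath.exp(2j * cmath.pi * k * (s * t) / N) / N
             for k in range(N)] for t in range(N)]

def apply_matrix(C, spectrum):
    return [sum(c * X for c, X in zip(row, spectrum)) for row in C]

# With s = 1 this is just the plain inverse DFT:
x = [0.0, 1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.125]
y = apply_matrix(stretch_matrix(8, 1.0), dft(x))
assert all(abs(a - b) < 1e-9 for a, b in zip(x, y))
```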
This directly gives the synthesized time-domain version of the envelope. This is what the matrices look like: at 133%, 101%, 66%. Of course, there's no reason to stop there; you can go further. You can also say: let's take the Fourier transform of this thing. The Fourier transform of u' is, of course, the synthesized peak we want. You compute it by noting that the time-stretch matrix C applied to the spectrum is actually a sum of column vectors, this thing, multiplied with bins from the input spectrum. Then you take the Fourier transform of both sides, which gives you the synthesized peak directly as the Fourier transform of that column vector multiplied with the value of the input peak. Calculating this Fourier transform was really fun. I spent two days on it, and eventually I typed it into Matlab, which gave me an answer that was kind of annoying: it is a sinc function. This is the thing you expect if you want to resample things, but even if you know it, you cannot write it down like that straight away. This time-stretch matrix allows us to say that the synthesized peak is a time-stretch matrix C multiplied with the input peak. This is what the matrices look like. They are very sparse, which is very nice: we can perform the matrix multiplication very quickly, and the support of the matrix can be calculated quite easily; it's essentially a line. The results. The time stretcher uses a window size of 4096 and an overlap of four. Most tracks have around 100 to 1,000 peaks. This is a performance estimate, so it's a bit silly, but on my computer I can do 22 tracks in real time, which doesn't mean a thing, of course, except that it's not that slow. It has a comfortable range between 66 and 150%. If you go further than that, you start to hear a metallic sound. That is because if a sound contains noise, this noise is not correlated; if you then time stretch it, you make it correlate, which is basically the sound of a hall, of a reverb.
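To see the sparsity, here is a sketch (my naming) that takes the DFT of one column of that matrix. Its magnitude is a periodic sinc (Dirichlet kernel) concentrated around the stretched bin position s*k, which is why the frequency-domain matrix is nearly a line.

```python
import cmath

def freq_column(N, s, k):
    """DFT of the k-th column of the time-stretch matrix: a periodic
    sinc centred near bin s*k (hypothetical toy, O(N^2))."""
    col = [cmath.exp(2j * cmath.pi * k * (s * t) / N) / N
           for t in range(N)]
    return [sum(c * cmath.exp(-2j * cmath.pi * m * t / N)
                for t, c in enumerate(col)) for m in range(N)]

# Energy concentrates at the stretched bin position:
N, s, k = 32, 1.25, 5                    # s*k = 6.25
mags = [abs(v) for v in freq_column(N, s, k)]
assert mags.index(max(mags)) == 6        # nearest bin to 6.25
```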
This is something which we could solve, but I said I would be talking about this time stretcher, which is what I'm doing. The second problem was a bit of a disappointment for me: the transients. The reason I wanted to time stretch the envelopes was to preserve the transients. I thought: if there is a beat, I would like to have an envelope that models this beat, and then preserve this envelope so that the beat stays very sharp. That's not working very well. It doesn't sound bad; it all smears out, that's the only thing. You can actually show this. This is one segment that I will show you; the input is red, the output is blue. Currently there is no time stretching going on: this is one segment, that's all there is. I will show you how this one segment is time stretched. If I time stretch it, the envelopes go outward, and indeed you find that the input is there and this thing is smeared out in the output. This is exactly what I wanted to avoid; it didn't work as I expected, or hoped, it would. If I shrink time, it goes the other way: you see the front, which is clearly here, is no longer a real front. It's a smear of sound. Where am I? The last thing which is useful to discuss about this time stretcher is that the envelope stretching has an impact: it makes sure that the sound stays true to its original volume, which is something you don't have if you don't perform envelope stretching. The effect, however, is only about three decibels: if you take a sound and measure how much energy is in the envelope compared to the carrier wave, it's about three decibels. It's really only a change in volume, and that change in volume is something you only hear in the low frequencies. That's basically the entire talk. I hope you found it interesting. That's my email address, if you're interested. The demo track I used is Creative Commons; Attribution-ShareAlike, I guess. The paper is online.
There is some more detail in the paper. There are also some omissions, which I figured out when making the slides. The time stretcher is used in BPM DJ. I think there is still some time to demonstrate that; five minutes, to be exact. This is the software in which it's used. This is a beat graph. Very quickly: one track from left to right, you have all the measures, all the bars, and from top to bottom you get the energy. If I play this, you get one, two, three, four. You see the yellow one follows the beat of the bar. If you want to time stretch this, you go to the time stretcher here and slow it down. A thing which is also possible in the software is to associate certain speeds with markers in the song. You can say: I would like to play the track slowly and then pick up speed while the mix is progressing. In this case, at this point, we would like to enter at 50% and be at 100% at this point. And that was basically the entire reason I made this time stretcher: to make this work properly. That's the talk. So thank you for your attention. Thank you very much, Werner. What a fantastic talk, what software, and excellent timing. So we have some minutes for questions. Yes, please go ahead. Yes, thanks for your very clear presentation. Can you get closer to the mic, please? Yes, thanks for your very clear presentation. Can you also use the sinusoidal model of the stereo track that you created to optimize multiple tracks, for basically the power optimization problem of mastering, so that you tune the phases to get the maximum rail-to-rail voltage on the output signal? I'm trying to fill in the blanks in the things I didn't understand; I didn't understand most of it. Can you use this, correcting for example phase differences in different tracks, to find what would be the optimal way to modulate two tracks that have to be played together, to optimize the power of the resulting track? That's a very good question.
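The marker mechanism described above can be sketched as linear interpolation between (position, speed) pairs; the function and marker format below are hypothetical, not the actual BPM DJ API.

```python
def speed_at(pos, markers):
    """Playback speed at a track position, linearly interpolated
    between sorted (position, speed) markers; clamps outside the
    marked range."""
    markers = sorted(markers)
    if pos <= markers[0][0]:
        return markers[0][1]
    for (p0, s0), (p1, s1) in zip(markers, markers[1:]):
        if pos <= p1:
            return s0 + (pos - p0) / (p1 - p0) * (s1 - s0)
    return markers[-1][1]

# Enter the mix at 50% speed, reach 100% at the end of the track:
markers = [(0.0, 0.5), (1.0, 1.0)]
assert speed_at(0.5, markers) == 0.75
```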
This is something I plan to do: basically mix two songs and make sure you have as much energy as possible left in the mix of the two songs. That's basically what you're saying, right? Yeah, well, I didn't do that yet, because I can't do everything. It will imply that you have to modify the frequencies of certain tracks that are in collision. Well, it's not a bad idea: if you have 100 hertz and 101 hertz and you try to mix them, you get a modulation of one hertz, and this is something you would like to avoid if you mix two tracks. So I'm certainly sure you can use this technique to do what you describe. I was wondering, if you do this phase correction, how do you avoid the phase sort of drifting when you keep adding corrections? And also, does it matter which windows you use for your smoothing? Yeah, these are two questions. The first question is: what if you keep on adding phase corrections to each other? This is not a problem. Phase corrections only go around the circle; at two pi they are back at normal. So even if you time stretch, each sinusoidal track will eventually come back, fairly quickly, to its normal phase. If you for instance have three pi, it's exactly the same as one pi if you apply it to a phase as a phase correction. The second part of your question, what was it? Yes, the windowing function. Yeah, I tried a lot of windowing functions, and it does matter. The Hann window, which I just showed, is fairly optimal, because it makes sure that you do not have a discontinuity at the edge, and this means you do not have peaks that shouldn't be there. Kaiser-Bessel is another very useful and very often used window. It's not particularly much better as such, and if you want to estimate the frequency very correctly you would like to have a Gaussian window, but in the end the most important thing is the phase correction; it is not the window as such. Thank you very much, Werner. Let's give him a warm applause.
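The wrap-around argument in a few lines of code (a generic sketch, not from the talk): reducing a phase modulo 2π makes a 3π correction identical to a π correction, so accumulated corrections cannot drift off to infinity.

```python
import math

def wrap(phase):
    """Wrap a phase to the interval (-pi, pi]."""
    return math.atan2(math.sin(phase), math.cos(phase))

# 3*pi and pi are the same phase correction:
assert math.isclose(wrap(3 * math.pi), wrap(math.pi))
# and a full turn is no correction at all:
assert abs(wrap(2 * math.pi)) < 1e-9
```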