going to be heading. To begin with, my background largely comes from the domain of music. I've been working in music technology, and in video and audio signal processing, for nearly 15 years. So I was thrilled to find low-latency audio support coming up in browsers sometime around 2011, if I recall correctly. Given that my first exposure to programming was through making noises on machines, it was super thrilling that you could just launch a browser and make noises with it. I'll see if I can relive some of that thrill through this talk.

My agenda is fairly simple. I'm going to run through a personal history of noise-making on computers. I say "personal" because I originally intended to make this a whirlwind tour of computer music, but I cannot possibly do justice to that, so I decided to make it a personal version: how my own engagement with audio and computers began. Then we'll dive into the Web Audio API's architecture and the conventional architectures of computer music systems, look at some of the primary gotchas people run into when they're first faced with the Web Audio API, and look at a few simple techniques to solve them: one being the dynamism of signal flow graphs, the other the intricacies of just-in-time scheduling of audio events. This stuff is still evolving, the spec is not finalized just yet, and there are some nice goodies just waiting to spring on us, so I'll talk a bit about them towards the end. And all of this goes with demos; no fun talking about audio without sound.

As I mentioned, I just want to give a personalized history of my engagement with computers and noise-making. In the beginning, my very first thing was the screen going blank. When I was first exposed to computers, just about every machine was capable of making some interesting sound or other, and that was fascinating. It played a large role in how I approached computing and in what got me into it in the first place, so it's pretty important to me.

To give a bit of context around that: this work by Seymour Papert was done even before I was born, so that's where it begins. He formulated an extension of the notion of constructivist learning, in which he believed that learning truly happens when the learner is making things, when the by-product of learning is some tangible artifact. He made the turtle graphics system, a robot that kids control by typing very simple commands such as "forward 50" or "left 30", and kids went on to master geometry through it without being taught the rules of Euclidean geometry. That was phenomenal work for its time. His key point is that we need this kind of manipulative material, and the early machines we had provided exactly that.

Anybody recognize this machine here? The Sinclair ZX Spectrum; it's probably older than most folks here. This is a home computer. It looks like just a keyboard, but it's an entire computer: a processor running at a couple of megahertz, 128 KB of memory if you're lucky. You plug it into your TV, turn it on, and you've got a prompt.
You could type BASIC commands at the prompt, and you could type BEEP duration, pitch and it would make a noise. Today's computers fail by this benchmark, in my opinion. The extensive boot process you have to go through before you can begin to make something interesting out of the machine is just appalling. So the ZX Spectrum had this BEEP command, which was really cool.

This is the BBC Micro. It had more extensive sound-generating routines and a better audio subsystem, and you could make a variety of noises with it. There were noise generators in there as well as pitched-tone routines, and a SOUND command you could give the machine to produce them. Even DOS GW-BASIC had the PLAY command, where you could give it a string constructed out of letters representing notes and their durations, and it would play that sequence and wait until playback finished. All of this was before GUIs came into the mix.

What happened to all of that? For about eight years or so, the ability to make quick sounds with our machines just vanished; it went into some kind of black hole. Today that is turning around, and that's what I want to highlight now. Before the Web Audio API: tough luck. If I were to give my machine to my son and expect him to somehow produce some interesting sounds of his own construction, I'd have been out of luck. But now, if you're lucky enough to have Chrome installed on your machine, you just launch it, go into the console, and type commands that make noises: very interesting, high-fidelity, low-latency noises. That's fabulous.

So let's look at what it would take to do this beep in the Web Audio API, and use that as our central piece for today's discussion. I'll walk you through this code a little bit. It's not a one-liner by any means, right? But at least you can write a function to do the beep that the Sinclair ZX Spectrum was good at doing out of the box, and it's pretty straightforward to achieve, because the API is organized in a very familiar signal-flow-graph kind of structure. So I'm going to go ahead and make that beep now. I'm always thrilled when this happens; every time I do this I get goosebumps in some sense. There you go, that's a beep function, and I hope some sound comes out of this: 330 hertz, and I want it for two seconds. Okay, a tone comes up, and I can modify that. There goes the beep.

At the heart of this beep is a very simple structure, a graph that goes like this: you've got a sawtooth wave generator plugged into a gain that controls its volume, you plug that into a filter that reduces some of the highs and makes the sound a bit smoother, and then you connect it to the audio output. This ability to generate audio using such signal flow graphs is a milestone leap in the audio capabilities of the browser.
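For reference, the beep walked through above looks roughly like this. This is a sketch rather than the exact code from the slide; the variable names and the specific parameter values (the gain level, the lowpass cutoff) are my own choices:

```js
// Sketch of a ZX-Spectrum-style beep with the Web Audio API.
// Graph: oscillator -> gain -> lowpass filter -> audio output.
var audioContext = new AudioContext();

function beep(pitchHz, durationSecs) {
    var osc = audioContext.createOscillator();
    osc.type = 'sawtooth';            // bright waveform, tamed by the filter below
    osc.frequency.value = pitchHz;

    var gain = audioContext.createGain();
    gain.gain.value = 0.25;           // keep the volume sensible

    var filter = audioContext.createBiquadFilter();
    filter.type = 'lowpass';
    filter.frequency.value = 2000;    // cut some of the highs to smooth the tone

    osc.connect(gain);
    gain.connect(filter);
    filter.connect(audioContext.destination);

    var now = audioContext.currentTime;
    osc.start(now);
    osc.stop(now + durationSecs);     // sample-accurate stop time
}

beep(330, 2.0);                       // 330 Hz for two seconds
```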
This doesn't come out of the blue; it has a long precedent in computer music history. So I'll share some examples of where this stuff comes from, so you get an idea of how the API itself is structured and organized.

CSound was one of the very early and very influential systems in computer music, developed sometime around 1986, and itself based on the older MUSIC-N family of programs. It introduced some concepts that persist to date, for example the instrument and score split: there's a section that defines what your instruments are, and a section that orchestrates those instruments. The instrument specification here says, in effect, "I'm running a simple oscillator from a sine wave table, taking that signal into a1, and outputting it to the audio file"; in this case it's offline processing. It also introduced the distinction between control rate and audio rate. Audio sampling rates are generally very high, say 48,000 samples per second, but you don't necessarily want to control these sounds at that same rate, either for computational reasons or because you just don't need that kind of fidelity. So there's a distinction between the control rate (k-rate) and the audio rate, and that has been borrowed by subsequent systems as well. It's a pretty influential system.

This is Max, also from around 1986, developed by Miller Puckette and further developed by David Zicarelli. It's now a commercial product and one of the staple programs used by a large section of the computer music community; just about every audio device you can buy probably has some kind of plug-in that patches into Max. It's a dataflow kind of paradigm where you do all your programming visually and the whole system is live, and it supports both control signals and audio signals. These days Max has even more capabilities: it can also do visual processing, graphics, and richer media.

This is Pure Data. Miller Puckette made an open-source version of the same Max-like system. This is one of the denser screenshots, of what's called netpd in action. These cockpit-like interfaces are fairly common in the computer music community, and it's always interesting to look at them and the degree of control these innovations demand. This is from around 1990, so it's pretty old already. The reason I put this screenshot up is that these systems were not only used for making noises; they were also used for making the interfaces to control those noises.

This is SuperCollider, from 1996. It's a client-server kind of system, where the server does the signal flow graph rendering, and there's a client-side language, a fairly general-purpose multi-paradigm language, that's used to control what's happening on the synth server. SuperCollider also has routines for putting up interfaces and synchronizing the commands you send via those interfaces to the server. Before the Web Audio API, I did some of my own synthesis work in SuperCollider, and this is an example. You don't need to read the code; the gist is that there are two sections, one trying to synthesize a veena tone, and the other trying to control that tone to produce gamakas in Carnatic music. That was part of my work before the Web Audio API, but now I don't need to do this, because I can do all of it in the browser itself.
This is ChucK, from 2003. ChucK brought an explicit model of time into the programming language itself, and even the code looks sort of like a wiring diagram: you've got a sine oscillator m that's modulating a sine oscillator c, and that's being pumped into the dac, the digital-to-analog converter, which is the audio output. The right-arrow operator is generally referred to as "chuck", so you read the code as "SinOsc m chucked to SinOsc c, chucked to dac". And there's an explicit model of time: time is advanced in steps, one second at a time in this case.

So the notion of a signal flow graph is common across all of these systems, right from 1986 to today, and the Web Audio API builds on top of it. But whereas the other systems, like Pure Data and SuperCollider, also had to build user interfaces and other supporting systems into themselves to be useful, the Web Audio API deliberately does much less. It just handles the audio part and leverages everything else we have in the browser today, and that's very powerful. It changes a lot of things for us, because we now have a cross-platform audio programming environment.

I'm going to share a demo of a system called the Rhythm Engine. It was developed originally in the late 90s, around 96 or 97, by my supervisor, Peter Kellogg. It was written for Windows, ran on the old Windows machines, and output MIDI to control drum sounds. I rewrote the whole system using the Web Audio API so that it's now accessible via browsers, just a single click away. That's a kind of preservation of ideas, and there are some important ideas behind the Rhythm Engine that I'll talk about shortly. Let me switch to the demo. Is it clear enough?

The Rhythm Engine's core idea is: what if a rhythm could be represented as a point in a multi-dimensional space, where each axis is some perceptually continuous quantity? By that I mean more or less straightness in one direction, more or less offbeat, more or less phase shift, and so on for various parameters. If we represent a particular rhythm in this kind of multi-dimensional space, then we can do very interesting things with it.

In this case I start by playing just one drum. I choose my kit. This is a very simple beat, right? It's what's called a straight beat; you'll find it in rock quite often. Now I'm going to reduce its straightness a little bit; it's just going to become slower. That's a completely different kind of feel, right? You can also play around with a threshold to reduce its complexity; you see how it slowly shifts from one kind of intensity to the other. It uses accents to achieve that. That's just one voice. We can add more voices, but instead of going through all of that manually, I'm going to drag and drop a preset file so I can show you what we can do with it. There are four voices in this case, and I'll talk you through it. It's a very simple beat. I've created four presets out of these four voices that have somewhat different characteristics, and I've placed those presets into this space so I can smoothly morph between them.
By representing all four of these rhythms in this kind of continuous space, I can now move between them smoothly. It became possible to bring this kind of live interaction onto the web only after the Web Audio API came into existence.

Let's revisit the beep at this point. (How am I doing on time? Okay.) There are some issues with it that you might notice. You'll see that it creates an oscillator, but it doesn't return anything. So how do we know that the sound is going to persist until it finishes? What if the garbage collector kicks in and destroys the nodes, since all the references are going away? This is something the Web Audio API takes care of for you: it maintains references to these nodes in the background until they are no longer needed.

One real issue, though, is that the oscillator node, and the source nodes in the API in general, are what are called ephemeral nodes. They are one-shot: you trigger them once, and after you stop them you cannot use them again; they have to be reclaimed and you have to create a new one. This stumps a lot of people coming to the Web Audio API for the first time. So when we call beep(330, 2.0), this graph is set up for us, and after two seconds: nothing. The whole thing just disappears after two seconds if we no longer hold any references to the oscillator. Even holding references to the oscillator is of no help, because you cannot start it again; if you do, it throws an exception, by specification. So how do we make a Max-kind of interface on top of the Web Audio API if the nodes keep disappearing on us? That's an issue.

What we can do, as with most problems in computer science, is add a level of indirection. We can create a gain node to which all these processing elements connect instead, and pipe that to the output. When we do that, what remains after two seconds is at least that gain node, intact, and you're free to beep again. In fact you can beep as many times as you want, even before the two seconds finish, because they'll all get mixed into the same gain node anyway. This is one approach we can use to create an illusion of stable sound models. In a sense, we are letting the Web Audio API be the server component of what I showed in SuperCollider, and using JavaScript to orchestrate: trigger sounds, let them die on their own, and basically make noise.

The 2.0 is an important thing here. The sound dies precisely two seconds afterwards, and by "precisely" I mean a precision of one sample. Sample-accurate timing is provided by the Web Audio API: typically 1/48,000 of a second, if that's your sampling rate.

The second problem is that we want interactivity. That's the problem with the beep function: we schedule the beep to disappear two seconds afterwards, but what if we change our minds before the two seconds are up? What we really want is just-in-time scheduling for these kinds of sounds. So, is setTimeout adequate for this? Let's just create a timer that's going to kill the beep after two seconds; then if we change our mind, we can always cancel that timer.
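Here is a minimal sketch of both ideas described above, assuming the approach from the talk: a persistent gain node that the ephemeral oscillators mix into, plus a setTimeout-based stop that we can cancel if we change our minds. The names (output, cancelScheduledStop) are my own, not from the talk:

```js
// A persistent gain node acts as the stable output that short-lived
// oscillators connect to; it stays intact after each beep dies.
var audioContext = new AudioContext();
var output = audioContext.createGain();
output.connect(audioContext.destination);

function beep(pitchHz, durationSecs) {
    var osc = audioContext.createOscillator();
    osc.frequency.value = pitchHz;
    osc.connect(output);              // mix into the stable gain node
    osc.start();

    // Kill the beep after the duration using a timer...
    var timer = setTimeout(function () {
        osc.stop();
        osc.disconnect();
    }, durationSecs * 1000);

    // ...but keep the option of changing our mind before the timer fires.
    return function cancelScheduledStop() {
        clearTimeout(timer);
        return osc;                   // the still-running oscillator, to reschedule as we like
    };
}

var cancel = beep(330, 2.0);
// cancel();   // call this within two seconds to keep the beep going
```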
But that's not good enough, because setTimeout has very poor jitter characteristics in the browser. No musician would respect the kind of jitter that setTimeout shows, so setTimeout is not an option. Neither is setInterval; it's a little better, but still not an option. requestAnimationFrame, though, is a very good candidate. Its jitter characteristics are very good relative to the others, and it's tied to the display refresh rate, so it typically runs 60 times a second on its own. So with requestAnimationFrame, in conjunction with the sample-accurate timing of the Web Audio API, we can get great interactivity and timing accuracy. The idea is that, on each callback, we compute and schedule audio for a little bit more than the interval to the next requestAnimationFrame call, so that by the time it arrives there is at least something ready to go out and the audio is not going to stall or break up. This is the typical pattern of overlapping computation of audio that we want in real time. When I moved that morph slider in the Rhythm Engine demo, those computations were making instant-by-instant choices about what to do next, especially when it came to tempo changes.

The Web Audio API also provides something called the script processor: you can write arbitrary JavaScript code that emits sound, actually computing Float32 buffers and sending them out to the audio system for playback. In the current specification and implementations, though, there are some serious drawbacks to the script processor that do not permit it to be used very well with the other nodes; I'll show a couple of cases of that later.

I wrote a library that gathers all of these lessons into a handful of functions; it's called Stellar. The first problem, the disappearing nodes, is solved by Stellar's GraphNode, which makes it easy to create stable, abstract sound models. The scheduler is the other interesting part of Stellar: it separates the specification of an interactive audio-visual rendering from its actual play time, and it also takes care of sample-accurate timing and just-in-time decisions. GraphNode is a very simple function: it takes any object and turns it into something that can participate in the signal flow graph. The scheduler works on functions of a particular kind: if you're familiar with functional programming, you can read them pretty much as continuation-passing-style (callback-passing-style) functions, except that a clock is threaded through the callbacks, so the asynchronous work happens over time and the timing is precise. The scheduler also provides a number of higher-order functions you can draw on for composing different sounds together.
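For concreteness, the overlapping, just-in-time pattern described above looks roughly like this in plain Web Audio terms. This is not Stellar's code; the function names, the look-ahead amount, and the beat interval are my own choices:

```js
// On every animation frame, schedule events slightly ahead of the audio
// clock (audioContext.currentTime), so playback never runs out of material.
var audioContext = new AudioContext();
var lookaheadSecs = 0.1;                 // stay ~100 ms ahead of the audio clock
var beatInterval = 0.5;                  // one event every half second
var nextBeatTime = audioContext.currentTime;

function tick() {
    // Fill the window [now, now + lookahead] with sample-accurately timed events.
    while (nextBeatTime < audioContext.currentTime + lookaheadSecs) {
        var osc = audioContext.createOscillator();
        osc.frequency.value = 440;
        osc.connect(audioContext.destination);
        osc.start(nextBeatTime);         // sample-accurate start
        osc.stop(nextBeatTime + 0.05);   // a short blip
        // Just-in-time decision point: tempo or pattern changes take effect here.
        nextBeatTime += beatInterval;
    }
    requestAnimationFrame(tick);         // typically ~60 times a second
}

requestAnimationFrame(tick);
```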
I built a small explorer for playing around with both the scheduler and the graph nodes, and I'll show a little bit of it. This is the Stellar explorer. Ignore the comments; they're just for people visiting it for the first time. What I want to do here is beep, for the first time, like a child, right? So I'm going to do a bit of live coding here just for kicks.

I'm creating a built-in chime model and instantiating it in the ch variable. Then I make a note of 72, 0.5; that's one note. And another note of 79, 0.5; that's a second note. These are just models for playing back those notes; they're not actually being played yet, and you can play them as many times as you want. If I do play: that's the first note, and that's the second. Then I can put them together into a track and play that. Well, that just plays and stops, and I want to keep it going for a while while I make a few other tweaks, so I can just loop it. This is going to keep playing while I do a few other things. When I click on the ch, I can get at the half-life and a few other controls. That's a very simple beginning, and I'll show a couple of more interesting examples.

This one combines precisely synchronized visuals with the sound. It's again simple random notes being played, but there are also synchronized visuals. I'm just going to select everything and hit run. There's a bit of delay through the visual system, so you may notice breaks in synchronization. And now I can control this live; for example, I can change the speed.

That brings me roughly to the close. There are a few goodies coming. I mentioned a couple of problems that people usually run into, and these are being addressed. One is that we do not have a clear way to coordinate the musical events you schedule in the Web Audio API with the other events you might want to produce, like visuals, sending out MIDI, controlling lighting over MIDI, or any other display activity. Additional facilities are coming for this. Currently we have currentTime, which always progresses in real time, but there's work on specifying and implementing a current play time, so that we have a clear idea of when the sound we're computing is actually going to be heard by people, in other words the output latency. That information is going to be available, and it will completely change the precision with which we can orchestrate things in the browser.

The second thing that's really exciting is audio workers. I mentioned that the script processor node lets you write arbitrary JavaScript and compute Float32 buffers to ship out to the audio system. In the current implementations, those scripts run in the main thread, so if other activity is going on in the main thread, it seriously impacts the audio you're generating; you're likely to end up glitching the audio. As a consequence, people use very large buffers, like 2048 samples or maybe even 4096, and that ends up increasing the latency, when the whole point of the Web Audio API, the reason people began working on it in the first place, was low-latency audio. With the audio workers initiative, we're going to have JS code run right in the audio thread, without any breaks and without interfering with the main thread, which means you can do your UI in the main thread and have the audio stay smooth. That's a super big deal, because we'll then have arbitrary JS code on par with the native code the browsers have put in for all the processing: the filters and all the other native nodes.
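To give an idea of what script processor code looks like today, here is a rough sketch of the kind of pass-through used in the demo coming up: an onaudioprocess callback that just copies its input buffer to its output buffer. The buffer size and variable names are my choices, not code from the talk:

```js
// A ScriptProcessorNode that copies its input straight to its output.
// Conceptually this should be inaudible, but because it currently runs on
// the main thread it adds a clearly audible delay (hence audio workers).
var audioContext = new AudioContext();
var bufferSize = 2048;    // large buffer to avoid glitches; adds latency
var passthrough = audioContext.createScriptProcessor(bufferSize, 1, 1);

passthrough.onaudioprocess = function (event) {
    var input = event.inputBuffer.getChannelData(0);
    var output = event.outputBuffer.getChannelData(0);
    output.set(input);    // copy the Float32 samples, nothing else
};

// Pipe a tone both through the script processor and directly to the
// speakers; the delayed copy is heard as a second tone.
var osc = audioContext.createOscillator();
osc.connect(passthrough);
passthrough.connect(audioContext.destination);
osc.connect(audioContext.destination);
osc.start();
```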
It would even be possible to completely reimplement all the native functionality in pure JavaScript when that happens. I'm actually looking forward to that, because it leads to some really cool things that you cannot do at the moment.

One thing I want to show about that is the bug that we have. We call it a bug, but it's simply what's in the current implementation: as I mentioned, the script processor currently runs in the main thread, and because of that, events have to be passed back and forth between the audio thread and the main thread. This means that every time you pipe audio through a script processor node, it incurs a significant delay. This is code where, conceptually, you're not supposed to hear any delay, but because of the implementation and the state of the specification today, you will. All I'm doing here is writing an onaudioprocess that copies data from its input to its output and does absolutely nothing else. If that's all it's doing, then if I pipe the input directly to the speaker and also pipe the JavaScript output to the speaker, I shouldn't be able to hear a difference. But I can hear a delay: you hear two tones, where in principle you're supposed to hear only one, if the script node adds no delay. If you add other native nodes instead of the script node, like a gain node doing the same kind of copying, you do not hear that delay. We want JavaScript code to coexist with native code, with basically the same performance characteristics, so we can get some very interesting functionality out of it. That's what's coming up, and I'm super excited about it.

In conjunction with that, as I mentioned at the start, the Web Audio API deals with just the audio part and leverages everything else that's being put into the browser by other teams, so the creative possibilities are exploding at this point, especially when you throw in networking: people are doing network jam sessions with this, and Google has done a few demos along those lines. As it stands, quite a few of the old Web Audio API demos are broken, because the spec is evolving and the browser implementations have evolved, but the code has not kept pace. So if you go exploring, you may find the demos a bit broken. Never mind; they'll catch up once version one is finalized and people are happy with audio workers. So thank you, I'm at the end. Any questions?

Q: Hey, great talk, by the way. One question: have you heard of Chris Wilson? He's already working on Web Audio, with his Web Audio Playground project. Do you think in future your libraries could be fused with that to create something fun? Right now he's written plain core JavaScript, if you look at the code, but if we introduced your libraries on top of it, like GraphNode and the scheduler, do you see any benefit there?

A: I wrote those mostly for my own purposes, because I found reusing GraphNode and the scheduler to be much easier; I don't have to keep solving the same problems every time I make a project of that kind. So of course, reuse is always welcome on that front.
I mean, I've had a couple of requests; people have tried to use Stellar for writing an audio editor and for orchestrating full sequences of signal processing flows, so there is some interest in that regard. But quite often what I find is that the code, especially the code for doing the timing, keeps getting reinvented. Maybe once the current play time and the other things stabilize, that will become less important. We'll see when that happens. Thank you.

Q: [partly inaudible question about tools or libraries for generating Carnatic music notes and gamakas]

A: It's not so much about libraries; that's more music-domain stuff. In fact, it was part of my research work: I showed you some SuperCollider code that did veena synthesis, and that was part of my research as well. We do not have an adequate formal understanding of the raga system, even though we have a lot of musicological literature on it, and synthesis is a great way to explore the rules behind the system. So just go ahead and keep doing it: play around with note generation, play around with gamaka generation, and keep at it until you figure out what's there. We don't have major libraries for this today. There is some work, though; for example, there's a system called Gaayaka that you can take a look at. It's not based on Web Audio, but it provides some amount of intelligent gamaka filling of phrases in ragas. My own PhD thesis built on top of Gaayaka's work. There's still a lot more to be done; we're probably five years away from having major, very interesting results on that front. There are already some interesting results, but we need more understanding before saying more.

Q: Yeah, hi. Great talk, by the way. I had two questions. The first is about debugging: obviously, when you're making all this kind of stuff, you need to test and debug. What's your experience with that, and with the developer tools, if you use them? Is there any feedback you would give browser makers? The second part is about the getUserMedia API, with which people can take input from the mic. Have you played around with that and combined it with the Web Audio API?

A: I'll take the first one. Debugging, in the case of music, is to some extent easy, because you hear the wrong result; you hear what you don't expect. So it's a bit easier on that front. Apart from that, the regular developer tools suffice. Regarding WebRTC and getUserMedia: the microphone input is already integrated into the Web Audio API. In fact, Stellar already has a module called mic that lets you grab the microphone input and pipe it through processing routines. The other integration, with some of the other media stream APIs, like loading an audio or video stream off a URL and piping it through the Web Audio API's processing system, is being worked on. Good parts of that are implemented in Firefox but not in Chrome yet, so it's in various stages of implementation and proposal at this stage.

Thank you, Shree Kumar. Feedback forms have been distributed to you.