I suppose, like many of the things I do, this talk and the work I'm going to present is in some ways born out of frustration and my attempt to redirect that into something productive. So if you've ever gotten frustrated trying to understand how your code works, or why it doesn't work, or trying to understand how someone else's code works, or trying to understand how my code works (sorry about that), well, you're not alone, and this talk is for you. The release of D3 4.0 last year was focused on making D3 easier to learn: more consistent, more modular. But despite the large number of cosmetic changes to the API, it wasn't really a radical departure from earlier versions. The core concepts of selections, scales, and shapes were polished, but they were mostly unchanged; they were basically doing the same sort of thing. And I think this was the right balance for continuity. I know that the API changes themselves are disruptive, and I don't want to change everything completely every year; I want to strike a balance between innovation and improvement on the one hand and keeping things the same on the other. But after the release of 4.0, I wanted to think a little more deeply about not just how to make D3 easier, but how to make visualization easier. Yet in seeking to improve a tool for visualization, I remembered something. I remembered that visualization is itself a tool, a means to an end, a means to insight: a way to think, to understand, to study, to communicate something about the world. And per Ben Shneiderman, the purpose of visualization is insight, not pictures. So if you consider only the task of assigning visual encodings, of constructing visualizations, you ignore myriad other challenges: finding relevant data, cleaning it, transforming it into efficient structures for analysis, designing that analysis (statistics, modeling, simulation), and explaining your findings.
And I don't wish to downplay the importance of visualization tools and innovation therein. I have many improvements I still plan on making to D3, and I'm really excited to see other new approaches like Vega-Lite come out. But I also think it's important to step back occasionally and consider complementary approaches to related problems. Tasks supporting discovery are often performed by writing code. And coding is famously difficult, right? Even its name suggests impenetrability. Code originally referred to machine code, low-level binary instructions to be executed by a processor. Code has come a long way since then, but it's still hardly human-friendly. To give a comically dense example, here is a bash command that I wrote for generating a choropleth of population density from California census tracts. It looks like it starts with geo2topo here, but that's not actually what it does. It doesn't start with ndjson-join either. It starts with shp2json, which reads a shapefile and converts it to a newline-delimited GeoJSON stream. And it's not actually just bash: these are also JavaScript expressions embedded within the bash. I could probably spend a whole talk just going over how this particular slide works. Now, Bret Victor gives this very concise definition of programming: programming is blindly manipulating symbols. By "blindly," he means that we can't see the results of our manipulation. We can edit the program, rerun it, diff the output. But programs are complex and dynamic, so this is neither a direct nor an immediate observation of the impact of our change. And by "symbols," he means that we don't manipulate the output of our program directly; we operate on abstractions. These abstractions can be powerful, but they can also be difficult to grasp. In Donald Norman's terms, these are the Gulf of Evaluation and the Gulf of Execution.
But clearly, some code is easier to understand than other code. So what are the symptoms of inhuman code? The first thing I think of is spaghetti: code that lacks structure or modularity, where in order to understand one part of a program, you have to understand the entire program. This is frequently caused by shared mutable state. If you have a piece of state that is modified by multiple parts of a program, it becomes much harder to reason about its value. And indeed, how do we even know what a program does? If we can't track its complete state in our heads, then reading the code is insufficient. So we use console.log, we use debuggers, we use tests. But as I'm sure you've all experienced, these tools are also limited. A debugger can only show you a few values at a single moment in time, so your ability to see rich or complex data structures and patterns of execution is limited. And so we continue to have great difficulty understanding what our code does, and sometimes it can feel like a miracle that anything works at all. Despite these challenges, we continue to write code. We're writing code all the time, for more applications than ever before. So why is that? Are we masochists? Maybe. Are we unable to change? Yeah, probably. But is there no better solution? In general (and that is a very important qualifier), no. Code is often the best tool that we have because it is the most general tool that we have. I don't mean best in some absolute sense, but I do mean best for the right here and the right now and the person doing the work. And that is because code is the most general; it places the fewest limits on expressiveness. Alternatives to code, including higher-level programming interfaces and languages, can do well in specific domains.
But these alternatives must sacrifice generality for greater efficiency in their domain. And if you can't constrain the domain, it's unlikely that you'll find a viable replacement for code. There is no blanket replacement as long as humans are still thinking and communicating primarily in language. And it's hard to constrain the domain of science. Science is fundamental: we're studying the world, trying to extract meaning from empirical observations, simulating systems. A medium to support discovery must be capable of expressing novel thought. Just as we don't use phrasal templates and Mad Libs for composing the written word, we can't be limited to chart templates for visualization or a drop-down menu of formulas for statistical analysis. We need more than configuration: we need to compose primitives into creations of our own design. And if our goal is to help people gain insight from observation, we must consider the general problem of how people code. Bret Victor had this to say about math, but it applies equally to code: the power to understand and predict the quantities of the world should not be restricted to those with a freakish knack for manipulating abstract symbols. So when I talk about it being hard to code, it's not just a question of making our workflow more convenient or more efficient; it's about empowering people to understand the world. Now, if we can't eliminate coding, can we at least make it easier for our sausage fingers and finite-sized brains? To explore this question, I've been prototyping an integrated discovery environment called D3 Express. It's for exploratory data analysis, for understanding systems and algorithms, for teaching and sharing techniques and code, and for sharing interactive visual explanations. I do want to make visualization easier, but to do that, we need to make coding easier. I cannot pretend to make coding easy.
The ideas we wish to express, explore, and explain may be irreducibly complex, but by reducing the cognitive burden of coding, we can make the analysis of quantitative phenomena more accessible to a wider audience. The first principle of D3 Express is reactivity. Rather than issuing commands to modify shared state, each piece of state in a reactive program defines how it is calculated, and the runtime manages evaluation, propagating derived state. If you've written spreadsheet formulas, you've done reactive programming. This is a simple notebook in D3 Express, just to illustrate reactive programming. It looks a bit like the browser's developer console, except here our work is saved automatically so that we can revisit it. And it's reactive. In imperative programming, c = a + b copies the current value of a + b into c. It's a value assignment, which means that if a or b changes, c retains its original value until you execute a new assignment. But in reactive programming, c = a + b is a variable definition, which means that c is always equal to a + b, even if a or b changes. So if I'm defining a and b and I update their definitions, the runtime automatically keeps c up to date with all of the active variable definitions. Reactivity means that as the program author, we care only about the current state; it's the runtime's responsibility to manage changes in state. That may seem like a small thing when you're just adding a couple of numbers, but as your programs scale up, it eliminates a substantial burden. Now, obviously a discovery environment needs to do more than add a few numbers. So let's try working with data. I'm going to load D3, and then I'm going to use d3.csv to load this CSV file here. Both of these operations, requiring the library and downloading the file from GitHub, are asynchronous. But in a reactive program, we hardly notice this.
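The c = a + b behavior described above can be sketched in plain JavaScript. This is a toy model of reactive evaluation, not the real D3 Express runtime: the define/recompute names and the naive fixed-point pass are my own simplifications for illustration.

```javascript
// Toy reactive runtime: each variable declares its inputs and a formula,
// and every (re)definition triggers recomputation of all variables.
const variables = new Map();

function define(name, inputs, formula) {
  variables.set(name, {inputs, formula, value: undefined});
  recompute();
}

function recompute() {
  // Naive fixed-point pass; a real runtime walks the dependency graph
  // in topological order and only recomputes affected variables.
  for (let pass = 0; pass < variables.size; ++pass) {
    for (const v of variables.values()) {
      const args = v.inputs.map(name => variables.get(name).value);
      v.value = v.formula(...args);
    }
  }
}

define("a", [], () => 1);
define("b", [], () => 2);
define("c", ["a", "b"], (a, b) => a + b); // c is a definition, not an assignment

define("b", [], () => 40); // updating b's definition...
// ...keeps c up to date automatically: variables.get("c").value is now 41
```

The point of the sketch is the inversion of responsibility: the author states only current definitions, and the runtime decides when to re-evaluate.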
And that's because the definitions that depend on these asynchronous values aren't evaluated until their inputs are resolved. So reactivity means that you can write most asynchronous code as if it were synchronous. You can also see the output here from downloading this file. d3.csv is conservative about types; it doesn't infer any data types. So all the values, the dates and the close (this is a few years of Apple's stock price), are strings. But to start working with this data and to do analysis, we need to convert those into more precise types. So here I'm passing an accessor function, a row function, to d3.csv, so that I can map those strings to more specific types, or even change the shape of the data if I wanted to. The close field is a number, so as I put the little plus symbol there (that's the unary plus operator), you saw the purple close strings change into green numbers. It's immediately giving you feedback on the changes that you're making. Likewise, if I want to convert this date into a more precise type, a JavaScript Date, I need to parse it. JavaScript doesn't understand that date format natively, so I have to write a function. But in this case, I've actually called the function before I've defined it. In a reactive program, I can write my code in any order, and as I finish writing the program, it brings everything back up to date. So I call the parseTime function, and now I'm defining that parseTime function using d3-time-format, passing in the value here. Again, I can see it updating, and as I substitute the fields with the appropriate percent directives, you can see that it updates, and it looks correct. Now that the data is in the right format, I can start to ask questions of it, like simply computing the range of dates in this dataset. But again, I made a mistake here.
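The row-conversion step described above looks roughly like this. It's a hedged sketch: parseTime here is a hand-rolled stand-in for a d3-time-format parser so the snippet runs without loading D3, and the field names are assumed from the Apple stock example.

```javascript
// Stand-in for a d3-time-format parser of "%Y-%m-%d" dates.
function parseTime(string) {
  const [year, month, day] = string.split("-").map(Number);
  return new Date(year, month - 1, day); // months are zero-based in JavaScript
}

// The row function d3.csv would call for every parsed row.
function row(d) {
  return {
    date: parseTime(d.date), // string -> Date
    close: +d.close          // unary plus: string -> number
  };
}

// Applying it by hand to a sample row, as d3.csv("aapl.csv", row) would:
const raw = [{date: "2013-05-13", close: "64.96"}];
const data = raw.map(row);
// data[0].close is now the number 64.96, and data[0].date is a Date
```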
I forgot to give the data a name, so that's going to give me an error. But I'll just go in there and assign it a name, and then it re-evaluates the earlier cell. The program becomes much more resilient to error when it automatically re-evaluates things as they're currently defined, rather than you constantly thinking about what state your program is in and how to get it into the right state. You're always operating under the current definitions. OK, now unlike the developer console, cells in D3 Express can have visual outputs, simply by returning DOM elements. So we can take this data and turn it into a chart. We'll specify the size of the chart (the width, height, and margin, the standard D3 convention), and then we can go back, take the domains, the extents of our data that we've computed, and use those to construct scales. We'll have a time scale for x, mapping the domain of dates to horizontal position. And similarly for y, we can have a linear scale, taking the domain of the close dimension and mapping that to vertical position. So those have changed to be scales now, and next I'm going to create an SVG element. This one, unlike the other cells, has curly braces on it. When I open up the SVG cell, I use curly braces so that I have the ability to write an arbitrary block of code there; I'm not limited to writing a shorthand expression. Inside the SVG definition, I use DOM.svg to create an SVG element. That's just a convenience wrapper on top of document.createElementNS. The idea is that your notebooks are basically just working with detached DOM nodes, and by returning those nodes, they end up getting displayed in the browser. So this starts out as just an empty SVG node, and then I start to add some structure to it using d3-selection. For example, I can add an axis here.
By default, that's going to be at the top, because it's rooted at the origin in the top-left corner. Then I can specify a transform attribute to move it down, take that code, copy it, and make the y-axis, which goes there on the left. As I'm making these changes, I can immediately see the effect on the output; I'm not constantly switching between my editor and reloading in the browser. Likewise, I want to add the actual path element there to draw the line. I'll need a function to compute the geometry for that path, so I can use d3.line, passing in the right x and y accessors using our scales and pulling out the appropriate fields. That looks data-driven, but it's not quite right, because path elements are, of course, filled black by default in SVG. So we'll change those attributes to remove the fill and replace it with a blue stroke. Now that's a very basic line chart, but already you can see that the program's topology is starting to become more complex. This is the directed acyclic graph of references in that line chart. (This graph was itself made by D3 Express, using Graphviz.) The 93 at the bottom is the unnamed cell, which is the SVG output. A few observations about this graph. It's now trivial to take our chart definition and make it responsive. The width, the height, and the margin feed into the scales, which feed into the SVG definition. They're currently defined as constants, but if we wanted to make this chart responsive to the window size, we could just replace those definitions appropriately, and everything else would update. Likewise, if we want to replace the data, say with a real-time data stream coming in, we just replace the data definition, and our static chart becomes a dynamic chart. I'll show that shortly in action.
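For the scales used above, here is a hand-rolled sketch of the core of what d3.scaleLinear does; the scaleLinear function below is my own minimal version, not D3's (the real scales also handle ticks, clamping, inversion, and so on).

```javascript
// Minimal linear scale: interpolate from a data domain to a pixel range.
function scaleLinear([d0, d1], [r0, r1]) {
  return x => r0 + (x - d0) / (d1 - d0) * (r1 - r0);
}

// Example: map closing prices in [0, 100] onto a 420px-tall chart. The
// range is reversed because SVG's y axis grows downward, so larger values
// should land nearer the top.
const y = scaleLinear([0, 100], [420, 0]);
// y(0) === 420 (bottom), y(100) === 0 (top), y(50) === 210 (middle)
```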
But before then, let's look more closely at the difference between the imperative style of coding and the reactive style. This is typical D3 code that you might see on bl.ocks.org and in some of my examples, where on page load you're defining a scale. But on page load you don't have the data available, so you can't initialize the domain of your x scale. So later, after the data loads, you're defining the domain of your x scale. If you think about it, you're separating the definition of this object into two places in your code, and you can have arbitrary amounts of unrelated code separating those definitions. Compare that to the reactive definition: the reactive definition is centralized, because we no longer care about the order of execution; the dependencies in our code are now managed by the runtime, so we can centralize our definition. Reactive programming is not just about making things more convenient or saving you time; it's also about getting a cleaner code structure. And this is particularly useful if you want to be able to reuse these definitions in another program, because your definitions are now localized, making them easier to copy and paste or to import into other documents. Lastly, on line charts: despite the name D3 Express, you're not required to use D3. It's just DOM; it's just JavaScript. You can use whatever library or whatever format your browser supports to create your visualizations. So this is a similar chart, but now I'm using Vega-Lite to show that same data, with the nice concise syntax that it provides, rather than operating on low-level graphical objects. Another example: if I wanted to use Canvas instead of SVG to do something else, like in this case making a globe. I've loaded TopoJSON, and I've loaded some topology of world country boundaries.
Now I'm going to open up a new cell, create a canvas, and get the context for that canvas. Again, if I return that canvas, it's going to be displayed, but I can now start drawing to it. This is the standard way you might use d3-geo to draw to a canvas: I have a geoPath generator that takes a projection and a context, and here the projection is defined as just a fixed geoOrthographic projection. I can add another outline to that and so on, which you'll see in a second. But I want to use this to showcase another powerful feature of reactive programming, which is that I can take a variable definition that's static, like this fixed-aspect orthographic projection, and replace it with a dynamic definition, like a rotating projection. And I can do that using only standard JavaScript: I take the static definition of this projection, just a geoOrthographic, and replace it with a generator. Here you're seeing a Mercator and an equirectangular projection, and it's updating automatically. But if I put the star in front of the curly brace here, I'm now defining a generator that can yield multiple values, rather than a single constant value. If you haven't used generators before, they're a standard JavaScript feature, relatively new, but really cool. And it's a lot of fun to take a little static definition and replace it with something dynamic like this. The way that works is that the runtime pulls new values from the generator 60 times a second. The generator runs once, and then it basically gets suspended until the runtime pulls a new value from it. And the way the code works is that each time a value is pulled, it just sets the new rotation angles on the projection and yields it.
Again, because the runtime understands the relationships between all of these variables, it knows to recompute the canvas whenever that projection changes. So it's really easy to take a static thing and add a scripted animation on top, like we have here for this rotating globe. But one thing that's going on here, which you may not have noticed, is that it's actually throwing away the canvas and creating a new canvas each time it renders. That can be pretty expensive. It's nice that it works by default, but if you want to go in and optimize it to make it a little faster, in D3 Express you can refer to the previous value of a variable as "this". So in this case, I've changed the canvas definition to reuse the existing canvas we had previously rather than create a new one. And you can see that once I do that, it starts to smear, because it's drawing each frame on top of the last rather than onto a blank canvas. But of course I can then change the code: I clear the old frame, fix the context line width, and the result is that you can have very efficient animations. There's negligible overhead, because you're doing basically the same thing you'd be doing in vanilla JavaScript; you just didn't have to worry about managing all of that state yourself. To look a little more closely at the code: this is the static definition for the projection, just a geoOrthographic instance. And this is our rotating projection: a block statement with the star in front so that we can yield values, going into a while-true loop that sets projection.rotate and then yields the value. And that's it; we basically don't have to change anything else. So if generators are good for scripted animations, what about interaction? Generators can do that too.
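The rotating-projection pattern described above can be sketched with a plain generator. The stub object below stands in for a d3-geo projection, which isn't loaded here; only the rotate() call mimics the real projection API, and the names are my own.

```javascript
// Stub projection standing in for d3.geoOrthographic(): rotate([lambda, phi])
// records the new rotation and returns the projection, like the real API.
const projectionStub = {
  angle: 0,
  rotate([lambda]) { this.angle = lambda; return this; }
};

// A generator yields a value each time the runtime pulls from it (in
// D3 Express, 60 times a second); between pulls it stays suspended.
function* rotating(projection) {
  let angle = 0;
  while (true) {
    // Each pull advances the rotation and yields the updated projection.
    yield projection.rotate([angle += 1, 0]);
  }
}

const frames = rotating(projectionStub);
frames.next(); // first frame: rotated to 1 degree
frames.next(); // second frame: rotated to 2 degrees
```

This is the whole trick: the body between yields runs only when the consumer asks for the next frame, so the loop never blocks.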
The way that works is that you yield a promise that resolves whenever there's new input from the user. I'll illustrate that now by going back to this rotating projection. If we want to make this an interactive rotation, the first thing we need is a little slider widget for the user to drag. We can do that by creating the right DOM element here: DOM.range, again, is just like document.createElement; it creates an input element whose type is range, with a min value of -180 and a max value of 180. You could write this all by hand, but it's nice to have concise syntax. Now, that's not hooked up to anything, so dragging it doesn't do anything yet, but we can give it a name and then write a generator that yields the new values whenever you drag it. Again, I could write that by hand, but it's a very common thing to do, so there's a built-in generator called Generators.input that does it for you: it listens for input events on the DOM element and yields the new value. So now when I drag, I can see the value in another cell in the notebook. Likewise, I give that a name, take that angle, plug it into the rotation of the projection, and now when I drag the slider back and forth, you can see that the globe rotates. Again, this is such a common thing to do that there's an even shorter syntax, where you can define both the graphical interface (the DOM element being displayed) and the value that gets exposed to code. That's what the viewof operator is: two definitions in one, and they work exactly like the definitions you just saw, just slightly more concise.
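A hedged sketch of what a Generators.input-style helper does: each turn, it yields a promise that resolves with the element's value on the next "input" event. The stub below stands in for a DOM range input (no real DOM is used here), and the listener bookkeeping is simplified to one-shot callbacks.

```javascript
// Each pull from this generator yields a promise for the next input value.
function* input(element) {
  while (true) {
    yield new Promise(resolve => {
      element.addEventListener("input", () => resolve(element.value));
    });
  }
}

// Minimal stand-in for an <input type=range> with one-shot listeners.
const slider = {
  value: 0,
  listeners: [],
  addEventListener(type, fn) { this.listeners.push(fn); },
  dispatchInput() {
    const fns = this.listeners;
    this.listeners = [];
    for (const fn of fns) fn();
  }
};

const values = input(slider);
const next = values.next().value; // a promise for the next dragged value
slider.value = 90;
slider.dispatchInput(); // next now resolves with 90
```

In the notebook, the runtime awaits each yielded promise, so a cell that depends on the slider simply re-evaluates whenever the user drags it.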
So again, this is our long-form definition for an interactive projection, where we've got the projection and the angle, which comes from the DOM element, the input range; and then this is the shorthand syntax for exactly the same thing, where we use viewof angle. OK, but the cool thing is that we now have the ability to create arbitrary graphical interfaces and design the appropriate programming interface, the values that get exported along with them. You're not limited to sliders and drop-down menus and some fixed palette of user-interface elements. In this example, I'm going to create a complex compound input: a color picker. This is a form with a table inside of it, and there's going to be an input for each of the channels in our color space: hue, saturation, and lightness. DOM.html just takes a big string, sets the innerHTML of a div, and returns it, so that I don't have to create all of this in JavaScript; I'm just embedding an HTML fragment in my code. And then likewise, here is where I'm defining how this is going to be exposed to my code. I'm setting the value on the top-level element, and that's going to be updated whenever you drag a slider, whenever there's an input event. It's defined here to create a d3.cubehelix instance, and it also updates the outputs that go along with each of these inputs, so that you can see the values in this table below. So when I drag the hue slider, it's both updating the hue angle next to it, and you can see that it's emitting a new color, which is displayed in the cell below. That color is then used to set the background color of this div, so that we can see what it actually looks like. OK, now that's a toy example.
Again, I'm defining a custom interface, but where this starts intersecting with visualization, it gets much more interesting. This is a histogram showing the return of a few hundred stocks over a five-year period. You can see that it's like a bell curve: the mode is slightly greater than one, as you'd expect with a positive average annual return. But there's also a long tail of stocks that did really badly and a long tail of stocks that did really well. Now, in another environment, it might be difficult to inspect this directly. The visualization can be a dead end: you can see it, but if you wanted to know which data points are behind these individual bars, you'd have to phrase that as a new question in code. The goal of D3 Express is that you can quickly augment these visualizations so that we can start to manipulate them directly, getting back to what Bret Victor said. The way we do that is by taking this visualization and treating it like we do an input, like the slider, like the table. So now when I'm brushing back and forth, using the standard d3-brush, it's exposing its current selection as the array of data points within it. I'm just displaying that with the default inspector, but it's good enough to let me know what is under the selected range, and I can drag that back and forth without doing any extra work. And just to show that there's no real magic going on behind the scenes, this is the code that adapts your standard d3-brush. This is not a special version of d3-brush.
This is the same d3-brush that you're using today, fitted into this new framework: you receive your brush event, look at the brush selection, pull out the x values (basically the start and the stop), filter your data based on those values, set the value property of your node (that's what gets exposed to the code), and then tell D3 Express that the value has updated by dispatching an input event. Now, another useful thing in D3 Express: by default, these reactions to your code are applied instantaneously. Whenever a variable changes, the runtime knows what depends on it, and it recomputes all the derived variables and updates the display instantaneously. Sometimes, though, that's not what you want. Something changes, and you want to be able to observe what changed, so you want animated transitions to preserve some object constancy. In this simple bar chart, I've written it using the D3 data join, with a staggered transition, so that when the data updates, the bars move into their new positions and you can see how the values changed. Likewise, this dataset, which is just the frequency of English letters, is defined so that if you change the sort-by-value flag, it's sorted either by descending frequency or lexicographically. So now when I go back up to the top and change the value using the checkbox, the code can apply a transition from the old values to the new values.
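The brush-adapter logic described above can be sketched in plain JavaScript. The scale and data here are stand-ins (not the talk's stock data), and brushed is a simplified version of the handler pattern: invert the pixel selection through the x scale, filter the data, set node.value, and dispatch an input event.

```javascript
// Pretend x scale: value -> pixel, with an invert for pixel -> value.
const x = v => v * 10;
x.invert = p => p / 10;

const data = [1, 2.5, 4, 7, 9.5];

// Brush handler: selection is [x0, x1] in pixels, node is the cell's element.
function brushed(selection, node) {
  const [v0, v1] = selection.map(x.invert);       // pixel range -> value range
  node.value = data.filter(d => v0 <= d && d <= v1);
  node.dispatchEvent({type: "input"});            // tell the runtime it changed
}

// Stub node standing in for the DOM element a cell returns.
const node = {value: [], dispatchEvent() {}};
brushed([20, 70], node);
// node.value is now [2.5, 4, 7]: the data under the brushed pixel range
```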
So you can use access to the previous value both to improve performance and to get better visual output, because in the reactive system you can opt in to controlling how these changes get applied. It's opt-in complexity, as you want to add richness to your implementation. OK, so that was a pretty whirlwind tour of reactive programming in D3 Express. You also saw how you can use the inline visual outputs of cells to improve our ability to see a program's current state. But I want to dive into this a little more and show you how to use visualization in D3 Express to improve our ability to scrutinize a program's behavior. Reactive programming, where you can change the code and immediately see how the output updates, is also known as interactive programming. Interactive programming allows us to investigate how code works by poking at it. You can change it, delete some code, reorder it, and see what happens. It lets you get a sense of how the individual bit of code you changed is contributing; you're making a more direct observation of how that code impacts the program. In this notebook, I've got the standard force-directed graph of the Les Misérables character data, and this is your force simulation. As I add or remove the charge force, I can see what it's doing: the charge force is causing the nodes to repel. If you remove it, they collapse down into the center, where the only force still applying to them is the link force. Likewise, I can modify parameters of the charge force. I can set the strength to 100, and of course now they're attracting each other rather than repelling, so they all collapse into the center; change it to negative 100, and they expand out; negative 50, negative 100, whatever.
And what you're seeing here also is that it's not reloading the entire page when you make these changes. It's a reactive topology, so when I change the definition of these forces, it's not throwing everything away and starting over; it's operating on the current running definition. You're doing live editing of the program. That improves stability and lets you see more easily how these changes contribute to the program's behavior. Likewise, if I take out the link force, the nodes are no longer connected to each other and start spreading out; or if I take out the centering force, the whole graph can start floating away. Bye. OK. Now, a more explicit approach to studying program behavior, rather than tinkering with it, is to try to expose its internal state. I'm going to illustrate this with a very simple example: computing a running sum. We can take a normal JavaScript function that returns a value and turn it into a generator, and the generator now yields values in addition to its normal return value. The idea is that the values we yield as the program runs give us a view into what the code is doing while it's running. The nice thing about having both yield and return is that you can take arbitrarily complex functions (at least if they're not already generators), even recursive ones, and you have this extra channel where you can expose the internal state of your program. That's really useful for visualizing or studying a program's behavior, because it allows you to cleanly separate your visualization or analysis of the behavior from the code itself. If I take some code, as I've done before, and put my visualization code directly inside the algorithm, it starts to become a mess.
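The running-sum example can be written out like this. The function name is mine, but the yield-plus-return pattern is exactly what's described above, and the spread operator shows the one-shot extraction of all the yielded values.

```javascript
// A plain function turned into a generator: it yields each intermediate sum
// (exposing internal state as the computation runs) and still returns the
// final sum as its normal return value.
function* runningSum(values) {
  let sum = 0;
  for (const value of values) {
    yield sum += value; // the extra channel: one value per step
  }
  return sum;           // the ordinary result, as before
}

// Pull every yielded value out in one go with the spread operator:
const sums = [...runningSum([1, 2, 3, 4])];
// sums is [1, 3, 6, 10]; note that spread captures the yields, not the return

// To observe the return value too, iterate manually:
const it = runningSum([1, 2, 3]);
let result;
while (!(result = it.next()).done);
// result.value is the returned final sum, 6
```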
Like if you're doing canvas drawing and stepping through with a debugger, switching back and forth between the algorithm and the canvas code, it just becomes complete chaos. But with this approach, you can extract the data out of your program, either while it's running, using generators, or by statically building it up as an array of values, and then it becomes much easier to do that analysis. And so in D3 Express, here's how you extract the data out of that generator. The simplest way is to call your function like you would before. And because it now returns a generator, you automatically get an animation: D3 Express knows that when your variable is defined as a generator, it should pull a new value out every animation frame. But you could also use the spread operator, where you pull all the values out of that generator in one go and put them into an array. That's useful if you want a more static visualization, or an interactive one where you're scrubbing between individual frames. Now, obviously, you don't really need to study a running-sum function; I think we all know how that one works. So I want to use a more real-world example, and this may get a little bit hairy, but I'm going to try it anyway: the circle-packing layout in D3. You've probably seen this before; this is the Flare class hierarchy. Hierarchical circle packing is basically like treemaps, but you have this extra wasted space because you're nesting circles rather than squares. But that extra space is not really wasted: it helps to indicate the hierarchical structure in a way that's not always obvious with treemaps. Now, in order to produce these diagrams, you first need to lay out the individual circles, right?
That is, a set of siblings within one part of your tree. And this is a little example of how that works: you have a set of circles that you want to pack, in order, one at a time, into as small a space as possible without overlap, a little bit like penguins huddling in Antarctica. Your job is to place one of these circles at a time until you've placed all of them. And since you want the circles to be packed as tightly as possible, you know that each new circle you place should be tangent to at least one of the circles you've already placed, in fact two of them. But if you just pick an existing circle at random as your tangent circle when placing the new circle, you're going to waste a lot of time trying to put the new circle in the middle of the pack, where it will overlap the circles you've already put down. So ideally, as you're determining your tangent circles, you should only consider the circles that are on the outside of the pack. But the problem is: how do you efficiently determine which circles are on the outside? Wang's algorithm, which is what's used by D3 and by other implementations of this layout, maintains a front chain, which is what the red line here is. The front chain represents these outermost circles. When placing a new circle, the algorithm picks the circle on the front chain that is closest to the origin, and the new circle is placed tangent to this circle and its adjacent neighbor. If there's no overlap with the other circles on the front chain, it can just move on to the next circle. But if there is overlap, as you see in this case, where the big circle overlaps other circles on the front chain, then it needs to cut the front chain so that it can choose a different pair of tangent circles and effectively move the overlapping circle out to the outside.
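The core geometric step, placing a new circle tangent to two chosen circles, is ordinary circle-circle intersection. This is a sketch of just that step; the function name and structure are illustrative, not d3-hierarchy's internals:

```javascript
// Place a circle of radius r tangent to circles a and b (each {x, y, r}),
// i.e. at distance a.r + r from a's center and b.r + r from b's center.
// Returns one of the two solutions (the other is mirrored across the line ab).
function placeTangent(a, b, r) {
  const dx = b.x - a.x, dy = b.y - a.y;
  const d2 = dx * dx + dy * dy, d = Math.sqrt(d2);
  const ra = a.r + r, rb = b.r + r;
  // Distance from a's center, along ab, to the foot of the
  // perpendicular through the new center.
  const t = (ra * ra - rb * rb + d2) / (2 * d);
  const h = Math.sqrt(ra * ra - t * t); // perpendicular offset from ab
  return {
    x: a.x + (dx * t - dy * h) / d,
    y: a.y + (dy * t + dx * h) / d,
    r
  };
}

// Two unit circles side by side; a third unit circle nestles tangent to both.
const c = placeTangent({x: -1, y: 0, r: 1}, {x: 1, y: 0, r: 1}, 1);
```

The front chain's job is just to narrow down which pairs (a, b) are worth handing to this step.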
And if you look closely at this animation, you can actually see the moments where it cuts the front chain, as these larger circles get squeezed out of the pack and pushed down. I find this kind of mesmerizing to look at, but more than being eye candy, this animation and this notebook were extremely helpful to me in fixing a long-standing bug in D3's implementation. There was some vague wording in the original paper, and in certain situations it wasn't obvious which part of this circular front-chain structure needed to be cut in order to place the new circle. Having the ability to inspect the program and see what it's doing as it goes along, rather than just seeing the output at the end and it being wrong, made it much easier to isolate the conditions that led to the bug, make a change to the algorithm, and see how it affected those specific conditions, without starting over with new random output. Now, this is only one part of circle packing. The other part is that once you've laid out your siblings, you need to compute the enclosing circle for that pack, so that you can move on to other parts of the hierarchy. The conventional way of doing this is to scan the front chain and pick the circle that is farthest from the origin. That works pretty well because the packs tend to be roughly circular, but sometimes a pack can be slightly shifted off to the side, and so that doesn't end up being an exact solution. And I learned that there's an algorithm called Welzl's algorithm that gives you the optimal solution, and it runs in linear time. So there's really no reason not to do that.
I mean, it's a little bit more work to implement, but if I can do it once and improve these things, even if only slightly, that's good. And also, you know, it's just fun to understand how these things work. So Welzl's algorithm is again an incremental algorithm: it works on one circle at a time, in random order. And you can see just from this animation that, because of that approach, it very quickly converges onto roughly the right enclosing circle, though there's always a chance, as it encounters circles on the outside, that the enclosing circle has to expand. So how does this algorithm work? Well, one thing I should say first is that what I'm showing here is a slightly harder problem than what happens inside the circle-packing layout, because circle packing already has the front chain, so it really only needs to compute the enclosing circle of the front chain. Here I'm showing the general case, where you have an arbitrary set of circles and you don't know the front chain ahead of time. Okay, so how does this algorithm work? Let's assume that we already have an enclosing circle for some set of circles, for circles zero through i minus one. It sounds circular to start by assuming that we already know the enclosing circle, but this is how induction works: assuming the subproblem is solved gives us a starting point for building the algorithm. So if we already have an enclosing circle for some set of circles, and all we want to do is incorporate the next circle, well, if that new circle, which is the black one here, is already inside our enclosing circle, then we don't need to do anything, right? Our circle is fine, and we can just move on to the next one.
But if the circle we're trying to add is outside, not contained by the enclosing circle, then we need to compute a new enclosing circle. And we can make an observation about this new circle: since we process one circle at a time, it's the only circle outside the current enclosing circle. And that means the new enclosing circle must be tangent to the new circle we're placing, right? So that looks like this. But the problem is we don't know what the other tangent circles are; they might not be the same as those of our previous enclosing circle. The upshot is that once we know one of the tangent circles of the new enclosing circle, we can apply this algorithm recursively: each time we find a circle that's outside our current enclosing circle, we recurse to find the next tangent circle. We also know some boundary conditions, namely what the enclosing circle is when you have just one circle, or two circles, or three circles; that's called Apollonius's problem in geometry. I'm not going to show the geometric proofs (this is already enough), but you can get a sense of how this algorithm works and why it's able to terminate. Okay, so now that we understand this recursive structure, we can produce a visualization that shows a more complete view of how the algorithm works. From left to right here, you're seeing the four possible depths of the stack: it can't recurse more than three times, because you can't have more than three tangent circles, or else you'd already contain all the circles. Again, that's geometry. So before, you just saw the circles on the left; but now, whenever there's a circle outside the red enclosing circle, you can see it add the new tangent circle and start descending into the recursion.
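To make the recursive structure concrete, here is a sketch of Welzl's algorithm in its original setting of points rather than circles, where the base cases are simple geometry. The circle version used for packing replaces these base cases with Apollonius-style constructions (and D3's `d3.packEnclose` uses a move-to-front variant), so treat this purely as an illustration of the recursion:

```javascript
// Smallest enclosing circle of a set of points, after Welzl.
// P: points still to process; R: points known to lie on the boundary.
function welzl(P, R = []) {
  if (P.length === 0 || R.length === 3) return trivial(R);
  const [p, ...rest] = P;
  const c = welzl(rest, R);          // enclosing circle without p...
  if (c && contains(c, p)) return c; // ...still valid? done.
  return welzl(rest, R.concat([p])); // otherwise p is on the boundary: recurse.
}

function contains(c, p) {
  return Math.hypot(p.x - c.x, p.y - c.y) <= c.r + 1e-9;
}

// Base cases: a circle is determined by at most three boundary points.
// (The three-point case assumes the points are not collinear.)
function trivial(R) {
  if (R.length === 0) return null;
  if (R.length === 1) return {x: R[0].x, y: R[0].y, r: 0};
  if (R.length === 2) {
    const [a, b] = R;
    return {x: (a.x + b.x) / 2, y: (a.y + b.y) / 2,
            r: Math.hypot(b.x - a.x, b.y - a.y) / 2};
  }
  const [a, b, c] = R; // circumcircle of three points
  const d = 2 * (a.x * (b.y - c.y) + b.x * (c.y - a.y) + c.x * (a.y - b.y));
  const x = ((a.x ** 2 + a.y ** 2) * (b.y - c.y) + (b.x ** 2 + b.y ** 2) * (c.y - a.y)
           + (c.x ** 2 + c.y ** 2) * (a.y - b.y)) / d;
  const y = ((a.x ** 2 + a.y ** 2) * (c.x - b.x) + (b.x ** 2 + b.y ** 2) * (a.x - c.x)
           + (c.x ** 2 + c.y ** 2) * (b.x - a.x)) / d;
  return {x, y, r: Math.hypot(a.x - x, a.y - y)};
}

// The four corners of a square are enclosed by the circle through them.
const circle = welzl([{x: 0, y: 0}, {x: 2, y: 0}, {x: 2, y: 2}, {x: 0, y: 2}]);
```

The random processing order mentioned in the talk matters only for the expected running time; the result is the same in any order.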
But in addition to showing how this algorithm works, one of the nice things is that you get a better sense of how much time the algorithm spends in different states. In this case, you see that the enclosing circle gets big very quickly, but also that whenever it finds a circle that's outside and needs to recurse, it has to revisit all of the previous circles just to make sure the new enclosing circle actually contains everything. I'm not going to prove that it's linear time, that's too much work, but you get a sense of it here. All right, so one way to write less code is to reuse it, and the 440,000 or so libraries published to npm attest to the popularity of this approach. But libraries are an example of active reusability: you must design a library to be reusable, and that is often a substantial burden. It's hard to design effective general abstractions; just ask anybody who maintains an open-source project. In contrast, implementing one-off code, like you see with the D3 examples, tends to be much easier, because you're only worried about the task at hand; you don't have to generalize to some abstract class of tasks. With D3 Express, I'm trying to explore whether there's an intermediate solution with better passive reusability, where you can use the structure of these reactive documents to more easily repurpose code. One part of that is that you can treat any document as a lightweight library. In this case, I had some other document that defined a color-interpolation function and a ramp function which just generates a pretty gradient. I'm now going to import that into this document, and then I'm going to call it. So if I have some color function or some other utility that I wrote in one document, I don't have to create a package on npm and publish it, or a GitHub repo or whatever.
I can just pull it into my code. And the cool thing here is that it also pulls in the dependencies of that definition automatically. The original definition used d3-hsv, which is an optional D3 plugin, and that gets loaded automatically as I pull in the terrain interpolator; I don't have to load it separately. Likewise, even though that remote document loads its own definition of D3, it's not going to conflict with my local definition of D3, because I'm pulling in the functionality, the definition, but not the symbols; I only pull in the symbols I explicitly reference in my import statement. But you can do even cooler things in D3 Express: you can rewire these definitions, injecting your local definitions into the remote ones. Excuse me. So this is a case where I have a dataset streaming over a WebSocket. I'm not going to explain how this code works; the idea is that you'd have an API for connecting to a socket and getting a real-time data stream, and it keeps an array of the last 60 seconds of values. The result is a generator that emits a new array of objects with a time and a value. So the question is: can I visualize this using an existing line chart? We already had a line chart showing Apple's stock price, which is basically the same structure, just an array of objects with a date and a close; here I'm actually pulling in a slightly different definition which uses time and value. So here I'm embedding the chart, and you can see it's just that same basic chart. But if I add this with clause, I'm injecting my definition of data into the chart, and that static chart becomes a real-time chart. And I didn't have to change any other aspect of the code, because that code was already defined to be reactive, right?
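In notebook cells, the two techniques described above look roughly like this. This is Observable-style D3 Express cell syntax sketched from memory, not standalone JavaScript, and the notebook paths and cell names (`some-user/...`, `ramp`, `chart`, `data`) are placeholders, not real published notebooks:

```
// Treat another notebook as a lightweight library: import a cell by name.
// Its own dependencies (e.g. an optional D3 plugin) load automatically,
// without polluting this notebook's namespace.
import {ramp} from "some-user/color-utilities"

// Import a chart, but rewire it: the `with` clause injects this notebook's
// local `data` definition in place of the one defined in the remote notebook.
import {chart} with {data} from "some-user/line-chart"

// Because the chart's cells are reactive, it now re-renders whenever the
// local `data` generator yields a new array of values.
chart
```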
It was already setting the domains of x and y from the data. But I don't have to stop there. I can also augment other aspects of that chart's definition, because they're all part of this exposed topology that I can override. If I don't like the fact that the y scale dynamically adjusts based on the window, and I just want a fixed scale because I know the expected values for my streaming dataset, I can do that: I pull in other definitions, like the width and height and the margin, and then inject my own y definition, and now it's a fixed scale. Likewise, I can do the same thing with the x scale, so that rather than showing chunks as it updates (I think it updates four or five times a second here), I have a very smoothly sliding x scale that updates 60 times a second and just crops off the little bit of data that's slightly outside the time window. That's replacing the x scale with a generator; again, it doesn't care whether the x scale is a constant or a generator, I can just plug those things in. All right. So notebooks in D3 Express run in the browser, right? Not on the desktop, and not in the cloud. There's a server to save your edits, but all the rendering and computation happens locally in the client. So the question is: what does it mean to have a web-first discovery environment? In my view, a web-first discovery environment embraces web standards, meaning vanilla JavaScript and the DOM; it works with today's open source, whether that's snippets you find on the web or libraries you get from npm; and it tries to minimize the amount of specialized knowledge you need to be productive in this new environment.
There is some new syntax in D3 Express for reactivity, but I've tried to keep it as small and as familiar as possible. It uses generators to define these dynamic values. These are all of the forms of variable definitions: an expression, a block statement, a generator block statement, and your standard function definition. Probably more important, though, is that your code can now run everywhere. If it can run in your browser, if it's using web standards, it can run in anybody else's browser; there's nothing to install. And that means it becomes much easier for others to repeat and validate your analysis. By extension, your code for exploration can gracefully transition into code for explanation; you don't have to start over from scratch if you want to communicate your insights. I mean, it's great, and I want to commend journalists and scientists for increasingly being open and sharing their data and their code. But I also think that putting code up on GitHub isn't necessarily enough to make it reusable: it's potentially a lot of work for people who want to run your code to recreate the necessary environment. Maybe they need the right operating system, the right software installed, the right packages; certainly they need familiarity with the tools you're using. But if the code runs in the browser, again, there's nothing to install, and it just works by default. So, again, maybe you should have gotten Bret to give this talk instead, but I'm going to end on another Bret Victor quote, from Explorable Explanations, to explain the implications of this approach. An active reader asks questions, considers alternatives, questions assumptions, and even questions the trustworthiness of the author.
An active reader tries to generalize specific examples and devise specific examples for generalities. An active reader doesn't passively sponge up information, but uses the author's argument as a springboard for critical thought and deep understanding. So imagine if our algorithms were communicated not just in prose and PDFs, but shared as live code with interactive visual diagrams. It becomes so much easier for the reader to look at how they work, to question them, to tinker with them, and to make modifications. Okay, so I want to end on a slight disappointment: this is a lot of work, and it's not ready for you to use yet, but I hope it will be ready very soon. You can sign up for early access at d3.express (that's a URL). And if you're interested in this stuff, please come talk to me about it. If you want to help me build it, so that it's available sooner rather than later, please very much get in touch. And thank you.