Hi, I'm Arvind, and I'm excited to show you how to build interactive visualizations with Vega, a visualization language that we've been developing at the University of Washington Interactive Data Lab. But before we dive into Vega, I wanted to start with three little circles, borrowing one of Mike Bostock's D3 tutorials from a couple of years ago. So here is what the code looks like to visually design these circles. What we're doing is essentially binding some data to these circles and then using those data values to determine visual properties like their x position and their radius. And this is an example of what we call declarative design: we're specifying what we want from our design rather than how it should be computed. Underneath the hood, D3 is building out all the for loops we need and making those DOM calls, all those lower-level details, for us, and we don't have to do any of that ourselves. Now, what if we were to introduce some very simple interactivity? So if I drag these circles, I can move them up and down. What would the code for that look like? Well, it looks something like this. I have to register event listeners for the mousedown, mouseup, and mousemove events, and in these callbacks I have to give explicit steps for what I want to happen during each of these events. So what's the problem with specifying interaction this way? And I should note that this is imperative design. Well, the first problem is that it falls to us, the developers, to manually maintain state. And Shirley, I think, did a really great job yesterday of identifying why this gets really, really complex. In this simple example, my state is figuring out which circle I'm currently dragging. And when I was putting this together, I actually screwed it up the first time: I forgot to clear out the state on mouseup, which meant that my circles kept moving around even though I'd stopped dragging.
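To make that bookkeeping concrete, here's a minimal sketch of the imperative pattern as a plain JavaScript state machine with the DOM factored out; the names are mine for illustration, not D3's API.

```javascript
// A sketch of imperative drag handling with the DOM factored out.
// The developer has to thread the `dragging` state through every callback.
function makeDragHandlers() {
  let dragging = null; // which circle is currently being dragged: manual state

  return {
    mousedown(circle) { dragging = circle; },
    mousemove(x, y) {
      if (dragging) { dragging.cx = x; dragging.cy = y; }
    },
    // Forgetting this handler reproduces the bug I mentioned: the circle
    // keeps following the pointer even after the mouse button is released.
    mouseup() { dragging = null; },
  };
}
```

Even in this tiny example, the interaction state lives outside any declarative description of the visualization, which is exactly the problem.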
And the state in this is just four lines, and I still made a mistake. The second problem is that we have to redefine the appearance of our visualization in multiple locations, and this makes it tricky to figure out why the visualization looks a particular way: I have to trace down multiple code paths, so debugging becomes much more difficult. And with this sort of imperative paradigm, I also have to deal with low-level details, like the fact that D3 stores its event in this particular variable, or I have to call stopPropagation when I'm dragging so that text selection doesn't fire. Really low-level details that have nothing to do with dragging and moving circles around. And finally, for those of us who've actually programmed this way, we've undoubtedly felt callback hell. These callbacks can nest in arbitrary ways, and their execution order can be both unpredictable and interleaved. Now, this is kind of a toy example, so I wanted to show you that these problems also exist with real-world and more complex use cases. Here's the canonical D3 example of brushing and linking in a scatterplot matrix. And because brushing is such a common interaction technique, D3 provides this brush component that we can instantiate in our code and pass three callbacks to. So let's examine these callbacks. Ooh, I like these iMacs, because this code is actually readable. Let's look at these callbacks in greater detail. We can see we have to manually maintain the state of our brush, including clearing out any previously active brushes. In multiple locations, I have to identify which set of points is currently highlighted. I have to deal with low-level details like the fact that the current brush is stored in a variable named this, or that D3's brush component exposes its extents in a particularly opaque fashion, this double array. And on this point, it's worth noting that this brush component also black-boxes all of its event processing.
So it registers the event listeners that it needs to drag out a brush, including some support for mobile devices. Now, the problem with that is: what if I wanted to use alternative events to trigger that interaction? For example, what if I wanted to brush and pan in the same visualization? Well, both of those are triggered by drags by default, and so they conflict. I would likely want to be able to specify some way of resolving that conflict, but I've got no way of doing so. And so this is where reactive programming comes into play. You can think of it as working in a spreadsheet. When I enter new data values into spreadsheet cells, formulas that reference those cells automatically update to reflect the new values. And so this is another example of declarative design. My formulas are saying what I want, say the sum or average of a group of cells, rather than how or when those things should be recomputed. Another way of thinking about it is: would you ever want to write a spreadsheet with event callbacks? Because I sure don't. That would be a nightmare. And so Vega uses this paradigm in a very similar fashion. Events are our data, more specifically a form of streaming data, and the formulas are dynamic variables called signals, and these signals automatically update when new events fire. So let's actually look at Vega in more detail now. Here is what a Vega specification looks like, and this one produces a scatterplot matrix. As you can see, again, very readable code: it's a JSON object that has definitions for input data as well as operators to transform that data, and we can load data from a remote URL or embed values directly in the specification. We've got definitions for scale functions that allow us to map those data values to visual properties like position, color, and size; guides that visualize those scales, either as axes or legends; and then, finally, our marks.
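That spreadsheet analogy can be sketched in a few lines of plain JavaScript. This is my own toy illustration of the idea, not Vega's actual implementation.

```javascript
// Toy reactive signal: a formula that recomputes whenever a new event
// arrives on its stream, like a spreadsheet cell referencing other cells.
function makeSignal(formula) {
  const s = { value: undefined, listeners: [] };
  s.fire = (event) => {
    s.value = formula(event, s.value);        // recompute declaratively
    s.listeners.forEach((fn) => fn(s.value)); // push the update to dependents
  };
  return s;
}

// A signal that always holds the most recent pointer x-position:
const mouseX = makeSignal((event) => event.x);
mouseX.fire({ x: 42 });
```

The point is that the formula states what the value is; when and how it gets recomputed is the runtime's job, not ours.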
And in this particular example, you can also see the special group mark. Kind of like SVG groups, we can nest scales, guides, and marks within it, and it provides a very concise way of doing small-multiple and layered displays. So alongside these familiar visual-encoding building blocks, Vega introduces reactive ones to do interaction design. As I've already mentioned, we've got event streams. We also have a syntax, which I'm gonna unpack shortly, for selecting the particular events that we care about. These event streams then feed signals, which are dynamic variables; their values automatically update and recalculate whenever a new event fires. Scale inversions take these signal values, which are by default in pixel or visual space, and lift them up to the data level, so they're sort of the opposite of scale transforms. Predicates build selections: in this case, for example, this one returns true or false depending on whether t.value lies between these min and max signal values. And then, finally, production rules use these selections to actually effect some change to our visualization. So this is all sort of abstract right now; let's make it a little more concrete, sticking with brushing and linking as our example. As I mentioned, events are a form of streaming data. So if I move my pointer across these rectangle marks, I might observe a stream of mousemove events. And in our syntax that's quite simply denoted by the event type, along with the particular element, the target or source element of these events. Events can be sequenced in one of two ways. I can use a comma to merge multiple streams into a single one, with those constituent events correctly interleaved. And I can also order events: this is a stream of mousemove events that occur between a mousedown and a mouseup. That's a lot of words; you and I would call that a stream of drag events. We can also filter events before they enter a stream.
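As a quick cheat sheet, the event-selector syntax looks roughly like this. I'm writing these from memory of the Vega 2 documentation, so treat the exact forms as approximate:

```
mousemove                          a stream of mousemove events
rect:mousemove                     only mousemoves over rect marks
mousedown, touchstart              a comma merges two streams into one
[mousedown, mouseup] > mousemove   mousemoves between a down and an up: drags
mousemove[event.pageX > 100]       square brackets filter on event properties
mousemove{100ms}                   curly braces throttle by a time interval
```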
So with square brackets, I can filter events based on their properties. With curly braces, I can filter them based on a time interval between successive events, which is a concise way of debouncing or throttling events. And as you can hopefully see, this syntax was very much inspired by CSS selectors, because we wanted it to feel familiar to the people in this community, visualization designers. So what streams do we need for brushing and linking? Well, we need a stream of mousedown events for the start position of our brush, and a stream of drag events for the end position. Now we can use some signals to extract the particular coordinates of the start and end positions, and we can use those coordinates directly with a rectangle mark. And that's all we need to be able to repeatedly draw a brush on a scatterplot matrix. Signals now represent my entire interaction state, and they automatically update, which means that I no longer have to do any manual bookkeeping, like clearing out the previous brush as I had to with D3. These coordinates can also feed a predicate function. By default, this is just a simple selection that returns true or false: in this case, if the input parameter's coordinates fall within the start and end coordinates, it returns true, and otherwise it returns false. And the way we use these selections is in production rules. Production rules are simple if-then-else chains. So the fill color of the circle mark is determined by whether that circle falls within the brush: if it does, color it blue, orange, or green based on some data value, and if it doesn't, color it gray. So here's an initial setup for what brushing and linking might look like. But here's the interaction it actually produces: if I start to brush orange and green points, suddenly blue points are getting highlighted as well. So what's going on?
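Schematically, the predicate and the production rule amount to something like this plain-JavaScript sketch; this is my own illustration of the concepts, not Vega's JSON syntax.

```javascript
// A selection predicate: does a point fall within the brush extents?
function inBrush(point, start, end) {
  const xMin = Math.min(start.x, end.x), xMax = Math.max(start.x, end.x);
  const yMin = Math.min(start.y, end.y), yMax = Math.max(start.y, end.y);
  return point.x >= xMin && point.x <= xMax &&
         point.y >= yMin && point.y <= yMax;
}

// A production rule: an if-then-else chain over a visual property.
function fillColor(point, start, end, colorScale) {
  return inBrush(point, start, end) ? colorScale(point.species) : "grey";
}
```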
Well, what's happening is that I'm using those signal values directly in my predicate, and those signal values are just pixels, right? So any points that lie within those pixel ranges are being highlighted, whereas what I want is for only the points that lie within the same data range to be highlighted instead. So, if we switch back to our schematic, what I wanna do is first identify which scatterplot I'm interacting in, and then use its scales to invert my signal values. Now the input parameters to this in-brush predicate are data values, and so what the predicate is now expressing is a data query: it's looking at the sepal data value and the petal data value and checking whether these lie within the two extents. I can bring back my production rule from before, and there we go, it's doing the right thing. If I brush orange and green points, I only see orange and green points highlighted. If I brush blue points, I see those highlighted. Nothing weird going on. And so this sort of setup brings the advantages of declarative design to interaction techniques as well. So what are some of these advantages? Well, first, the process of doing interaction design is now about combining and recombining these building blocks rather than programming from scratch. So it's much faster to iterate, because there are only a set number of ways these blocks can combine, and we no longer have to deal with any of those lower-level details; these blocks encapsulate and abstract all of that away from us. So we'd argue that interaction design also becomes accessible to a larger audience. The second point is that, because events are just another form of streaming data, Vega can do a number of optimizations on our behalf underneath the hood. We don't have to worry about it, and that ultimately yields performance that is at least twice as fast as D3 and event callbacks. And if you don't believe me, we've even published our benchmarks.
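The fix rests on scale inversion. For a linear scale, the inversion is just the mapping run backwards; here's a sketch of my own to illustrate, with made-up domain and range values.

```javascript
// A linear scale maps data values to pixels; its inversion maps pixels
// back to data, which is what turns brush extents into a data query.
function linearScale([d0, d1], [r0, r1]) {
  const scale = (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
  scale.invert = (px) => d0 + ((px - r0) / (r1 - r0)) * (d1 - d0);
  return scale;
}

// E.g. sepal length from 0 to 8 cm plotted across 400 pixels:
const x = linearScale([0, 8], [0, 400]);
const brushInPixels = [100, 300];
const brushInData = brushInPixels.map(x.invert); // [2, 6] in centimeters
```

Once the extents are in data units, the same brush selects the same data range in every linked view, regardless of each view's pixel layout.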
So you're welcome to run those yourself. And the third point is that because it's declarative, we're only saying what we want, not how these things should be computed. And so it becomes much easier to retarget an interaction technique to different devices, different modalities, and so forth. So, for example, right now this interaction technique is using desktop events: mousedown, mousemove, and so forth. What if I wanted to retarget it for mobile devices? Well, without touching the heart of my interaction logic, I can just switch out those event streams and use touch events instead. And, like I said, I haven't touched any of the rest of it, and now this interaction technique works on mobile devices as well. And as we'll soon see in some demos, signals don't have to be driven by just a single event stream; they can be driven by multiple. And so a single interaction technique can work not only on the desktop but also on a variety of mobile devices, and you have just one specification that deals with all of it. So, like I said, demos. Here is the URL that most of these demos will be run on, so you're welcome to tune me out and just play. Is that readable? Maybe I'll bump that up a little bit. All right, I wanted to start simple: brushing in a scatterplot, so we know that it works for sure. And here are our signal values. You can see each one has a name and an initial value, and we can specify the streams that trigger different value changes. In this particular case, every time a mousedown event fires, this expression re-evaluates. Vega supports a sort of lightweight, safer version of JavaScript syntax here, along with some helper functions; for example, iscale allows me to call a scale inversion. And so brush start and brush end are just the coordinates, in data space, of my brush extents. Here's the rest of the visual encoding.
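For reference, a brush-position signal in the Vega 2 syntax of the era looked roughly like this. I'm writing this from memory, so treat the exact property names and helper functions as approximate rather than a copy of the demo's spec:

```json
{
  "signals": [
    {
      "name": "brushStartX",
      "init": 0,
      "streams": [{"type": "mousedown", "expr": "eventX()"}]
    },
    {
      "name": "brushEndX",
      "init": 0,
      "streams": [{"type": "[mousedown, mouseup] > mousemove", "expr": "eventX()"}]
    }
  ]
}
```

The stream selector strings are the same event-selector syntax from earlier, and the expression re-evaluates on every matching event.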
And in here, I've got my predicate function. So I'm saying: for my fill color, if it matches this test, which is basically just checking whether some data values lie within these particular ranges, color it using this scale and field, and otherwise color it gray. And then right at the bottom is my rectangle mark for the actual brush, and you can see its positions are being determined by the signal values directly. Because these signal values are data values, we also run them through a scale function. So, moving up the ladder of complexity, here is brushing and linking in the scatterplot matrix that we just worked through diagrammatically. The signals are very, very similar; we've just decoupled them a little more for clarity, differentiating our start and end coordinates from our start and end data. Here is a signal that identifies which particular scatterplot we're in, and then we also calculate a total brush signal that holds these extents all together. And then the predicate function is very similar; we're just checking whether it's in the range. And here is the mark that builds out our brush. And so, in that way, that's all I need to basically brush across the scatterplot matrix. Here is where things start to get fun. So here is, hey, that scrolls. Here is an interaction technique inspired by Mike Bostock's Crossfilter.js. As I start to brush in each of these constituent histograms, my bars start to dance as the data gets filtered out and re-aggregated. And the definition for this is, again, very similar, because it's another brushing interaction: we're defining the start and end points of each of our brushes, and then using those to filter particular data values. So these min day and max day are signal values, and then we check whether the data fall within the range. So we've definitely heard that this particular example is an especially overwhelming one, because, let me scroll all the way down here: 516 lines of JSON.
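The filter-and-re-aggregate step underneath each histogram boils down to something like this sketch; plain JavaScript of my own, with hypothetical field names rather than the demo's actual schema.

```javascript
// Crossfilter-style brushing: keep only the records in the brushed day
// range, then re-aggregate counts per hour for the histogram bars.
function reaggregate(records, minDay, maxDay) {
  const counts = {};
  for (const r of records) {
    if (r.day >= minDay && r.day <= maxDay) {
      counts[r.hour] = (counts[r.hour] || 0) + 1;
    }
  }
  return counts;
}
```

Every brush move updates the min/max signals, which re-runs this filter and aggregation, which is what makes the bars dance.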
That is a very large number. But if you study this carefully, what you'll see is that it's about the same 120 lines of JSON, still a large number, I'll grant, repeated four times, once for each histogram. So it's each histogram's signals and data and scales and so forth, repeated over and over again. Some more examples: here we're re-ordering an adjacency matrix showing character co-occurrences in Les Misérables. And this is super simple; I was actually quite surprised at how easily this fell out. I just have two signals, a start and a destination signal, that basically track which columns or rows I'm switching around. And those signal values are used in data transformations to basically swap a sort-order data value. And that's basically it. So that interaction technique boils down to maybe a handful of lines of JSON. Here is another Mike Bostock example, because he makes really good examples. This one shows the airports in the United States, sized by how many flights go in and out, and I can move my mouse pointer around the map to see all the outbound connections from a particular airport. So here's Sea-Tac and, you know, SFO is around here somewhere and so forth. And what you might notice is that I don't actually have to be on top of any of the circles to select them. What's happening is that, underneath the hood, we're computing a Voronoi tessellation, and that Voronoi tessellation is actually what is driving the event streams to my signals. So I'm selecting whichever airport is nearest my mouse pointer rather than the one immediately underneath it. And finally, this is one of our favorite examples to show, because we initially didn't think it would even be expressible; one of our star undergrad students built it and surprised us. This is a technique called DimpVis that Brittany Kondo and Chris Collins, who's gonna be talking later today, introduced a couple of years ago at the InfoVis conference.
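A Voronoi lookup of this kind is equivalent to a nearest-neighbor query; the tessellation just precomputes the regions so the answer doesn't require scanning every point. Here's a naive sketch of the same idea, mine rather than the actual implementation.

```javascript
// Select whichever airport is nearest the pointer, rather than requiring
// the pointer to sit directly on top of a circle. A Voronoi tessellation
// answers this same query without the linear scan below.
function nearestAirport(airports, px, py) {
  let best = null, bestDist = Infinity;
  for (const a of airports) {
    const d = (a.x - px) ** 2 + (a.y - py) ** 2; // squared distance suffices
    if (d < bestDist) { bestDist = d; best = a; }
  }
  return best;
}
```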
And it tries to emulate that Hans Rosling-style storytelling. So along the y-axis I've got life expectancy, along the x-axis fertility, and a number of different countries here. When I hover over a particular country, I see its trajectory through time, and I can grab that country and start dragging to vary all the data that's currently being shown. This is a really nice way of navigating time-series data, and we can mark points of interest and so forth. And what's nice about this technique is that the authors especially intended it to be a touch-based interaction technique, but here I am demonstrating it on a desktop. And it's actually the exact same JSON definition; we don't have to duplicate anything. All we have to specify is a comma with both the mousedown and the touch events, and that same set of events drives the same interaction logic. But we're not the only ones building Vega examples. We were really happy when we saw this one, created by John Lee from Statwing. And this is, as the title says, an interactive NBA shot chart. This is a really impressive example, because I can start brushing along any of these dimensions to start filtering the data, and you can see these stacked bars rise and fall as I filter the data out. I can also filter these histograms here, and all these numbers up here are being updated. I can also drag on the court and so forth. And so all of this is being coordinated by Vega, and quite performantly, I'd say. So that's quite nice. And the last example I wanted to show: we've been really excited to work with the folks at Wikimedia, who've integrated Vega visualizations into Wikipedia itself. And so you can take these JSON objects, just paste them into a Wikipedia article, and boom, you have now got an interactive visualization. So here is a scatterplot of the most expensive paintings auctioned off so far.
It looks like a regular scatterplot, and that's actually already quite nice: a nice way to visualize all this data that Wikimedia editors have collected. But I can start to hover over points to get thumbnails of the particular paintings. I can also click points to filter them by particular periods or artists and things like that. Simple interactions, but really powerful in this context, because otherwise all this data is sort of trapped, so to speak, in this table. And so the Wikimedia folks are really excited about the prospect of being able to do interactive graphics in Wikipedia. So I'm gonna take a water break and switch back to this example of the scatterplot. One of the things that is both good and bad about declarative design is that all the execution falls to the toolkit, in this case Vega. That means that debugging something becomes particularly hard, because you need to know the internals of Vega to really understand how to debug it. The flip side of that coin is that we can build a tailor-made debugger specifically for Vega, because Vega is a domain-specific language and we understand the set of things we wanna do with it. And that's exactly what my collaborator, Jane Hoffswell, has been working on. So if I hit this debugger, it starts to record my interactions, my signal values, as I make them. And if I pause, I can start to go back in time and see how my interaction has actually propagated through the visualization. So there are a bunch of really cool interactions built in here. I can navigate through the timeline to see the actual data values that the signals are holding; these red highlights indicate which signals were used in that calculation, how this all builds up. I can also filter the timeline view to a specific period and so forth. The timeline also has a data-table view. So in this case, this is an index chart: the data values re-normalize based on this index point.
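Index-chart renormalization, the operation this debugger view is stepping through, can be sketched like so; again, my own illustration rather than the spec's actual transform.

```javascript
// Index-chart renormalization: express each value in a series relative to
// the value at the selected index point, e.g. change since a chosen date.
function renormalize(series, indexPoint) {
  const base = series[indexPoint];
  return series.map((v) => (v - base) / base);
}

// A stock price series indexed at day 0 becomes fractional change:
const indexed = renormalize([100, 110, 121], 0); // 0%, +10%, +21%
```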
A very simple signal drives it, but most of the magic is happening in the data tables. And so you can see here, and similarly I can go back in time, that I can visualize how my data values are changing in response to my interaction techniques as well. And so this is brand new and live, and note the alpha keyword there. It is alpha; we're really excited about it, but there might be things broken, so please let us know, and please also let us know how you're using it. So, to dive back into the slides: everything I've shown you is available at this URL, vega.github.io/vega, including links to documentation, tutorials, our user group, and our GitHub. All of this is being developed out in the open. But before I wrap up, there is one more thing. A couple of months ago, my colleagues Kanit Wongsuphasawat and Dominik Moritz released the first version of Vega-Lite, which is a higher-level visualization language, sort of akin to ggplot or maybe the interactions you do in Tableau, really suited for rapidly generating statistical graphics. And since then, we've been collaborating to figure out how we define interactions in this high-level language. And so I have just under five minutes, and I'm gonna blast through the sneak peek. So here's what a Vega-Lite visualization looks like. I'm showing a histogram of when flights take off, and I can embed a data definition or import some data from a URL. And this specification describes how a single mark type encodes this data. You can see that I'm also specifying some transformations inline, like binning the hour and aggregating the count to build up my histogram. And the reason this specification is so concise is that I'm omitting a number of lower-level details, like which particular scales I'm using, or what the guides, the axes and legends, are. The Vega-Lite compiler evaluates a default set of rules to fill those details in and produces a Vega specification.
Now, of course, as users, we might know better than this general-purpose compiler, and so we can specify some overrides in there if we so desire. Now, with a new operator, I can repeat this specification for three data fields. So now I'm showing the binned hour, the binned delay, and the binned distance that all these flights reported. Can you tell where I'm going with this? So here I've now got three histograms, and what I'm gonna do is add an additional layer. This secondary layer that I've highlighted here is the exact same set of histograms, but those bars are gonna be colored gold instead. So I've got a layer of blue bars and then a layer of gold bars. All right, so this is where we've gotten with our visual encoding. Let's collapse that sucker and now focus on our interaction design. So in that first layer, with the blue bars, I can specify a selection, which is a new building block we're introducing with Vega-Lite. This is an interval selection named region; I can give it any name, region is not anything specific. Once I do that, I can start to brush in each of my histograms. And what the selection is doing is mapping to the event streams, the signals, the predicate functions, and the scale inversions that we saw with Vega. But I don't have to deal with any of that if I don't want to. Of course, I can override some of those with my own custom values; maybe I want custom event streams. And this project keyword is something else we're introducing, called a selection transform. So just as we have data transforms, we're also going to have selection transforms, which capture very common design patterns, ways of overriding those event streams and signals, et cetera, that these selections represent. So now I've got brushes, but they're not really doing anything yet, so let's actually make them useful. And so the final part is to use those region selections to filter the data in my gold layer. And that's all I need.
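The shape of the specification is roughly the following. The selection syntax was still evolving at the time, so treat the property names here as a schematic of my own rather than a runnable Vega-Lite spec:

```json
{
  "layer": [
    {
      "mark": "bar",
      "select": {"region": {"type": "interval"}},
      "encoding": {
        "x": {"bin": true, "field": "hour", "type": "quantitative"},
        "y": {"aggregate": "count", "type": "quantitative"}
      }
    },
    {
      "mark": "bar",
      "transform": [{"filter": {"selection": "region"}}],
      "encoding": {
        "x": {"bin": true, "field": "hour", "type": "quantitative"},
        "y": {"aggregate": "count", "type": "quantitative"},
        "color": {"value": "goldenrod"}
      }
    }
  ]
}
```

The first layer declares the brush; the second layer filters by it; the compiler expands both into the signals, predicates, and inversions we saw in Vega.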
35 lines of JSON. So more than an order of magnitude smaller than the Vega version. And just to show you that, here are some other interaction techniques we've considered. Here is panning and zooming in a scatterplot matrix, and you'll notice that panning in one of the scatterplots keeps all the other ones in sync, including zooming. And not only can I pan and zoom, but I can brush in the scatterplot as well, right there. And this particular brushing technique only highlights points that fall within all of the brushes. By changing just a single property, I can change that to highlight points that fall within any of the brushes, or just have a single brush like my Vega and D3 examples. Up top here is an overview-plus-detail interaction technique. So I brush in that top chart, which shows an overview of stock-price data, and in the bottom one I'm seeing those selected points at a higher resolution. And finally, just to demonstrate that it's not just brushing that we care about, here is the index chart we saw with Vega. In this case, it's using something called a point selection instead, to select that index point and then renormalize my data. So all of these are tens of lines of JSON rather than hundreds. So I wanted to wrap up by reflecting on why we at the Interactive Data Lab are particularly excited about Vega. We think, as this title suggests, that it's a really viable platform for visualization. So what do we mean by that? Well, not only can you and I write these Vega and Vega-Lite specifications, but because they're JSON, they can be embedded in other software to generate visualizations programmatically. So I've already shown you an example of that with Vega-Lite, which is a higher-level language that's automatically producing these Vega visualizations.
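That single-property change corresponds to choosing how multiple brushes resolve, which in plain JavaScript, as a sketch of my own, is just intersection versus union:

```javascript
// Resolving multiple brushes: highlight a point if it falls within
// "all" of the brushes (intersection) or "any" of them (union).
function isHighlighted(point, brushes, resolve) {
  const hits = brushes.map((inBrush) => inBrush(point));
  return resolve === "all" ? hits.every(Boolean) : hits.some(Boolean);
}
```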
If you saw the OpenVis keynote from last year, you saw my advisor, Jeff Heer, demonstrate Voyager and PoleStar, which are two new data-exploration apps we're building in the lab that also generate Vega-Lite and Vega visualizations automatically. We've been excited to see that the Jupyter community has started working on a library called Altair that provides Python bindings for Vega-Lite. We're hopeful that this approach is gonna spread and be adopted by the other Python data-vis vendors, so that all these libraries will produce a common format but do their vendor-specific stuff on top, like their aesthetics and interactions and so forth. If you were watching OpenVis two years ago, I had the pleasure of introducing Lyra, which is our Illustrator-like design tool for data visualization. And underneath the hood, Lyra produces Vega visualizations as well, every time someone drags and drops a data field or makes some modification like that. And as I showed you, Wikipedia is also building on this tool stack by allowing Vega visualizations to be embedded there. And what's nice about this ecosystem we're building is that now no one tool needs to be the be-all and end-all of the data analysis and visualization pipeline. Because all these tools are speaking the same language, I can maybe start my data-exploration process in a tool like Voyager, get a quick statistical graphic out, export that via Vega-Lite into Lyra, start to touch it up with some custom aesthetics, maybe add an annotation layer to tell a narrative, and then export that via Vega and embed it in Wikipedia, right? I'm using a variety of different tools, each tailor-made for a specific task, going back to some of the stuff we heard about user-centered design, keeping that focus. Similarly, maybe I'll start in Altair, go down via Vega-Lite, Vega, and D3 all the way to SVG, and then start to use something like Illustrator instead.
And I've left this big open space here because I think we're just getting started. We're building all of this out in the open because we want to work with the community to figure out what sort of new visualization and data-exploration applications we can start to build, now that we have the foundations of this tool stack in place, mature and performant. And so all of this, the overall URL of the Vega project, is at vega.github.io, and I am over time, so sorry about that, but there we go. Thank you.