Hey guys, is this on? Great. All right, I'm going to talk to you a little bit about exploring multidimensional data today, using a visualization called parallel coordinates. First, this is a visualization of data about my hands that's being generated by this device right here. So if you look at the top right, let me zoom in just so you can see the text a little bit. Let me really zoom in here. We're going to go to the top right of the chart, where information density is highest, and y'all are going to get a little confused. So prepare your brains, because I'm trying to outsource data processing to your visual cortex right here. This is tip velocity, so if I wiggle up and down, tip velocity 1 goes crazy. If I wiggle left and right, let me wiggle this way, tip velocity 0 goes crazy. If I wiggle forward and back, well, actually, I'm kind of going in an arc, so they're all going crazy. But the principle here is that data about this hand is multidimensional. We think of hands as three-dimensional things, but they're doing quite a lot. Just having a three-dimensional model of the hand is not really enough to understand what it's saying, what it's doing, or how it's oriented. You could think of the hand as having a vector coming out of the palm right here, and that's exactly what the palm normal is. So with the palm normal, as I go left and right, this line is going up and down. And as a programmer, it's much easier to deal with a flat data structure that has a lot of elements in it than with some complex nested data structure. And so that's the principle of parallel coordinates. But let me explain it from a visualization perspective. So what is multidimensional data? Multidimensional data looks kind of like this. You have a bunch of objects, and they have a bunch of properties. In this case, we have some foods. We've got fish broth, blood sausage, puffed millet, some dates. And they've got values for protein, calcium, and so on.
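To make the flat-versus-nested point concrete, here is a minimal Python sketch that flattens a nested record into one flat dictionary, so every numeric leaf becomes its own dimension (its own axis). The field names `palmNormal` and `tipVelocity` are hypothetical, loosely modeled on the labels in the demo; they are not the device's actual API.

```python
from typing import Any


def flatten(record: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into a single dict of dimensions.

    Nested keys are joined with '.', and list items are keyed by position,
    so {"palmNormal": [0.0, -1.0, 0.0]} becomes
    {"palmNormal.0": 0.0, "palmNormal.1": -1.0, "palmNormal.2": 0.0}.
    """
    if isinstance(record, dict):
        items = list(record.items())
    elif isinstance(record, (list, tuple)):
        items = list(enumerate(record))
    else:
        # A leaf value: store it under the key accumulated so far.
        return {prefix: record}
    flat: dict[str, Any] = {}
    for key, value in items:
        name = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, name))
    return flat


# Hypothetical frame of hand data.
hand = {"palmNormal": [0.0, -1.0, 0.0], "tipVelocity": [12.5, -3.0, 0.4]}
dims = flatten(hand)  # six flat dimensions, one axis each
```

Each key of the flat result can then be drawn as one vertical axis, with no special handling for nesting.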
We'll take a look at this data set, actually, in a couple minutes. What is data visualization? You can say a lot of things about data visualization. I'm going to say it's imposing space and time on the data. So you say you want the data to go in some space. A very common space is this coordinate system. This is the Cartesian coordinate system. The way it works is, if we have a pair of values, to decide where this point goes, for instance, if we have this point (2, 3), we go over 2 and up 3. And this was so hard for me to learn when I was a little kid, because I didn't understand how equations like y equals 2x became lines. I got that you could get a point from an equation. You could put in 2 and you'd get 4. And I got that you could do that a lot, but I didn't get how it turned into one whole line. It was very confusing to me. A very popular visualization is Hans Rosling's bubble chart. Let me just explain it in terms of its dimensions. There are five dimensions in this visualization: fertility rate, life expectancy, population, region, and year. And these are mapped to different visual properties. Fertility rate goes to the y-axis. Life expectancy goes to x. With just those two things, you'd have a scatter plot. But then you also throw in population as the radius, so bubbles are different sizes. Region goes to color. Year goes to animation. This is what D3 was really designed to do: take objects like circles, take data, and map one property of the data to some visual property. This is how you create visualizations. There are some problems with this approach for data exploration. You hit a limit at five or six dimensions. And you might say, that's enough. I don't need to go into seven-dimensional crazy space, because only aliens live there. But really, the world is multidimensional. There's a lot more going on. And the more dimensions you have, the more problems you have. We want to work with dozens of dimensions.
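The bubble chart's "one dimension, one visual channel" mapping can be sketched as a function from a data row to a circle. This is a minimal Python sketch, not D3's API: `linear_scale` is a hand-rolled stand-in for a linear scale, and the domains, pixel ranges, palette, and field names are all hypothetical. Taking a square root so that bubble area (rather than radius) tracks population is an assumed design choice, not something stated in the talk.

```python
from dataclasses import dataclass
from typing import Callable


def linear_scale(domain: tuple[float, float], rng: tuple[float, float]) -> Callable[[float], float]:
    """Map values from `domain` onto `rng` with a straight line (no clamping)."""
    d0, d1 = domain
    r0, r1 = rng
    return lambda v: r0 + (v - d0) / (d1 - d0) * (r1 - r0)


@dataclass
class Bubble:
    x: float
    y: float
    radius: float
    color: str


# Hypothetical domains, pixel ranges and palette.
x_of = linear_scale((20, 90), (0, 800))  # life expectancy -> x
y_of = linear_scale((0, 9), (500, 0))    # fertility rate -> y (screen y grows downward)
REGION_COLOR = {"Africa": "#1f77b4", "Asia": "#ff7f0e", "Europe": "#2ca02c"}


def encode(row: dict) -> Bubble:
    """One data row -> one circle: each dimension gets its own visual channel."""
    return Bubble(
        x=x_of(row["life_expectancy"]),
        y=y_of(row["fertility_rate"]),
        # Square root so bubble area tracks population (assumed design choice).
        radius=(row["population"] / 1e6) ** 0.5,
        color=REGION_COLOR[row["region"]],
    )


def frames(rows: list[dict]) -> dict[int, list[Bubble]]:
    """Year drives animation: group the encoded bubbles into one frame per year."""
    out: dict[int, list[Bubble]] = {}
    for row in rows:
        out.setdefault(row["year"], []).append(encode(row))
    return dict(sorted(out.items()))
```

Notice that every dimension lands in a different kind of channel (position, size, color, time), which is exactly the unequal weighting the talk goes on to question.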
And I'll show some examples of this in a moment. Another thing is that the dimensions in a bubble plot, an animated bubble plot, are all encoded very differently: we have position and color and radius. If you're working with the data set for the first time and you're just throwing properties into these visual encodings, the choice is somewhat arbitrary. That's great for design, but I'm not talking about design or having a message here. I'm talking about you exploring your data. So I don't have a narrative that I'm trying to put on you. I'm just trying to massage your brains so that you can look at the world in a different way. How can we give equal weight to every dimension? We could use a data table. And in fact, you will use a data table. You absolutely should be using one. When you look at data, if it's a CSV, you could look at it in a text file, or you open it in a spreadsheet, or you put it in a database. These are all tables. The problem with tables is they can't fit everything on the screen. So I have a relatively simple data set over here. This is some abalone. Each abalone has a sex, length, diameter, height, weight, shucked weight, viscera weight (I don't even know what viscera weight is), and some rings. And it's just a long thing. So as a visualization, it's not so successful, because it doesn't fit on the screen, so you can't see it all at one time. There's a little bit of trouble, too, with weak visual cues, right? At a glance, you can't tell which of these is the longest abalone. It's not quick to do, but data tables have great things: you could just sort the column to find out which is the longest, right? And you could do computations. You can do transformations. Excel is a really great piece of software; I mean, spreadsheets in general are very powerful pieces of software, because you can basically program.
And it is actually a sort of visual programming, because you can structure your computations so that the data and the computations are spatially arranged in a way that makes sense to you. There's one more thing I want to mention: what is exploring? Exploring is really looking for patterns. It's seeking structure in the data. It's perhaps applying a little theory: if you have some theory or some knowledge that you're interested in, you're seeing if the empirical facts match your theoretical facts. It's also about having fun. And I'm gonna show you how to do that with parallel coordinates today. So let's get started. Parallel coordinates is an interactive multidimensional geometry projected into two-dimensional space. I'm gonna show you with a bunch of examples. First is this nutrient scatter plot. This is a little bit more of that nutrient data set, and this is just a kind of lousy scatter plot. On the y-axis I have fiber, and I can change that to protein. I can change the x-axis to carbs. And as you can see, I have 14 or 16 dimensions here. I can kind of get a sense of it by flipping these around and seeing what pops up. If I put monounsaturated and polyunsaturated fats on the axes, I get a bunch of fats all out in this region. This is great, but I can't see all the dimensions at one time. So how does it work in parallel coordinates? Every food here, so here we have rose hips, is a line going across this chart. We call this a polyline. Where it intersects an axis is the value of that food in that dimension. So rose hips have zero grams of calcium, about 180 calories, 40 grams of carbs, some fat, et cetera, et cetera. The cool thing is we can just overlay all these lines on top of each other, which looks really messy. It's not something you would publish in a magazine, for sure.
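The polyline construction can be sketched in a few lines of Python. This is a minimal sketch, not the library's code: it assumes evenly spaced vertical axes, one linear scale per dimension fit to that dimension's min and max, and screen coordinates where y grows downward. The rose hip values come from the talk; the second food is hypothetical.

```python
from typing import Sequence


def extents_of(rows: Sequence[dict], dims: Sequence[str]) -> dict[str, tuple[float, float]]:
    """Per-dimension (min, max), so every axis gets its own scale."""
    return {d: (min(r[d] for r in rows), max(r[d] for r in rows)) for d in dims}


def polyline(record: dict, dims: Sequence[str], extents: dict[str, tuple[float, float]],
             width: float, height: float) -> list[tuple[float, float]]:
    """Vertices of one record's polyline across evenly spaced vertical axes.

    Axis i sits at x = i * width / (n - 1). Each value is scaled so the dimension's
    minimum lands at the bottom (y = height) and its maximum at the top (y = 0).
    """
    step = width / (len(dims) - 1) if len(dims) > 1 else 0.0
    vertices = []
    for i, dim in enumerate(dims):
        lo, hi = extents[dim]
        t = (record[dim] - lo) / (hi - lo) if hi != lo else 0.5  # constant axis: middle
        vertices.append((i * step, height - t * height))
    return vertices


foods = [
    {"calcium": 0, "calories": 180, "carbs": 40},   # rose hips, values from the talk
    {"calcium": 100, "calories": 0, "carbs": 80},   # hypothetical second food
]
dims = ["calcium", "calories", "carbs"]
ext = extents_of(foods, dims)
lines = [polyline(f, dims, ext, width=200, height=100) for f in foods]
```

Overlaying the plot is then just drawing every entry of `lines` on the same canvas, which is where the messiness comes from.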
And it's definitely not a way to communicate the data to other people, because this is not a representation a lot of people are familiar with. In fact, who has heard of parallel coordinates before? Who has used parallel coordinates to explore data? That's quite a lot of people. Who has implemented or made a variant of parallel coordinates? Ah, cool, a lot of people. So you guys are a unique crowd. Most people have never heard of this. And I'm gonna explain to you some of its more unique properties. Let me just show you a few more examples. One thing with parallel coordinates: if you are trying to use it in a presentation to explain something to someone, a lot of times it helps to have a more familiar representation of the data alongside it. So here I have a bunch of points on a map. This particular one was created by Mary Besica. And this is a parallel coordinates view down here. As I filter in this view, only the values that I've selected in the parallel coordinates show up on the map. So these are linked visualizations. It's a really great way to use parallel coordinates if you're trying to tell a story, and also if you already have a visualization that makes sense for your data and you want to use parallel coordinates just to augment what's already there. I'll show you a few more examples really quick. This is made by Oliver Yeh at App Store Rankings. This is a visualization of apps in the App Store. So what we can do here is ask, hey, what are the highest rated apps? How many reviews do they get? And there's a whole distribution here. There's a green line; I'm not sure it's so easy to see. This is the average of all of the apps. Let's see if I can find some structure here. Okay, so if we filter on the reviews column, we can see that apps that have tons of reviews tend to have slightly higher ratings. It looks like the average is in the 4 to 4.5 range, which sort of makes sense. This interaction is what we call brushing.
I'm basically throwing a range filter down on this dimension. This dimension is log scaled, so I'm staying here at the bottom of my range, 10,000 reviews, up to one million reviews. It's just like a SQL query or something. Let's see, here's another one. This is a dataset from Hipparcos. It's basically a comparison of the magnitude of a star, its brightness, and its color. There turns out to be a relationship, and it has a lot to do with the life cycles of stars. This is an unlabeled visualization down here, but basically stars start bright and blue, and as they get older, they get dimmer and redder. This is called the main sequence. This one uses canvas. The cool thing about canvas is you can pick different compositing modes, which is something that SVG cannot do. So here I actually use canvas's "lighter" compositing to bring out the dense intersections. This dense intersection down here actually has a lot to do with the sea lips. Now that you've seen some examples, let me start with the basics. Okay, so this is parallel coordinates on the left, and this is the Cartesian coordinate system on the right. This is a point; this is the origin. As I move along the x-axis, in parallel coordinates I'm just moving along this axis, x1. If I move along the y-axis, in parallel coordinates I'm moving along this axis, x2. So there's a one-to-one relationship between lines in parallel coordinates and points in orthogonal coordinates. The cool thing about parallel coordinates, as you've seen in the visualizations I just showed you, is that you can add more dimensions. Cartesian coordinates use up the plane very fast. You have one, you have two, and then you're done; the plane is gone. Even three dimensions get used up very fast: you only get one more, and you're done. You have no more space to spatially encode data by simply mapping it to a dimension.
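The brushing interaction from the App Store example, a range filter on a log-scaled axis that behaves like a SQL `WHERE ... BETWEEN`, can be sketched as below. This is a minimal Python sketch under stated assumptions: `log_position`, `log_invert`, and `brushed` are hypothetical helpers (not D3 calls), brush edges are stored as heights from 0.0 (bottom) to 1.0 (top), and several brushes combine with AND. The app records are made up.

```python
import math
from typing import Sequence


def log_position(value: float, lo: float, hi: float) -> float:
    """Where `value` sits on a log-scaled axis from lo (0.0, bottom) to hi (1.0, top)."""
    return (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))


def log_invert(t: float, lo: float, hi: float) -> float:
    """Inverse of log_position: turn a brush edge drawn at height t back into a data value."""
    return 10 ** (math.log10(lo) + t * (math.log10(hi) - math.log10(lo)))


def brushed(rows: Sequence[dict], brushes: dict[str, tuple[float, float]]) -> list[dict]:
    """Keep rows inside every active brush: an AND of closed range filters.

    Roughly: SELECT * FROM rows WHERE reviews BETWEEN 10000 AND 1000000 AND ...
    """
    return [
        row
        for row in rows
        if all(lo <= row[dim] <= hi for dim, (lo, hi) in brushes.items())
    ]


apps = [  # hypothetical records
    {"reviews": 50, "rating": 3.5},
    {"reviews": 20_000, "rating": 4.5},
    {"reviews": 2_000_000, "rating": 4.0},
]
selected = brushed(apps, {"reviews": (10_000, 1_000_000)})
```

The log scale only changes how a drawn brush maps back to data values (`log_invert`); the filter itself still compares raw values.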
When you're looking at data, to actually read the parallel coordinates, we're gonna go back to the nutrient data set. I actually think that the intersections with the axes are pretty confusing, because most people, when they see parallel coordinates, think it's a line chart. In line charts, when there's a very sudden change in the direction of the line, you interpret that as, you know, a stock market peak and then a crash. We look for those things in line charts. But in parallel coordinates, those shapes at the axes basically don't matter. What matters is what's happening between the axes. So I made this version of parallel coordinates that only shows, let's see, I think I can't zoom in on this thing very well, that only shows what's happening between the axes. So let's find a single food again. These are sweet peppers. And sweet peppers have 20 grams of protein and calcium. It's kind of like many mini slope charts, if you're familiar with Tufte's slope chart: it's just a bunch of slope charts all together. Where's the other filter I placed? So for instance, in this relationship between fat and water, we see this sort of crossing here. Actually, it's even more intense between calories and water: this crossing where the lines are all passing through each other. High calories goes to low water, and low calories tends to go to high water. That indicates that there's an inverse correlation between these two dimensions. I'll show you over here too. If I have a bunch of values that are all going down, like this, they're sort of all crossing right in here. This is how you use parallel coordinates: you're looking between the dimensions to see if you get correlations. So you're comparing pairs, but the cool thing is that with the interaction, you're seeing how it's changing the relationship between many pairs of dimensions at once.
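The "lines crossing between two axes means inverse correlation" reading can be quantified. A minimal Python sketch with a hypothetical helper, `crossing_fraction`: assuming both axes are scaled in the same direction, the segments of two records cross between axis A and axis B exactly when the two records swap order. That is the discordant-pair count behind Kendall's tau (with no ties, tau = 1 - 2 * fraction). The calorie and water values are made up.

```python
from itertools import combinations
from typing import Sequence


def crossing_fraction(a: Sequence[float], b: Sequence[float]) -> float:
    """Fraction of record pairs whose segments cross between axis A and axis B.

    Segments for records i and j cross exactly when their order flips:
    (a[i] - a[j]) * (b[i] - b[j]) < 0. Ties count as not crossing.
    Near 1.0: the lines mostly cross (inverse relationship).
    Near 0.0: the lines mostly run side by side (positive relationship).
    Needs at least two records.
    """
    pairs = list(combinations(range(len(a)), 2))
    crossings = sum(1 for i, j in pairs if (a[i] - a[j]) * (b[i] - b[j]) < 0)
    return crossings / len(pairs)


calories = [50, 120, 300, 500]  # hypothetical foods
water = [90, 70, 40, 5]
inverse = crossing_fraction(calories, water)
```

Because only the order of values matters, any increasing per-axis scale gives the same answer, which is why the crossings are visible no matter how each axis is stretched.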
Another thing that's important is not only this filtering, but also the ordering of the axes. For instance, here is just some generated data. I have the input X, which goes from zero to two pi, and I throw it into a bunch of functions and see what the output is. From zero to two pi, you get negative one to one for sine and cosine. If you plot sine and cosine values together, they actually form a circle. And this is what a circle looks like in parallel coordinates: it looks kind of like a hyperbola. If I reorder the axes, the shape goes away. I don't see the shape unless sine and cosine are next to each other. So there is some information loss, and that's crucial to understand: with a bad ordering, you can miss a lot of the information. In general, though, in most data sets not every dimension is strongly related to every other dimension. There tend to be groups of dimensions that have relationships, and that's something to discover as you're exploring a data set. Sometimes you don't even know what the dimensions are in your data set. That's very frequent, especially with scientific and biological data sets. You have some dimension, you have no idea what it is, but you have some data values, and by exploring maybe you can find out. Okay, this is gonna be the hard part. There's a point-line duality: lines in parallel coordinates are points in orthogonal coordinates. Dualities go both ways. So I just changed the context of this plot; it's gonna draw very differently than it did before. It turns out lines in orthogonal coordinates are points in parallel coordinates. So how does that work? Let's take a look down here. Say I have the equation y equals negative x, right? So the value six goes to negative six. The value one goes to negative one. All of these lines intersect at this one point right here. This one point has all the information about the line y equals negative x.
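The point-line duality can be checked numerically. A minimal sketch, assuming axis X1 is drawn at horizontal position 0 and axis X2 at position d, both with the same vertical scale: each point (x1, x2) on the Cartesian line x2 = m*x1 + b becomes a segment from (0, x1) to (d, x2). Evaluating that segment's height at t = d/(1 - m) gives x1 + (x2 - x1)/(1 - m) = b/(1 - m), which does not depend on x1, so every such segment (extended) passes through one point when m is not 1. The names `dual_point` and `segment_y` are made up for this example.

```python
from typing import Optional


def dual_point(m: float, b: float, d: float = 1.0) -> Optional[tuple[float, float]]:
    """Parallel-coordinates point representing the Cartesian line x2 = m*x1 + b.

    Axis X1 is drawn at horizontal position 0 and X2 at d. Returns None for
    m == 1: those segments are all parallel and meet only at infinity.
    """
    if m == 1:
        return None
    return (d / (1 - m), b / (1 - m))


def segment_y(x1: float, x2: float, t: float, d: float = 1.0) -> float:
    """Height at horizontal position t of the (infinite) line through (0, x1) and (d, x2)."""
    return x1 + (x2 - x1) * t / d


# y = -x with the axes 6 units apart: the dual point is halfway between them.
u, v = dual_point(-1, 0, d=6)
```

For y = -x, `dual_point(-1, 0, d=6)` gives (3.0, 0.0), and the segments for the points (6, -6) and (1, -1) both pass through height 0 at t = 3.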
And any line has a point; as the line moves, this point just moves around in space. So I can move this line around over here, and I can also move around this point. Lines here are always infinite lines, even though I was drawing something finite over here. This point can even move outside the axes. If it's outside the axes, it's a positive relationship. So even though in every version of parallel coordinates that is used to look at data the lines always stop at the axes, you should really think of them as going to infinity. This is projective geometry; there's a little exhibit on projective geometry downstairs. I'm not gonna get too into it. Here's another cool thing. Translations in parallel coordinates are rotations in orthogonal coordinates. So if I'm down here and I move the point in parallel coordinates like this, the line rotates: this is a translation-rotation duality. And just like with the point-line duality, this duality goes both ways. Sorry, it just changed the context again, so now I'm back to the original point-line view. When I have a point in orthogonal coordinates and I translate it, this looks like rotation in parallel coordinates. I'm rotating, and this is actually the point of rotation. This is an infinite line, so it really should be going on both sides. Let me show you this with some data, but this data is gonna be a little tricky too. This is a hypercube. I'm just gonna rotate this guy. Let me rotate it in XY space, so it's rotating in the plane of the screen. Right now it really just looks like a cube. These lines going across are the vertices: points in orthogonal coordinates, which are these dudes right here, become lines in parallel coordinates. The edges become points in parallel coordinates, where these lines cross.
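Two claims here can be read off the dual point's horizontal position d/(1 - m): where it lands relative to the axes depends only on the slope, and rotating a line about a fixed pivot only slides the dual point along one line (the pivot's own polyline, from (0, p) to (d, q)), which is the rotation-translation duality. A minimal Python sketch under the same assumed setup as before (X1 at 0, X2 at d, shared vertical scale); `dual_region` and `rotate_about` are hypothetical helpers.

```python
from typing import Optional


def dual_point(m: float, b: float, d: float = 1.0) -> Optional[tuple[float, float]]:
    """Dual of the Cartesian line x2 = m*x1 + b (X1 drawn at 0, X2 at d); None for m == 1."""
    if m == 1:
        return None
    return (d / (1 - m), b / (1 - m))


def dual_region(m: float) -> str:
    """Where the dual point lands, read off its horizontal coordinate d / (1 - m)."""
    if m == 1:
        return "at infinity"       # parallel segments never meet
    if m < 0:
        return "between the axes"  # 0 < d / (1 - m) < d: negative relationship
    if m == 0:
        return "on X2"
    if m < 1:
        return "right of X2"       # d / (1 - m) > d: positive relationship
    return "left of X1"            # d / (1 - m) < 0: positive relationship


def rotate_about(p: float, q: float, slopes: list[float], d: float = 1.0) -> list[tuple[float, float]]:
    """Dual points of lines through the pivot (p, q), one per slope (slope 1 skipped).

    Rotating a line about (p, q) only changes its slope, and every resulting dual
    point lies on the pivot's polyline from (0, p) to (d, q): in parallel
    coordinates, the rotation shows up as the point sliding along that line.
    """
    duals = []
    for m in slopes:
        point = dual_point(m, q - m * p, d)
        if point is not None:
            duals.append(point)
    return duals
```

Substituting b = q - m*p shows why: the dual height (q - m*p)/(1 - m) equals p + (q - p) * u/d at u = d/(1 - m), so each dual point sits on the segment through (0, p) and (d, q).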
Not every intersection of two lines has a dot, because not all the vertices are connected by edges in a cube. If I rotate it in another dimension, now things are starting to come apart. And you see these intersection points; remember, the lines can intersect outside the axes. They're always going to infinity. In fact, they can even intersect at the point at infinity if the lines are parallel. Parallelism is very important in parallel coordinates, but I don't have time to get into it. If anyone is really mathematical, I would love to see parallel coordinates embedded into a hyperbolic space, because hyperbolic space also has very rich parallelism. But it blows my mind; I can't do it. Okay, let me rotate this in the fourth dimension as well. So now we're rotating in three planes of rotation. In four dimensions, there are six planes of rotation. In three dimensions, we have one here and one here and one here, and they're all the pairwise planes. We say "rotating around the z-axis," but really, in higher-dimensional spaces, you should think about it as rotating in the x-y plane: you pick two dimensions and you rotate in that plane. With four dimensions, we have six combinations, and we can rotate in any of them. So I could rotate in yw. This appears as a three-dimensional projection of a four-dimensional shape. This is made by Miloche Cosmitter. The fourth dimension is mapped to color; I believe orange is close to you and blue is far away. Okay, so that's how these points are constructed. But let's see the cool stuff. Those points that represent the edges move in a really interesting way. Let me rotate in yz here. So again, we're talking about the edges of the cube; the edges are rotating around like this. They're traveling in a circle. Circles in orthogonal coordinates become these hyperbolas down here. And for the mathematicians among you, conics always transform to conics in this projection.
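The "rotate in a plane, not around an axis" idea is easy to state in code: a rotation in n dimensions picks two coordinates and mixes only those, and there are n(n - 1)/2 such planes (3 in 3D, 6 in 4D). A minimal Python sketch; `rotation_planes` and `rotate` are hypothetical helpers, and the tesseract vertices at (±1, ±1, ±1, ±1) are one common convention, not necessarily what the demo uses.

```python
import math
from itertools import combinations, product


def rotation_planes(n: int) -> list[tuple[int, int]]:
    """Every coordinate plane (i, j) of an n-dimensional space: n * (n - 1) / 2 of them."""
    return list(combinations(range(n), 2))


def rotate(point: list[float], i: int, j: int, angle: float) -> list[float]:
    """Rotate `point` by `angle` radians in the plane of axes i and j.

    Only coordinates i and j change. In 3D, "rotating around the z-axis" is exactly
    rotate(p, 0, 1, angle): a rotation in the x-y plane that leaves z alone.
    """
    c, s = math.cos(angle), math.sin(angle)
    out = list(point)
    out[i] = c * point[i] - s * point[j]
    out[j] = s * point[i] + c * point[j]
    return out


# The 16 vertices of a tesseract (4D hypercube), before any rotation.
tesseract = [list(v) for v in product((-1.0, 1.0), repeat=4)]
# Spin every vertex a little in the y-w plane (axes 1 and 3).
spun = [rotate(v, 1, 3, 0.3) for v in tesseract]
```

With axes named x, y, z, w, the six planes come out as xy, xz, xw, yz, yw, zw, matching the pairs mentioned in the demo.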
And if I make this go a little crazier: these are colored by the axis pair they belong to. The reds are all in xy, greens are yz, blue is zw. Don't worry about the words coming out of my mouth; you should listen with your eyes. These are meaningless words. So actually, these are lines in four dimensions, and a four-dimensional line is represented by a multi-point: one point for each axis pair for that line. You could think of it as the projection of that line into that, forget it. Let me show you another one. This is another four-dimensional shape. I did all of the four-dimensional regular polytopes. This is a 24-cell. I haven't rotated in the third dimension yet; it's gonna be nuts when it happens. It's just rotating in the yz plane right now. So you see that some of these points stay all in a vertical line. What is the significance of that? Let's go back here. When you have a bunch of points in a vertical line in parallel coordinates, these are parallel lines in orthogonal coordinates. So, you know, you could do some multidimensional architecture, some alien Bauhaus, with parallel coordinates, absolutely. Let me make this look a little crazier before we leave it. Let me rotate in a few more dimensions. Parallel coordinates gets a lot of crap because it kind of looks like a hairball when you're using just the point-line duality. People just use it as a multidimensional scatter plot: you take points in orthogonal coordinates, turn them into lines in parallel coordinates, and then cram as much data as you can in there. Which is great, because you can use interactivity to explore it. But there are ways of reducing the information if you're using regressions. Even if they're multidimensional regressions, you can plot them and they show up as lines, and you can dramatically condense the information even more.
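The multi-point representation of a line in four (or N) dimensions can be sketched by projecting the line onto each pair of axes and taking the dual point of each 2D projection. This sketch assumes the adjacent-pair version (N - 1 points for a line in N dimensions), the form used in Alfred Inselberg's work on parallel coordinates; the talk's demo may pair axes differently. Axis k is assumed to sit at horizontal position k*d with a shared vertical scale, and `indexed_points` is a hypothetical helper.

```python
from typing import Optional, Sequence


def indexed_points(p: Sequence[float], v: Sequence[float],
                   d: float = 1.0) -> list[Optional[tuple[float, float]]]:
    """Represent the N-D line p + t*v by N - 1 points, one per adjacent axis pair.

    For the pair (k, k+1) the line projects to x[k+1] = m * x[k] + b, whose dual
    point sits at (k*d + d / (1 - m), b / (1 - m)). None marks a pair with no
    finite dual point (parallel segments, or a projection that collapses to a point).
    """
    points: list[Optional[tuple[float, float]]] = []
    for k in range(len(p) - 1):
        if v[k] == 0:
            # x[k] is constant, so every segment starts at the same spot on axis k.
            points.append((k * d, p[k]) if v[k + 1] != 0 else None)
            continue
        m = v[k + 1] / v[k]
        if m == 1:
            points.append(None)  # dual is the point at infinity
            continue
        b = p[k + 1] - m * p[k]
        points.append((k * d + d / (1 - m), b / (1 - m)))
    return points


# A hypothetical line in 4D: through p, in direction v.
line_p = [0.0, 1.0, 2.0, -1.0]
line_v = [1.0, -1.0, 0.5, 2.0]
multi_point = indexed_points(line_p, line_v)  # three points for four axes
```

Every point on the 4D line, drawn as a polyline, passes through all three of these points (with segments extended past the axes where needed), so the three points stand in for the whole line.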
You can even show surfaces in parallel coordinates, but I don't understand all of it yet. That's gonna take me a few years, but it'll be in the library eventually. Okay, let me just leave you with the one that I've worked on the most. Let's see, I have two minutes left. This is the USDA Nutrition Data Set. The USDA Nutrition Data Set has a lot of dimensions: 146, I believe. And I spent almost a year with D3 trying to visualize this data set. This is what it looks like. Obviously, there are some problems with this visualization, which is that none of the, let me show the ticks. So this is not publication quality, obviously. What I did to make it a little more readable was use the fisheye distortion that Mike posted, so at least I could read some of the stuff happening on the left. And also, so we have food, let's get the original data visualization, avocados, grab a California avocado. What I made is, when you click on a food down there, it reorders all the dimensions so that the value for avocado, which is this purple line (let me hide these ticks, they're ugly), is always descending. It just takes the scaled Y values and reorders the dimensions by them. Avocados are mostly water, so water shows up on the left. Then what I can do is get rid of the search filter and see how the rest of the data set has changed. That's not so interesting, but what is interesting is when you do meats. So this is what I wanna leave you with. Do you guys see that? Maybe you don't. Okay, this is very rare in parallel coordinates, that you get something like this. It looks like someone just took a paintbrush and drew red across, then yellow, then green, all in these layers. It's colored by food group here. And when I first saw this, I was like, I don't know, it just looks crazy, but we'll explore it a little bit.
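The click-to-reorder trick, putting dimensions in the order in which the selected food's scaled value descends, amounts to a sort. A minimal Python sketch, assuming the per-axis extents are already known and that the "scaled value" is the linear 0-to-1 position on each axis; `reorder_for` is a hypothetical helper, and the avocado numbers are illustrative, not USDA values.

```python
def reorder_for(selected: dict[str, float],
                extents: dict[str, tuple[float, float]]) -> list[str]:
    """Dimension order in which `selected` descends from left to right.

    Each value is first scaled to its axis (0.0 = axis minimum, 1.0 = axis maximum),
    then dimensions are sorted by that scaled value, largest first. Ties keep their
    original order because Python's sort is stable, even with reverse=True.
    """
    def scaled(dim: str) -> float:
        lo, hi = extents[dim]
        return (selected[dim] - lo) / (hi - lo) if hi != lo else 0.0

    return sorted(extents, key=scaled, reverse=True)


extents = {"protein": (0, 90), "fat": (0, 100), "water": (0, 100)}  # illustrative
avocado = {"protein": 2, "fat": 15, "water": 73}                   # illustrative
order = reorder_for(avocado, extents)  # water first, as in the demo
```

Because the sort key is the scaled position rather than the raw value, a dimension measured in milligrams and one measured in grams compete on equal terms.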
So here's a bunch of beef products. And if I go down here, these are a bunch of fast foods. That kinda makes sense, because fast foods are composed of meats, and then they've got flour and stuff. These dimensions turn out to be amino acids. So fast foods are just like watered-down meat, basically, in terms of amino acids. But then, if I go down a little more, I'm getting beans and bread and all this other stuff, pie, potatoes. Beans are kind of a crappy color; they're tough to see. Beans don't have any meat at all in them, but they do have these amino acids. And this is something that I didn't know: vegetables have similar amino acid profiles to the animals that eat them. I haven't really researched this. I assume that this has something to do with what is considered an edible food and what is not. I assume what happens is that when animals eat these foods, they accumulate these things. The point is that I took a single food and I changed how space and time were imposed on the rest of the data set based on that food. And that's something I would love to see a lot more of in visualization. And all I did was change the axis spacing here. You could do more. You could make this line perfectly straight. Or you could even have some of the axes shorter than others, right? It doesn't have to be all equally sized axes. They could be like this. You could tilt the axes. There are all these things you could do, all these ways we could explore. And I think this is kind of like a visual spreadsheet: a better, more powerful tool that we could use to explore data. That's all I got, except this last one: force-directed parallel coordinates. This is just nonsense right here. That's all.