So this is a little video on making ECDF plots: empirical cumulative distribution function plots. First, I'm importing some libraries: ggplot2 and some dark themes, EnvStats to pull in a triangular distribution, dplyr for data manipulation, tidyr for its pivot_longer function, and glue, which gives me a kind of f-string that I'll use as a label in some of these plots. I set the seed to 42 and n, the number of samples, to 100, and then I save an object called p, which points to the output file, ecdf.jpeg.

Step one is to make a tibble, which is just a data frame. I call the first column color one and fill it with a random triangular distribution, giving each color its own min, max, and mode. I repeat this for four colors and then pivot longer, which turns those columns into rows, with the values going into a column called sales. The colors are different colors of shoelaces, and the values are sales.

Then I group by shoelace and summarize, taking the median sales and max sales, to get a summary data frame. I pull from this right away to grab some values I'll use in the plot a little later. I want the top median sales, pulled out and rounded. I also want the bottom median sales, using top_n but with -1. Then I want the top max sales and the bottom max sales. If this doesn't make sense right now, that's all right: by the time the plot is complete, all of these values will have been used, and it'll be clear how they feed into the plot.

As a first pass, I pass this data frame to ggplot and plot the ECDF, and here's the standard ECDF plot. There are a number of things I want to change to turn this good plot into a great plot, so here are a few tweaks I like to make. One that's not strictly necessary, but that I normally iterate on at least a little, is the line size.
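A minimal sketch of the data-prep step, assuming the triangular sampler is `EnvStats::rtri(n, min, max, mode)` and using made-up min/max/mode values, since the video doesn't state them. It follows the steps described: a tibble with one column per color, `pivot_longer()` into `shoelace`/`sales`, a grouped median/max summary, and `top_n()` with 1 or -1 to pull out the highest and lowest values.

```r
library(dplyr)
library(tidyr)
library(EnvStats)  # assumed source of rtri(n, min, max, mode)

set.seed(42)
n <- 100  # samples per shoelace color

# Hypothetical triangular parameters; the video does not give its values
shoelace_sales <- tibble(
  white  = rtri(n, min = 0, max = 14, mode = 3),
  salmon = rtri(n, min = 2, max = 18, mode = 7),
  red    = rtri(n, min = 4, max = 26, mode = 8),
  green  = rtri(n, min = 5, max = 20, mode = 15)
) %>%
  # Turn the four color columns into rows: one `shoelace` label, one `sales` value
  pivot_longer(everything(), names_to = "shoelace", values_to = "sales")

# One row per color with its median and maximum sales
sales_summary <- shoelace_sales %>%
  group_by(shoelace) %>%
  summarise(median_sales = median(sales), max_sales = max(sales))

# top_n(1, ...) keeps the highest row; top_n(-1, ...) keeps the lowest
top_median    <- sales_summary %>% top_n(1, median_sales)  %>% pull(median_sales) %>% round(1)
bottom_median <- sales_summary %>% top_n(-1, median_sales) %>% pull(median_sales) %>% round(1)
top_max       <- sales_summary %>% top_n(1, max_sales)     %>% pull(max_sales)    %>% round(1)
bottom_max    <- sales_summary %>% top_n(-1, max_sales)    %>% pull(max_sales)    %>% round(1)
```

`top_n()` is superseded in current dplyr; `slice_max()` and `slice_min()` do the same job if you prefer the newer verbs.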
Then I add a dark theme to this plot to give it a nice dark color. It's always good to have labels, so I title this plot Shoelace Sales. Next is scale_y_continuous: I want very few ticks on the y axis. I think this is valuable for ECDF plots because normally you're only looking at a couple of things, the max values and the medians. Sometimes that's not the case and there's another quantile you want to look at; in that case I'd suggest putting the breaks at the levels of interest. Here I'm looking at the max and the median, so those are the only grid lines I want in my plot. I don't want other lines distracting.

The opposite is true on the x axis, where I think it's important to show a lot more granularity. We're showing full distributions, and when your eye travels down from one of these ECDF curves, you want a nearby reference value for sales. So the percentiles are low resolution on the y axis, but I think it's best practice to have high resolution on the x axis. I'm also giving the plot some breathing room by setting the x limits from 0 to 28. We don't strictly need to go out to 28, but once the labels are added I find the extra space ends up being needed. It's just a preference.

Now I add an annotation as a segment. This draws a line from the median of one ECDF curve down to the x axis, and I play a little with the color and the alpha, which is the opacity level. I want the line segment to match the theme, so I pull in one of the theme's colors. The segment ends up a similar color to the text, so it plays nicely with the rest of the theme. The next annotation after the segment is the text, and there are a few things going on there.
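A sketch of the base plot with the tweaks described so far: thicker ECDF lines, a dark theme, a title, sparse y breaks at the median (50%) and max (100%), dense x breaks with limits of 0 to 28, and a translucent segment from the lowest median down to the x axis. The theme here is a hand-built stand-in (not the packaged dark theme used in the video), the hex colors are hypothetical, and `EnvStats::rtri()` with made-up parameters is assumed for the simulated data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(EnvStats)  # assumed source of rtri(n, min, max, mode)

# Simulated data with hypothetical triangular parameters
set.seed(42)
n <- 100
shoelace_sales <- tibble(
  white  = rtri(n, min = 0, max = 14, mode = 3),
  salmon = rtri(n, min = 2, max = 18, mode = 7),
  red    = rtri(n, min = 4, max = 26, mode = 8),
  green  = rtri(n, min = 5, max = 20, mode = 15)
) %>%
  pivot_longer(everything(), names_to = "shoelace", values_to = "sales")

# Lowest per-color median, used as the x position of the guide segment
bottom_median <- shoelace_sales %>%
  group_by(shoelace) %>%
  summarise(median_sales = median(sales)) %>%
  top_n(-1, median_sales) %>%
  pull(median_sales)

# Stand-in dark theme built from theme_minimal(); colors are hypothetical
text_col <- "#D8DEE9"  # light grey reused for text and annotations
dark_theme <- theme_minimal(base_size = 14) +
  theme(
    plot.background  = element_rect(fill = "#2E3440", colour = NA),
    panel.grid.minor = element_blank(),
    text             = element_text(colour = text_col),
    axis.text        = element_text(colour = text_col)
  )

plt <- ggplot(shoelace_sales, aes(x = sales, colour = shoelace)) +
  # Thicker lines (`linewidth` needs ggplot2 >= 3.4.0; older versions use `size`)
  stat_ecdf(geom = "step", linewidth = 1.2) +
  labs(title = "Shoelace Sales", x = "Sales", y = NULL, colour = "Shoelace") +
  # Few y breaks: only the median (50%) and the max (100%) matter here
  scale_y_continuous(breaks = c(0.5, 1), labels = scales::percent) +
  # Many x breaks so the eye can read a sales value off each curve
  scale_x_continuous(breaks = seq(0, 28, by = 2), limits = c(0, 28)) +
  # Translucent segment from the lowest median (50% on the y axis) down to the x axis
  annotate("segment", x = bottom_median, xend = bottom_median, y = 0.5, yend = 0,
           colour = text_col, alpha = 0.6) +
  dark_theme
```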
In the label there's a glue function, which glues a variable into a string; it works like f-strings, if you're familiar with those. Then there's a function from stringr called str_wrap, where I set the width of the box I want the text framed in. Setting hjust to 0 means the text is left-aligned, so every line starts at the same left edge. Finally, again to match the theme, I pull in that same color.

So now there's a line segment running from the ECDF median down to the x axis, plus some text stating what that line means. I'd like to be a little more explicit and draw the viewer's eye to the actual point the line drops from. To do this, I add a geom_curve. There are a lot of parameters to play with, and it takes some iteration, moving things back and forth, to get them right. At this point I think the plot looks fairly good for one case, where by one case I mean the group of three things: the line segment, the text, and the curve.

One addition to spice things up even more is a new color theme for the ECDF lines, so it's not just the panel that changes theme but the line colors too. For the purpose of this video, I'm assuming we're selling shoelaces: white, salmon, red, and green.

A note about sizes. I was reading through The Visual Display of Quantitative Information by Edward Tufte. He has a section on width versus height, and one thing he says is that you should make the width greater than the height. Especially with ECDF plots, since we have a fine-grained x axis, I think it's important to really stretch out the width. Here the width is twice the height, and I'd even say you may be able to get away with more than that.
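A sketch of one full "case" (segment, text, and curve) plus custom line colors and a save at twice as wide as tall. The label wording, text and arrow positions, `curvature`, and hex colors are hypothetical placeholders to iterate on, `theme_dark()` is ggplot2's built-in stand-in for the video's dark theme, and `EnvStats::rtri()` is assumed for the simulated data. The file goes to a temporary directory here rather than the working directory.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(glue)
library(stringr)
library(grid)      # arrow() and unit() for the curved pointer
library(EnvStats)  # assumed source of rtri(n, min, max, mode)

# Simulated data with hypothetical triangular parameters
set.seed(42)
n <- 100
shoelace_sales <- tibble(
  white  = rtri(n, min = 0, max = 14, mode = 3),
  salmon = rtri(n, min = 2, max = 18, mode = 7),
  red    = rtri(n, min = 4, max = 26, mode = 8),
  green  = rtri(n, min = 5, max = 20, mode = 15)
) %>%
  pivot_longer(everything(), names_to = "shoelace", values_to = "sales")

# Row for the color with the lowest median sales
lowest <- shoelace_sales %>%
  group_by(shoelace) %>%
  summarise(median_sales = median(sales)) %>%
  top_n(-1, median_sales)
bottom_median <- round(lowest$median_sales, 1)

text_col <- "#D8DEE9"  # hypothetical annotation color

plt <- ggplot(shoelace_sales, aes(x = sales, colour = shoelace)) +
  stat_ecdf(geom = "step", linewidth = 1.2) +
  scale_x_continuous(breaks = seq(0, 28, by = 2), limits = c(0, 28)) +
  scale_y_continuous(breaks = c(0.5, 1), labels = scales::percent) +
  # Hypothetical hex colors standing in for the video's line palette
  scale_colour_manual(values = c(green = "#A3BE8C", red = "#BF616A",
                                 salmon = "#D08770", white = "#ECEFF4")) +
  # 1) Segment from the lowest median down to the x axis
  annotate("segment", x = bottom_median, xend = bottom_median, y = 0.5, yend = 0,
           colour = text_col, alpha = 0.6) +
  # 2) Text: glue() fills {} placeholders like an f-string, str_wrap() wraps at
  #    about 20 characters, and hjust = 0 left-aligns every line at x
  annotate("text", x = 1, y = 0.85, hjust = 0, colour = text_col,
           label = str_wrap(glue("Median sales of {lowest$shoelace} shoelaces: {bottom_median}"),
                            width = 20)) +
  # 3) Curved arrow from the text toward the median point on the curve
  annotate("curve", x = 3, y = 0.75, xend = bottom_median, yend = 0.5,
           curvature = 0.3, colour = text_col,
           arrow = arrow(length = unit(0.02, "npc"))) +
  labs(title = "Shoelace Sales", x = "Sales", y = NULL, colour = "Shoelace") +
  theme_dark(base_size = 14)

# Width twice the height
out_path <- file.path(tempdir(), "ecdf.jpeg")
ggsave(out_path, plt, width = 10, height = 5, dpi = 150)
```

Widening further, toward the two-to-three-times range, is just a change to `width` in `ggsave()`.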
Maybe somewhere between two and three times the height for the width. Again, I think stretching it out helps for ECDFs, since you get more resolution on the x axis. There's also the idea of the golden ratio, where the width is the height times the golden ratio. Tufte says that's a nice idea, but there are lots of other proportions that work well for reasons other than the standard golden ratio, and that as long as your plots are wider than they are tall, they will be appealing. I'm not sure whether that's true; I don't know much about the psychology or the theory. I just read the book, apply it, and adjust things to my own taste, and my personal preference is to make ECDF plots very wide.

In the finished plot, the median sales of white shoelaces is low: the annotation states that median value and draws the line down to it. That's in contrast with the green shoelaces, where sales are high. I always like this idea of a tale of two cities, or a pre versus post comparison, and I think this is a good way to show the difference between the white and green shoelaces. Something else you get visually, even though it's not stated explicitly, is the difference in medians: it's the gap between the two line segments coming down to the x axis. If you drew a horizontal line between those two vertical segments, its length would be your median difference.

In the code there's really just a lot of repetition: the same chunk of annotations (the line segment, the text, and the curve) for the min and max of the medians and the min and max of the maxes. One thing I think is interesting is that the white shoelaces have both the lowest median and the lowest max, so altogether that color is in low demand. The shoelace that wins for sales is the red shoelace. Thanks for watching.
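One way to cut the repetition of the segment, text, and curve trio (a suggestion, not the code from the video) is a small helper that returns the three `annotate()` layers as a list, which ggplot2 lets you add with `+`. `annotate_point()` is a hypothetical name, the text positions are placeholders, and `EnvStats::rtri()` with made-up parameters is assumed for the data. The sketch also computes the median gap, i.e. the horizontal distance between the two vertical median guides.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(glue)
library(stringr)
library(grid)      # arrow() and unit()
library(EnvStats)  # assumed source of rtri(n, min, max, mode)

# Simulated data with hypothetical triangular parameters
set.seed(42)
n <- 100
shoelace_sales <- tibble(
  white  = rtri(n, min = 0, max = 14, mode = 3),
  salmon = rtri(n, min = 2, max = 18, mode = 7),
  red    = rtri(n, min = 4, max = 26, mode = 8),
  green  = rtri(n, min = 5, max = 20, mode = 15)
) %>%
  pivot_longer(everything(), names_to = "shoelace", values_to = "sales")

sales_summary <- shoelace_sales %>%
  group_by(shoelace) %>%
  summarise(median_sales = median(sales), max_sales = max(sales))

top_median    <- max(sales_summary$median_sales)
bottom_median <- min(sales_summary$median_sales)
# Length of a horizontal line between the two vertical median segments
median_gap <- round(top_median - bottom_median, 1)

text_col <- "#D8DEE9"  # hypothetical annotation color

# Hypothetical helper: segment + text + curve for one point (x, y) on an ECDF
annotate_point <- function(x, y, label, text_x, text_y, colour = text_col) {
  list(
    annotate("segment", x = x, xend = x, y = y, yend = 0,
             colour = colour, alpha = 0.6),
    annotate("text", x = text_x, y = text_y, hjust = 0, colour = colour,
             label = str_wrap(label, width = 20)),
    annotate("curve", x = text_x, y = text_y - 0.05, xend = x, yend = y,
             curvature = 0.3, colour = colour,
             arrow = arrow(length = unit(0.02, "npc")))
  )
}

plt <- ggplot(shoelace_sales, aes(x = sales, colour = shoelace)) +
  stat_ecdf(geom = "step", linewidth = 1.2) +
  scale_x_continuous(breaks = seq(0, 28, by = 2), limits = c(0, 28)) +
  scale_y_continuous(breaks = c(0.5, 1), labels = scales::percent) +
  # Each call adds three layers; max_sales cases would follow the same pattern at y = 1
  annotate_point(bottom_median, 0.5, glue("Lowest median: {round(bottom_median, 1)}"),
                 text_x = 1, text_y = 0.85) +
  annotate_point(top_median, 0.5, glue("Highest median: {round(top_median, 1)}"),
                 text_x = 18, text_y = 0.3) +
  labs(title = glue("Shoelace Sales (median gap: {median_gap})"),
       x = "Sales", y = NULL, colour = "Shoelace") +
  theme_dark(base_size = 14)
```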