Hi, I'm Alan Inglis from the Hamilton Institute at Maynooth University, Ireland, and today I'd like to briefly talk about visualising variable importance and variable interaction effects in machine learning models. Visualisations can be a powerful tool for communicating relationships in the data or exploring the data structure, and we felt that informative graphics, particularly in the areas of importance and interactions, have been somewhat overlooked or underused. So the challenge was to visualise these metrics in a sensible and informative way. To that end, we created the R package vivid, which stands for Variable Importance and Variable Interaction Displays, and which contains a suite of plots for jointly displaying interactions and importance, as well as partial dependence.

The data we're going to look at today is cervical cancer risk factor data, comprising historical medical records as well as personal information. We fit a random forest model with the biopsy result, either healthy or cancer, as the response.

To begin, let's take a quick look at traditional importance and interaction plots. On the left we have a traditional importance plot, and on the right we have every two-way interaction plotted. As you can see, when the number of variables gets large, plotting every two-way interaction becomes complex and difficult to read. So our job was to combine these traditional plots into something a little more sensible.

Our first idea was a heatmap displaying importance and interactions jointly. In this plot the variable importance is on the diagonal and runs from white to red, and the interactions are in the upper and lower triangles and run from white to blue. This lets us clearly see which variables are the most influential in our model. Now, we're not restricted to just one importance measure: you can use an embedded approach if your model allows it, or you can use a model-agnostic approach. To measure the interactions we use Friedman's H-statistic. To highlight the most influential variables in our model, we use hierarchical clustering with a leaf-sorting (seriation) algorithm to push the influential variables to the top left of the plot. This allows a user to quickly identify which variables have the most impact on the response.

Our next idea for jointly displaying importance and interactions was a network graph. Here we give a visual representation of the magnitude of importance and interaction not only by colour but also by size: the node size and colour represent the variable importance, and the edge size and colour represent the interaction strength. If the number of variables is very large, we allow low interaction values to be thresholded out, which highlights the influential variables in our model. We also allow for clustering via the igraph package.

Now I'd like to briefly talk about how partial dependence plots, or PDPs, are constructed. To begin, we define a grid along a feature, then we make model predictions at each grid point. Each line per instance is called an individual conditional expectation, or ICE, curve, and the average of these ICE curves is the partial dependence. Here we can see an example of the ICE curves for one of the variables, with the average partial dependence curve in black.

Our idea was to display these PDPs in a generalised pairs plot style. In this plot, the upper triangle shows the two-way partial dependence, the diagonal shows the individual ICE curves and the one-way partial dependence, and the lower triangle shows a scatterplot of the data, all coloured by the predicted values. Now, PDPs can sometimes have an issue where they extrapolate into areas where the model wasn't trained, so to ameliorate this we mask out areas where there are no data. PDPs can be useful for investigating how variables, either jointly or singly, impact the response, and for classification we can display the predictions on either the probability or logit scale.

Our final visualisations today are what we call zen partial dependence plots, which show selected panels of all pairs of PDPs in a space-saving layout. Zen plots were designed for showing pairwise high-dimensional data in a zigzag layout, and what we do is use a greedy Eulerian path algorithm to display the variables with the highest interactions at the top, generally decreasing as you go down. This again allows us to highlight the influential variables in our model.
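Before I summarise, let me sketch roughly what this workflow looks like in code. These snippets are a minimal sketch based on the vivid and randomForest APIs as documented on CRAN; the `cervical` data frame and its `Biopsy` column are stand-in names for the data set I described, so treat the details as illustrative rather than exact. First, fitting the model and producing the heatmap and network displays:

```r
library(randomForest)
library(vivid)

# Hypothetical data frame of cervical cancer risk factors with a
# two-level factor response, Biopsy (healthy / cancer).
rf <- randomForest(Biopsy ~ ., data = cervical)

# Build a square matrix with variable importance on the diagonal and
# pairwise interaction strengths (Friedman's H) off the diagonal; the
# matrix is seriated so influential variables sit at the top left.
viviMat <- vivi(fit = rf, data = cervical, response = "Biopsy")

# Heatmap: importance on the diagonal (white to red), interactions
# in the off-diagonal triangles (white to blue).
viviHeatmap(mat = viviMat)

# Network: node size/colour = importance, edge width/colour =
# interaction strength; intThreshold drops weak interaction edges
# (the 0.1 cutoff here is an illustrative value).
viviNetwork(mat = viviMat, intThreshold = 0.1)
```

If I recall the interface correctly, viviNetwork also accepts a cluster argument so that nodes can be grouped using igraph's community detection functions.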
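The generalised pairs plot of partial dependence corresponds roughly to a single call. Again, this is a sketch: nmax and gridSize are tuning parameters I'm choosing here for speed, not values from the talk.

```r
# Generalised pairs plot: two-way PDPs in the upper triangle, ICE
# curves plus the one-way PDP on the diagonal, and the data points
# coloured by the predicted value in the lower triangle.
pdpPairs(data = cervical, fit = rf, response = "Biopsy",
         nmax = 500,    # subsample the rows used for ICE curves
         gridSize = 20) # resolution of the prediction grid
```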
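And the zen partial dependence plots are driven by a path computed from the interaction matrix; in this sketch the cutoff is again an illustrative value:

```r
# Compute a zigzag path through variable pairs whose interaction
# strength exceeds the cutoff, using a greedy Eulerian path so the
# strongest interactions appear first.
zpath <- zPath(viv = viviMat, cutoff = 0.1)

# Lay the pairwise PDP panels out along that path in a compact,
# space-saving zigzag display.
pdpZen(data = cervical, fit = rf, response = "Biopsy", zpath = zpath)
```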
So, in summary, today we presented a novel approach to visualising importance, interactions, and partial dependence. We feel that, compared to traditional visualisations, our approach allows us to evaluate these metrics in a more efficient way. Our visualisations work for regression or classification models and are designed to work with many different model fits. One drawback, however, is that the H-statistic is computationally slow, so future work may involve trying to optimise the H-statistic algorithm. I'd like to thank you for your time today, and thank you for listening. Goodbye.