So, welcome everyone to our session on machine learning and data management. Today we have a great line-up of speakers, with topics ranging from automatic machine learning to model-agnostic measures and more. But before I introduce our first speaker, I'd like to highlight our sponsors: the sponsor for the day is RStudio, and the sponsor for this session is Membridge. If you have a chance to thank them, please do so. Okay, so with that, I'll get started. Our first speaker is Alex. He's a data scientist in Germany; he studied mathematics and has a master's degree in data science. Today he will talk about automatic machine learning in R using a package called mlr3automl. With that, please welcome Alex.

Yeah, hello everyone. Welcome to my presentation on mlr3automl. This is a package for automated machine learning in R. The work was done during my master's thesis, which was supervised by Bernd Bischl and Martin Binder. You can find the code along with all the documentation and examples at this link; I also shared it in Slack, so please feel free to check it out.

The name is quite long, I know: mlr3automl. So let me break it down. The first part is mlr3, and maybe some of you are already familiar with it. It is a powerful, object-oriented and extensible framework for machine learning, hence the name, machine learning in R. It provides a really rich package ecosystem with a lot of tools for machine learning that you can use to build all kinds of applications, for example an AutoML framework. At the bottom of slide one we have a picture of all the different components included in mlr3. I can't go into detail on all of them, but to give a brief overview: you have tools for tuning different machine learning algorithms, you have all kinds of utilities, you can build machine learning pipelines, and so on. So it's really nice, and I encourage everyone to check out mlr3.

The second part of the name is AutoML, so a few words on what AutoML is and what it is good for. In general, it deals with automating machine learning workflows: for example preprocessing, model selection and hyperparameter tuning, anything you need to go from a raw dataset to a tuned machine learning model. It can serve different purposes, and I see two very nice use cases. On the one hand, if you're an experienced machine learning practitioner, it can give you a very good baseline model with almost no effort: all you do is plug in your dataset and you get a reasonable baseline, more or less a lower bound for the performance of machine learning on your dataset. On the other hand, if you're just starting your machine learning journey, it can help you build your first model, because the AutoML tool will avoid a lot of the common pitfalls.

There are many different approaches to solving the AutoML task, and we decided to use the combined algorithm selection and hyperparameter optimization (CASH) paradigm, which is illustrated in the image on slide two. On the left-hand side, we see a machine learning pipeline with different preprocessing operators, for example factor encoding and PCA, and different machine learning algorithms such as glmnet, a support vector machine and gradient boosting. Finding the optimal model now has basically two parts: first, which operators should be in this pipeline, and second, what should their hyperparameters be?
We see this encoded in the search space on the right-hand side. For all the machine learning operators we have categorical parameters: for example, encoding is a categorical parameter, either one-hot or impact encoding, and for the learning algorithm we have three different options, glmnet, support vector machine and boosting. This is just an example. We also see the hyperparameters associated with boosting, for example the number of boosting iterations. So in general we have a large search space, which is also nested, and the AutoML problem boils down to optimizing, or tuning, over this large search space. I will spend a bit more time later explaining how we do this.

Okay, so mlr3automl, now bringing it all together, is an AutoML package for regression and classification, and it uses different components from mlr3. First, we have automatic preprocessing using mlr3pipelines. We also include some very well-tested learning algorithms; I will explain in more detail later what those are. The pipeline and model hyperparameters are optimized jointly, just as I explained on the previous slide, using the hyperband algorithm. We additionally include a portfolio of known good pipelines: hyperparameter configurations for which we have previously seen good performance in our benchmarks.

To go into a bit more detail on how the tuning works in mlr3automl, there are two steps. First, we evaluate the hyperparameter configurations from the fixed portfolio. These are eight simple models that are evaluated sequentially, just to give a very rough baseline even on the most challenging datasets where you cannot train a very complex model. After tuning this portfolio, if there is still budget or time left for further evaluations, we continue the tuning with the hyperband algorithm. Hyperband is a multi-fidelity approach that speeds up random search, and it is built around a budget parameter that controls how long the evaluation of a single hyperparameter configuration takes. In our case, the budget is the subsampling rate. So in the beginning we train lots of different models on small subsets of the data, which is reasonably fast, and as the tuning progresses we move towards larger subsets of the training data while training fewer configurations.

This tuning procedure is depicted in the illustration at the bottom of slide four. On the left-hand side we see a graph with the cost of a hyperparameter configuration on the y-axis and the training budget on the x-axis. In this example, eight different hyperparameter configurations are trained. At the start of the tuning we begin with a low budget, say a subsampling rate of 12.5% of the full dataset, evaluate all eight hyperparameter configurations and obtain an associated cost for each. The general procedure is then to discard the worst-performing ones: the four hyperparameter configurations with the highest cost are discarded, and the remaining half are evaluated again on a larger subset of the training data, which we see here. This procedure is iterated: we discard the worst half of the hyperparameter configurations and continue training the better half on a larger budget, until only one hyperparameter configuration is left. The whole procedure is then repeated over different starting configurations, and this is shown in the table on the right-hand side.
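As an aside, this kind of nested search space and budget-based tuner can be written down directly with tools from the mlr3 ecosystem. The snippet below is only a rough sketch of the idea, not mlr3automl's internals; the parameter names (encoding, learner, nrounds, frac) are illustrative, and in a real pipeline the budget parameter would typically be the fraction used by a subsampling PipeOp.

library(paradox)        # search space construction
library(mlr3tuning)
library(mlr3hyperband)  # provides tnr("hyperband")

# Nested CASH-style search space: pipeline choices are categorical parameters,
# and learner-specific hyperparameters depend on which learner is selected.
search_space = ps(
  encoding = p_fct(c("one_hot", "impact")),
  learner  = p_fct(c("glmnet", "svm", "xgboost")),
  nrounds  = p_int(10, 500, depends = learner == "xgboost"),  # boosting iterations
  frac     = p_dbl(0.125, 1, tags = "budget")                 # hyperband budget: subsampling rate
)

# Hyperband with eta = 2: at every stage, discard the worse half of the
# configurations and double the budget for the remaining half.
tuner = tnr("hyperband", eta = 2)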
In that table, we have different brackets, each with a different number of starting configurations and a slightly different budget setup. In general, hyperband is not limited to this AutoML project: you can use mlr3hyperband as a plug-in tuner for basically any tuning scenario you have in mlr3.

That's all the time I have today for the background, so let's look at some examples. The first example shows how to use mlr3automl in general. We have the AutoML interface function, which has one required argument, the training task. If you're familiar with mlr3, this is basically a wrapper around a dataset and can be a regression or classification task. Then all you have to do is train your model, and once that is finished you can predict on unseen data. The AutoML interface function comes with different customization options, which I will explain in more detail: you can set a runtime, set a custom performance measure to optimize for, select different learning algorithms, influence the type of preprocessing that happens, and add additional parameters.

Let's look at another example where we add custom learners and a runtime budget. In this example I use the mtcars dataset, which is for regression, and some regression learners: a random forest from the ranger package and some linear models. We can set the learner timeout to 10 seconds, which means that for every single hyperparameter configuration, training is stopped after 10 seconds. The overall runtime is capped at 300 seconds, so after 300 seconds the tuning is stopped. Just as before, we can then train, predict and so on.

The default mlr3automl learners are the ranger implementation of a random forest, gradient boosting from the xgboost package, and logistic and support vector regression from the LiblineaR package. These algorithms showed very stable performance for us on a variety of datasets and also provide good coverage, so most datasets should be handled well. But if you're not happy with those three, you can pass in any learner from mlr3 or the extension packages, which gives you access to most common machine learning algorithms. As I explained, you can set timeouts, and another nice aspect is that the learners are encapsulated in separate R sessions, which means that if you train 20 different models and one of them fails, this won't affect the other 19. That is very important for the stability of the system.

The second example I will skip for the sake of time, but briefly: you can add your own parameters to the search space and also transform them with arbitrary functions, and all the learners included in mlr3automl come with a predefined parameter space.

The last example I have for today is how to influence the preprocessing. Out of the box, mlr3automl comes with three predefined preprocessing settings. You have no preprocessing, if your dataset is already sufficiently preprocessed. You have the stability option, which deals with missing data, non-numerical columns and high-cardinality features, basically anything that could lead to a failure in the learning algorithms. And you have a full preprocessing option, which adds tunable imputation methods, tunable factor encoding methods, and a PCA for dimensionality reduction. As an alternative, you can also provide your own custom graph object, so you basically write your own preprocessing pipeline.
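To make these examples concrete, here is a minimal usage sketch. The argument names (learner_list, learner_timeout, runtime, preprocessing) and the learner identifiers are reconstructed from the talk and slides rather than verified against the package, so treat them as assumptions and check the linked documentation.

library(mlr3)
library(mlr3learners)   # provides regr.ranger / regr.lm
library(mlr3pipelines)
library(mlr3automl)     # installed from the GitHub repository linked in the talk

task = tsk("mtcars")    # built-in regression task wrapping the mtcars data

# Optional custom preprocessing graph: median imputation followed by PCA
preproc = po("imputemedian") %>>% po("pca")

model = AutoML(
  task,
  learner_list    = c("regr.ranger", "regr.lm"),  # random forest + linear model
  learner_timeout = 10,    # stop any single configuration after 10 seconds
  runtime         = 300,   # stop the whole tuning after 300 seconds
  preprocessing   = preproc
)
model$train()
prediction = model$predict(task)  # in practice, predict on new, unseen data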
A custom graph like this is shown in the code sample on slide eight, using the mlr3pipelines package. I'm creating a pipeline here for a fictional imbalanced classification task: we use a PipeOp for imputation, then add a PipeOp for minority oversampling using the SMOTE algorithm, and also some class weights. You can basically create any preprocessing pipeline you can imagine and pass it in as the preprocessing argument of the AutoML interface.

Okay, that's all the examples I brought for today. The last point I want to highlight is how well it works in practice. We evaluated this on the AutoML benchmark, which consists of 39 very challenging classification tasks, and we chose a very constrained time budget: 10 minutes for the small tasks and one hour for everything larger than 10,000 observations. A more extensive benchmark is currently underway, and we're curious to see what the results will be. The AutoML frameworks we benchmarked against were AutoGluon-Tabular, auto-sklearn and H2O AutoML, as well as TPOT. In this small benchmark the winning framework was AutoGluon-Tabular, and the performance of mlr3automl was slightly worse: on average, the AUC on binary classification tasks is 1.1% lower, and for multi-class tasks the mean accuracy is 2.8% below the winning framework, AutoGluon-Tabular. But I want to highlight that mlr3automl was the only library, other than AutoGluon-Tabular, that could finish all the tasks on time without failures. So while the performance might not quite be at state-of-the-art levels, it is very stable and can handle pretty much any dataset you throw at it.

Lastly, I just want to give a shout-out to everyone in the open source community: thanks to the OpenML team, to the R Foundation, and to everyone who is contributing to mlr or mlr3. If you want to try out the project, here is the link to the GitHub repository, which is also in Slack. If you want to keep in touch, it's easiest to reach me on LinkedIn. Thank you very much.

Okay, all right. Thanks, Alex, for this nice presentation. A couple of questions that we'd like to share. The first question is: can this package run on multiple cores or on a cluster system?

So in general, mlr3 supports parallelization with the future backend, so you can do this. There are some limitations, though, because the hyperband algorithm is by nature sequential. You might start with nine hyperparameter configurations which are evaluated in parallel, but if one of them takes very long, you might have to wait until it finishes before the next stage starts. So there are some limitations, but in general we support parallelization.

Awesome. The next question is about the rationale behind the timeouts. Could you explain a bit more how this framework decides when to time out and finish on time?

I think that's one of the key advantages of this package. If you don't provide any settings, the timeouts are basically infinite, so hyperband decides when to end the tuning. But you can of course set a timeout for the overall runtime; if you say, for example, my system should run for five hours, you can put this in. And you have the individual learner timeouts.
Sometimes one of the learning algorithms is not well behaved and might take forever, and we want to avoid this. That is why we have timeouts for individual learners as well, where you can set, let's say, 10% of your overall time, and after that the evaluation of any single learner is stopped. This is so we can cancel learners that take too long.

A follow-up question on that: do you think it's possible to predict a good timeout? Because there is always going to be a trade-off between waiting a little bit longer to find better hyperparameters, and the point where the expectation of improving further starts to decrease. What are your thoughts on that?

I know that there are some AutoML frameworks that do things like this: for example, they perform a polynomial regression based on some very simple features to predict the order of magnitude of the runtime. This is possible. It's currently not implemented, but it would be nice to have.

Okay, awesome. So with that we're going to move on to our next speaker, so please thank Alex for his presentation. Also, Alex, while you're there, there are some questions in the Q&A box, so whenever you have time, please feel free to answer them there. Thank you. Okay, with that we'll have our next speaker.

Hello, my name is Kasia Pękała, and today, together with my colleague Kasia Woźnica, we would like to share with you the results of our work. We are part of the MI² DataLab at Warsaw University of Technology. Today we'd like to present our new package, triplot. triplot offers a more diagnostic approach to computing and visualizing variable importance in predictive modeling. This approach takes into account the correlation structure between variables in the input data. In general, triplot was developed as a response to the challenges faced by existing variable importance techniques. As the package name suggests, the chart is composed of three components, which together summarize the correlation structure and variable importance. It is worth highlighting that triplot is not a standalone, unrelated tool but part of the DALEX universe; DALEX is a family of explainable model analysis tools developed by our research group.

Let's start with a brief reminder of the problem we are addressing. The principle of predictive modeling is simple: based on input data, we would like to predict an expected output. This output differs depending on the application: it may be the probability of a customer's default in credit scoring, the price of an apartment in a given location, or many other things. The inputs are variables connected with the expected output that provide supplementary information; in the case of credit scoring, this may be information about the applicant's age, income or credit history. In classical predictive modeling, the input features are stored as tabular data. The relationship between the descriptive variables and the expected output may be very complex and difficult for human perception to capture, so we use machine learning models, black-box tools. They are very effective at finding these relationships, but it is hard to summarize how the model works and why it makes a given decision. This is why, in recent years, a number of explainable artificial intelligence methods have been developed.
XAI methods provide insight into machine learning models and try to summarize the dependency between explanatory variables and the model output. Examples of these techniques are partial dependence profiles or variable importance. But a very important piece of information which these techniques ignore is the internal structure of the input data: the correlation between columns. In the real world, variables are not independent entities; often subsets of them describe the same phenomenon from different perspectives, for example how rich a customer is, so it is natural that these variables are correlated. This dependency influences the modeling process, and XAI methods need to deal with the bias caused by correlation. With the triplot package we would like to reinforce XAI inference by adding information about the correlation structure as an input to the explanation methods.

In this presentation we will focus on variable importance. For every single variable, we want to attribute how much the model performance depends on this feature. To assess this, we perturb the input data and create new observations in which we try to imitate excluding the impact of the considered feature. After the perturbation, we check how the model performance has changed. The most common perturbation is permutation: we break the dependency between a variable and the target variable by permuting one selected column, and check how much the model performance changes. We repeat this many times to estimate the expected change of model performance. As a result, we can assess the importance of every feature and summarize which features caused the biggest disturbances in the model predictions. This method is model-agnostic and straightforward, but there are some challenges.

This approach does not work in the case of correlation between explanatory variables. If we treat every variable independently and permute them separately, column by column, we may create very unlikely new observations, outside the input data distribution. It is easy to understand with the example of two skills of a soccer player: dribbling and ball control. A layman would say that they are highly correlated: if a player is good at dribbling, he is also able to control the ball. And we can observe this in real data. If we permute the ball control column independently, we could create a strange player with a high value of dribbling but a low value of ball control. In this region of the data, the model may be unstable and provide out-of-range predictions, because it saw no such observations during training.

We can mitigate the problem caused by correlated variables by using variable importance for groups. Let's look again at the football player example. If we permute the variables dribbling and ball control together, we can rest assured that we won't receive a pair of data points that is out of the distribution. Permuting correlated features together and assessing the importance of whole groups of variables may provide more concise explanations and a more truthful picture.

But what if we don't know the internal data structure that well? Let's look at it this way. It's relatively easy to use variable importance on correlated variables when there is only one pair, or two pairs, of them. But what if the internal structure is much more complex, and strongly correlated features create multiple interrelated groups? In that case, defining such groups is not trivial. For this task we can use a dendrogram, a diagram that shows the hierarchical clustering of correlated variables.
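For intuition, the grouping idea can be sketched in a few lines of plain R. This is only an illustration of the concept, not the implementation used by triplot, and the chosen columns simply match the Boston housing example discussed in a moment.

# Cluster variables by correlation, then permute a whole group together.
data("BostonHousing", package = "mlbench")
X = BostonHousing[, c("tax", "rad", "ptratio", "lstat", "rm", "b")]

# Dendrogram of correlated variables: distance = 1 - |correlation|
hc = hclust(as.dist(1 - abs(cor(X))))
plot(hc)

# Grouped permutation: shuffle correlated columns with the same row order,
# so their joint distribution (e.g. tax vs rad) is preserved.
idx = sample(nrow(X))
X_perm = X
X_perm[, c("tax", "rad")] = X[idx, c("tax", "rad")]
# Model performance would then be re-evaluated on X_perm to estimate the group's importance.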
The dendrogram facilitates making decisions about the number and composition of groups that will be used in the variable importance calculations. But then a question still remains: at what cutoff point should we calculate the variable importance? How does the variable importance look when calculated for five groups of strongly correlated variables, and how does it look when calculated for only three groups of somewhat less correlated variables? We could try to do this by trial and error and calculate the variable importance for different cutoff points, but that creates a lot of overhead. So triplot does it for us: it takes every cutoff point from the dendrogram, defined by the successive nodes, and calculates the variable importance for the newly created groups. And to make the picture complete, on the left the triplot shows the single-variable importance. In that way, in one picture we get information about the simple variable importance, on the right we can see how the data is structured, that is, how the variables are correlated, and in the middle we can find out how different groups of more or less correlated variables influence the model.

This is how a real example of a triplot looks for a simple model: again, single-variable importance on the left, correlation structure on the right, and hierarchical variable importance in the middle. For the purpose of this presentation we are using a simple model built on the Boston housing dataset, included in the mlbench package. This model is built on only a few features out of the dataset, namely six variables: the tax rate (tax), accessibility to radial highways (rad), the pupil-teacher ratio (ptratio), the percentage of lower-status population (lstat), the average number of rooms (rm), and the variable b.

Okay, let's understand how the triplot works by building it. On the left we see the single-variable importance for every feature. Now let's make the rest of the chart step by step. We start by looking at the variables tax and rad: thanks to the dendrogram we can see that the correlation between tax and rad is the strongest, and the importance of this group is equal to almost four. The second pair of correlated features is lstat and rm; the importance of the group consisting of these features is equal to almost eleven. Next we see that when we add ptratio to tax and rad, we get a correlation equal to 0.3, and again, thanks to the chart in the middle, we see that the importance of this three-element group is equal to five. Then the correlation between the group lstat plus rm and the group ptratio, tax, rad is 0.11, and the importance of this five-element group is equal to 12. And at the end, the correlation between the group lstat, rm, ptratio, tax, rad and the variable b is almost negligible, so finally the baseline for the model is 12.

Finally, we can highlight the following: even though we don't see many correlated features here, we can already observe that feature importance is not additive. We can see this by looking at the most important variables, lstat and rm. Their single-variable importance values are about eight and six respectively, but the importance of the group is less than eleven, and we can see this clearly thanks to the triplot.

The triplot package also provides the possibility to make a chart for local variable importance. It addresses a similar problem: how single variables influence the prediction for a given observation, what the correlation between variables is, and how groups of variables impact the prediction. It is based on an experimental method for local variable importance called predict aspects.
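In code, producing such charts looks roughly like the sketch below. This is a reconstruction rather than the exact slide code: the function names model_triplot() and predict_triplot(), and the use of a ranger model, are assumptions to be checked against the triplot documentation.

library(DALEX)
library(triplot)
library(ranger)

data("BostonHousing", package = "mlbench")
X = BostonHousing[, c("tax", "rad", "ptratio", "lstat", "rm", "b")]
model = ranger(x = X, y = BostonHousing$medv)

# DALEX explainer: a uniform adapter around the fitted model
explainer = DALEX::explain(model, data = X, y = BostonHousing$medv)

tri = model_triplot(explainer)   # global (model-level) triplot
plot(tri)

# Local variant for a single observation, based on the predict-aspects method
tri_local = predict_triplot(explainer, new_observation = X[1, ])
plot(tri_local)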
This method is also implemented in the triplot package. At the end, we would like to show you the code that produces the triplot, which follows the same pattern as the sketch above: after building the model, we create a DALEX explainer, a useful adapter for machine learning explanations. Afterwards, the triplot object is built on the explainer and then plotted.

To summarize, we would like to highlight the necessity of taking into account the correlation structure of the input data when we are building machine learning explanations. To mitigate the problem of correlation making variable importance results misleading, we can use triplot. The triplot package is available on CRAN, and more information about the methods described here can be found in our preprint paper as well as on GitHub. Thank you very much for your attention.

All right, thanks, Katarzyna, for a nice presentation. Okay, so with that we'll take some questions. Katarzyna, are you there? Yes, hello. Hi, awesome. I have a couple of questions about this triplot method. It's very interesting how you're addressing this question of variable importance: instead of the more traditional way of looking at it, you take into account that, from a data science perspective, many variables are correlated with each other, and ask how we can visualize and assess variable importance given that correlation structure. My first question is related to scalability. So this is using permutation for each variable, but in a grouped manner: if there are, say, five variables, you choose two variables that have high correlation and permute them together while preserving that correlation structure, and then assess the importance of that group. Is that right?

Yes, that's correct.

In that case, the example you showed was clearly just to illustrate the point, but if we're looking at more than five variables, let's say 100 variables, or image data, where the correlation structure is typically very strong, then I imagine the computational costs would be quite demanding. What are your thoughts on extending this approach to datasets with more variables?

Yes, that's correct. Actually, we focus on tabular data, and making it work on different kinds of data would be computationally heavy, because, as you said, it calculates the feature importance for every newly created group of features. So as the datasets get bigger and bigger, it does take some time to calculate the whole triplot. The importance is calculated using a subset of the data, so at some point we need to consider very carefully how to choose the parameters for those computations. Kasia, would you like to add something to that?

I totally agree, because if we consider more variables, we create more groups, and I think this may be computationally heavy because of these newly created aspects. So we may think about introducing some penalization when creating the group splits. We need to think about it.

There's a related question here that asks, again, whether hierarchical clustering can be unstable for large datasets. I think this refers to the very greedy manner of the algorithm, where you're just grouping the two variables with the highest correlation and so forth.
For large datasets, that might not lead to the optimal grouping of variables. Do you have any thoughts on different approaches to grouping variables?

Yes, in general we are thinking about extending our approach, not using only hierarchical clustering but also some other methods. We would also like to give the user the possibility to use their own, user-defined methods. What we are really looking for here is the order in which the variables are joined into groups and at what points they are joined. So our idea for the next experiments with this package is to provide a way for the user to define the grouping method, or the measure that would be used for building such a tree.

I think it's a very interesting point that triplot is addressing, and in some sense triplot is a very exhaustive approach: based on the grouping algorithm, it looks at every subgroup structure of the data. But in reality, depending on what you are predicting, or the objective of the machine learning task, there might be some groups that are more informative than others, and developing an algorithm that strategically chooses which groups to focus on would also be interesting to see in the future. In terms of scalability, that would definitely help.

Yes, that would help, exactly. That would support the calculation of the triplot on bigger datasets for sure.

Okay. All right, so with that, we'll move on to the next speaker. Please thank Katarzyna, both Katarzynas, for a nice presentation. Thank you very much. Thank you. Okay, so we are moving on to our third talk.

Hello, everybody. My name is Jonathan Bourne. I'm a PhD student at UCL, studying complex networks and their relationship to cascading failures on the power grid. During my PhD I developed the strain elevation tension spring embedding algorithm, also known as SETSe, and I am the author and maintainer of rsetse on CRAN.

So what is an embedding? I like to think of embeddings as a way of representing data which is more useful for a specific purpose than the original representation. To give you an example of that, we've got the MNIST dataset, which is the numbers 0 to 9 handwritten by a lot of different people, and two different ways of embedding the data. Each MNIST image is a matrix of 784 pixel values (28 by 28), and it can be embedded using any kind of tabular method. So we've got principal component analysis, a common statistical data compression and embedding method, and t-SNE, t-distributed stochastic neighbor embedding, which is a nonlinear approach. The task in this case is: we want to be able to move all the similar numbers together, so all the 0s cluster together, all the 1s together, all the 2s together, and we want to do that in a two-dimensional space. What you can see is that the nonlinear approach of t-SNE, although unsupervised, really does manage to group those similar numbers together. For this purpose, this embedding is more useful than the original representation, because we want to see similar things close together. Graph embeddings are the same, but for the irregular structures of graphs. And there are two main families here.
There are those embedding methods which are good for visualization but not very useful for analytical insight, and those which are good for analytical insight but less valuable for visualization. SETSe takes techniques employed by the visualization methods but manages to use them for analytical insight, so it's kind of a hybrid of the two.

And what is rsetse? Well, SETSe is a physics-based, deterministic graph embedding algorithm that represents a network as a system of springs, where the edges are springs and the node features are forces. It tries to position the nodes in the embedded space such that the forces on the nodes are cancelled by the forces in the springs and the system is in a state of equilibrium.

So how does it do this? The easiest way to think about it is the image on the left, which is a one-dimensional embedding. You can consider a network which has only one variable on each node; that might be age, or amount of money, or something like this, or it could be a binary variable, such as whether you are a member of group A or group B. SETSe lays out the graph on some sort of plane in graph space, and in the one-dimensional case the nodes are free to move up and down, so effectively they act like beads on a rod. Two nodes connected to each other are separated by some distance d, and the equilibrium state is found when the elevation of each node (hence the 'elevation' in SETSe) is such that the forces balance. In higher dimensions, like the two-dimensional example on the right, the nodes become discs that move on parallel planes, and you can keep adding dimensions as needed.

Here is a very simple example. It's a network of two groups, group A and group B, where group B has a positive force and group A has a negative force, so group A will move down into the screen and group B will move up out of the screen. The sum of the forces in each group adds to 1 and minus 1 respectively: group B adds up to 1, meaning a force of one third on each of its three nodes, and group A adds up to minus 1, meaning a force of minus 0.25 on each of its four nodes.
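As a rough illustration (this is not taken from the slides), a toy network like the one just described could be set up along the following lines. The edge list is hypothetical, only the group sizes and forces follow the talk, and the setse_auto() call is an assumption based on the rsetse documentation, so check the package vignettes for the exact preparation steps and argument names.

library(igraph)
library(rsetse)

# Hypothetical edge list for a small graph with four group-A and three group-B nodes
g = graph_from_literal(A-B, A-C, A-D, B-C, D-E, E-F, E-G, F-G)
V(g)$group = c("A", "A", "A", "A", "B", "B", "B")
V(g)$force = ifelse(V(g)$group == "A", -1/4, 1/3)  # forces sum to -1 and +1 respectively
E(g)$k = 1000                                       # spring constant on every edge

# Assumed call to the auto-tuned embedding function; see the rsetse docs
embedding = setse_auto(g, force = "force", k = "k")
str(embedding)  # node elevations plus edge tension and strain at equilibrium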
What then happens is that, although each node is free to move on its own, any displacement from the initial plane causes the springs to stretch in accordance with Hooke's law, and this generates a force. When the sum of the forces in the springs attached to each node is equal and opposite to the force exerted by the node itself, and this is the case for all nodes in the network, then you have an equilibrium. So you move the nodes to those positions: each node has an elevation, the springs are extended, and this creates strain and tension. Strain, elevation and tension give you your final embedded values.

One thing that's quite interesting to look at is the movement of the nodes as they move up and down towards convergence. The image on the right shows the elevation patterns over time for each of the nodes, and what we can see is that nodes A, B and C have the same convergence pattern, and nodes G and F have the same convergence pattern. This is because these nodes have the same relationship to each other and to the rest of the network, so they are structurally identical within the network and act in the same way. We can also see that the group A nodes, which are A, B, C and D, all have their final value below the initial plane, so they have a negative embedding, and the group B nodes are above it, which is expected because one group has a negative force and the other a positive force.

So what can you use SETSe for? Well, you can use it for quite a lot of different things, but the original purpose was understanding more about the robustness of networks under targeted attack. Here we can see five networks which represent Peel's quintet. Peel's quintet is a family of five different networks which are structurally clearly quite different but have identical statistical properties, so they're in many ways quite similar to Anscombe's quartet, which, as I'm sure you're aware, is a collection of four datasets that look very different when plotted but have identical correlations, means and other summary statistics. The task with Peel's quintet is whether a graph embedding algorithm can distinguish between these different families. What I did in my first paper was generate 100 examples of each of the Peel's quintet types and embed them into two-dimensional space using various different embedding methods, and as you can see, SETSe, in the lower right-hand corner, was very good at separating out these families. We embedded them at network level, took averages, and looked at what came out, and SETSe has almost perfect separation; there's a little bit of mixing going on between groups A and B, but overall it is very effective.

SETSe works not only at network level, like the previous slide, but also at node level. One of the things about Peel's quintet is that it has these two groups, group A and group B, but it also has subgroups A1 and A2. Even though the networks are only labelled at the group level, A and B, SETSe automatically separates these subfamilies in the embedded space, and you can see the patterns emerging quite clearly at node level.

This image is actually one of the original uses of SETSe and one of the reasons I created it, which is to look at the robustness of power grids under random attack. What we can see in the lower right-hand corner is that the tension element of SETSe, that is, the spring tension when the network is embedded, is a very good predictor of the robustness of a network when it is under attack, and we can see that other
network embedding methods were not quite as effective.

SETSe can also analyze social relations and social networks. In this image, the historical conflict in which the Medici family took over Renaissance Florence is re-enacted with different levels of importance placed on inter-family marriage and business relationships. The figure shows how the Medicis' skilful network positioning and use of marriage allowed them to take over Renaissance Florence.

As a final example, we can see here SETSe used in mapping, and this is very nice because it shows what SETSe really does. Although SETSe embeds a network on a high-dimensional manifold, it's really a kind of network smoothing mechanism: it smooths the node features over the irregular structure of a network, and this allows you to create quite nice maps. If you look at the elevation and tension facets, these maps actually tell you quite a lot about population and, in this case, power generation, where people are living and where power is being generated, and can provide quite useful qualitative insight into what a network is doing when it is embedded in a geographical space.

SETSe has been implemented in R, unsurprisingly, which is why I'm here. It has a whole load of documentation for you to check out, and it's available on CRAN, which makes it nice and easy to install on your computer. When you install rsetse, there are five different sets of functions, which can seem a bit daunting, but really three of them do the same thing: they automatically choose your hyperparameters to ensure a very quick and effective embedding. These are the auto functions: auto, auto high-dimensional, and the biconnected-components version, which works in a slightly tricky way that can be faster in certain circumstances. What they do is select hyperparameters that give you the very smooth, very fast embedding you can see on the right in the purple line, and prevent divergence, which is the line in red, or the slower types of convergence.

The package is fully documented, so you can get to grips with simple examples of SETSe, get your head around what SETSe means, and then focus on embedding your own data and understanding what your own data can tell you, instead of worrying about confusing code. So in summary: SETSe is a physics-based, deterministic graph embedding algorithm, it works at both node and network level, it's implemented in R in the rsetse package available on CRAN, and there is detailed documentation and there are vignettes available for you. If you want to find out more about SETSe, look at my paper 'The spring bounces back', which is available in Applied Network Science, or I have a preprint on high-tension lines, which is available on arXiv. We've also got the documentation website; just install it using install.packages("rsetse") and have a play around. I'd like to thank the EPSRC International Doctoral Scholars (IDS) grant for funding my PhD, the UCL Myriad High Performance Computing Facility for making a lot of the calculations possible, and the UCL R user group, who funded me to come to this conference. So thank you very much to everyone, and thank you very much for your time. Goodbye.

Okay, awesome, thank you for your nice talk. We have a couple of questions here. The first question is: how does SETSe scale with the size of the network?
Yeah, that's a really good question, and there's a bit more detail in 'The spring bounces back' paper, but roughly, in terms of time complexity it's quadratic: it scales quadratically with the number of nodes, although this depends somewhat on the actual structure of the network. One of the functions I mentioned was the biconnected-components version, and what this does is, where possible, break the network up into small independent subnetworks, solve those, and then put everything back together. Depending on your network structure it's generally faster, but it can be dramatically faster, because essentially you could break down, say, a network of 10,000 or 20,000 nodes into 5,000 networks of 4 nodes or something like that, which is extraordinarily fast by comparison. In terms of memory complexity it's linear, and it uses very little memory to run, so you can actually run quite large networks on quite a small computer. I can't remember exactly now, but I think when I was solving a network with about 40,000 nodes and one and a half million edges, it took about four hours, which is not hugely fast, but it only used something like 300 MB of RAM, so you can run things in parallel if you've got a lot of networks to solve.

A follow-up question on that: does that mean the algorithm depends on how SETSe breaks down the network? Are you identifying the community structure, which could itself be a quite time-consuming part as well?

Yeah, so SETSe doesn't look for communities; SETSe in a way can be used as a community detection algorithm, as you saw when I was embedding Peel's quintet at node level, where effectively it discovered these hidden communities within the network. But a biconnected component is essentially this: if you have two clusters of nodes in a network, and the network can be cut in two by slicing through a single node, that node is the connecting point. Where the network can be cut into separate pieces with a single cut, that's what SETSe looks for. That's a standard algorithm you can use, and you can just separate networks into their biconnected subcomponents. When a network has that structure, which most networks do, you can solve it with a piecemeal approach: you solve all the smaller parts of the network and then reassemble them into the overall embedded structure. So it's only looking at these biconnecting nodes when you use the biconnected-components version, but you may not want to; generally speaking, for one-dimensional or binary embeddings, yes, use that, but there are a couple of different versions available depending on the use case.

The other question I had: I've also been using a lot of these networks in my own analysis, and since we're at an R conference, there are of course packages that help with network analysis, but was there a reason why you chose to implement your method in R?

Mostly because that's what I program in, right, but I use the igraph package a lot, and I think igraph is really fast, generally speaking, because it's wrapped as closely as possible around C++, so it's as lightweight as it can be and tries to do all the heavy lifting in C++. It's also got a Python implementation, and so that's really why. The big package in Python, I think, is NetworkX, and obviously I'm more of an R person, but I prefer the way igraph works to the way NetworkX works, and
that could be subjective. So that's really why rsetse was implemented in R. I should mention, actually, as a bit of an aside, that although this is the machine learning session, it's somewhat debatable whether SETSe is actually machine learning, because it's a deterministic algorithm and it doesn't use statistical inference; it uses physical laws. So whether it is machine learning is difficult to say, but really the reason I'm here is that every single other algorithm it's comparable to is a machine learning algorithm. The ones I compare against are things like node2vec, which is a kind of word2vec variant for graphs, or DGI, which is a deep learning approach, and really all of the modern graph embedding techniques are machine learning techniques. I think one of the things that's interesting about SETSe is that even though it's technically dumb, in the sense that it's not an intelligent system, it can find really quite nuanced and complex structures within these networks in very low-dimensional space. So I think even if it isn't a machine learning algorithm itself, it can act like one, and it can also act as an input to other machine learning systems.

Yeah, so I have a slightly different take on it. A lot of complex network science, and I don't know how many people in this audience know what complex networks are, though after Jonathan's talk I think I know a little bit more about it, is mostly based on statistical physics. From that perspective, the approach you are taking is somewhat unique, but the whole field has grown out of a statistical-physics kind of area, and in that sense you could say it's related to statistics, maybe not the traditional statistics we know with p-values and so forth. The other thing is that what's nice about a lot of these machine learning algorithms is not necessarily the pedigree of where they originated but the goal of the problem, and here I think what SETSe is doing is very similar to a lot of dimension reduction algorithms, applied in this setting. So in that sense I think you fit perfectly fine here, though I'm pretty sure there will be people in the audience who are unfamiliar with these graph structures. I really enjoyed your talk, and I look forward to your next projects.

Okay, we're a little bit over time, so please thank Jonathan for his nice talk on complex networks and graph embeddings. All right, that's the end of the session on machine learning and data management. One thing is for sure: there are many different areas here that may be unfamiliar to some of the people in the audience, but what's very clear is that there are many things you can do with R. You can do AutoML, you can do feature selection, you can use network analysis, and if you're interested in studying economics, there's also work being done here. So if there's one take-home message, it's very clear that the R community is growing. Thank you all for joining us. Bye.