Okay, the next talk is Anand. He is CEO at Gramener, and he will tell us more about visualizing data from machine learning. Have fun.

Thanks. This is going to be a fairly technical talk, at least towards the end, so best of luck with that. Let me start with a problem that we faced right at the beginning. Most of the models that we are creating these days are black-box models. There was a telecom company that said, look, we want to predict telecom churn. Retaining a customer costs us about 40 rupees, but if we lose them, reacquiring them costs about 80 rupees. That's a huge difference in what we have to spend. So if we can identify the customers that are going to leave before they leave, we save a hell of a lot of money. That makes sense. The question is, how do you do that?

So we started with a simple white-box model. We said, what if we build a decision tree? And what this said was: if the customer has not made an outgoing call for 15 days, they're done, they're going to leave; make sure you retain them. If they have made a call in the last four days, they're fine; leave them as they are. But if their last call was five to 14 days ago, check whether they've made more than one recharge in the last quarter. And if they haven't, check whether that one recharge was for at least the equivalent of $20. If it was, they are not likely to leave; otherwise they will. That makes it relatively easy for a human to figure out why people are leaving, or what to look for when a person is about to leave: when they last made a call, how many recharges they made, and what the value of the recharge was. Three simple parameters. Based on the simulation, this saved them approximately 39% of their total cost, which was helpful.

The confusion started when we looked at optimizing this. When we tried support vector machines, it turned out that the model could get them to a 66% improvement in cost, which was great, except that nobody really understood how the SVM worked in a way that they could explain to their business teams. Which meant that the project manager had to take something they had no idea about to their bosses and say: look, for all I care, this is magic. It somehow works, it gives me 66%. The project got canned. If we had gone with the decision tree model, it would have gone through, but despite the better accuracy, in fact because of the better accuracy, it ended up getting canned.

This is not unusual. We were working with an agricultural company and they said, can you forecast prices? We said, fine, let's try it across a series of models, and the model accuracies, as you can see, converge consistently across all products towards neural networks as the most accurate model. This, again, is not unusual; that's what's happening today. We are in a situation where our black-box models are the most accurate, and we need better ways of understanding them.
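That decision tree, for contrast, is simple enough to write straight down. A minimal sketch of the rules as described, with illustrative field names rather than the telecom company's actual schema:

```python
# The churn rules as described in the talk, written out directly.
# Field names are illustrative, not the company's actual schema.
def likely_to_churn(days_since_last_call, recharges_last_quarter,
                    recharge_value_usd):
    if days_since_last_call >= 15:
        return True          # no outgoing call in 15 days: retain them
    if days_since_last_call <= 4:
        return False         # called recently: leave them alone
    # Last call was 5 to 14 days ago: look at recharge behaviour.
    if recharges_last_quarter > 1:
        return False         # more than one recharge: presumably safe
    # Exactly one recharge: was it worth at least the equivalent of $20?
    return recharge_value_usd < 20
```

No such transcription exists for the SVM, and that was the whole problem.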
Now, we do have ways of understanding some of these models, through popular libraries you are potentially familiar with. TensorBoard is perhaps the most famous of the lot: given a TensorFlow network, you can visualize on top of it to get a feel for what the layers are. I'd strongly advise you to check out Activation Atlas as well. (What happened to this slide? I had Activation Atlas on it.) Activation Atlas is a pretty nifty piece of network visualization. What it allows you to do, if you have an image classifier, is look at each of the layers and see how each of the images would be classified by those layers: what does the layer think the structure of the underlying image is? That's a pretty powerful approach as well.

While these tools exist, the trouble is that some of them seem like magic to me as well. And this talk is really about demystifying how one goes about visualizing models from first principles. So I'm going to take a model from scratch, and we are going to visualize it step by step using Python and Excel. Let's dive in.

The model I'm going to start with uses the Singapore flat resale price data, available on data.gov.sg. It has information on the resale prices of flats that were sold: in a certain time period, in a certain location, of a certain flat type (which is the number of rooms you have), plus more details such as how large the flat is, what the model of the flat is, when it was first leased out, and what the resale price was. Now, we have all of these flats. Can we group them based on their similarity? The way a real estate agent sells these is to say, I have a two-bedroom flat, which is very different from a five-bedroom flat. But is a two-bedroom flat on the third floor very similar to a three-bedroom flat on the 15th floor? Is a really large two-bedroom flat more similar to a small four-bedroom flat? How exactly do people behave when it comes to grouping these? That's the kind of information we get from clustering.

So the data that I have (I hope this is readable from the back, but if not, do let me know) I'm going to load into pandas. We have the first five rows of this data, holding the year, the type of flat and so on; it's effectively the same data we get from data.gov.sg. Now, to get a better feel for this data, I'm going to switch over to Excel, because I personally find it a little easier to manipulate data in Excel, at least a little faster and more visual. What I can see from a quick pivot table is that the year goes from 2016 to 2017; that's the dump that I have. We have seven flat types: effectively one-bedroom, two-bedroom, three-bedroom and so on, where six and seven are multi-generational and executive flats. There's the storey the flat is on, which can go up to 49. There's the area in square meters, which goes up to about 259. There's the year the lease on the flat started, which begins as early as 1966 and goes up to 2013. And there's the price, which ranges from 190,000 Singapore dollars to a little over a million. So that's the kind of data we have. Let's cluster it.
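In code, that loading and range-checking step is minimal. A sketch, where the file name is illustrative and the columns are whatever the data.gov.sg dump provides:

```python
import pandas as pd

# The file name is illustrative; the columns are whatever the
# data.gov.sg dump provides (year, flat type, storey, area,
# lease year, resale price).
data = pd.read_csv("resale-flat-prices.csv")

print(data.head())      # the first five rows, as on the slide
print(data.describe())  # min/max per column: the Excel pivot, roughly
```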
The clustering process is reasonably straightforward; for those of you who are not aware, I'll walk you through the steps. Scikit-learn is a pretty good library when it comes to clustering. I'm importing sklearn.cluster (that's taking a fair bit of time because it's loading all the modules), and once it does, I'll look up the help on KMeans. There are different clustering algorithms, and it almost doesn't matter which one you use, because they're all equally confusing, but k-means is probably one of the faster ones; that's why I tend to use it.

The thing about clustering is that when you apply a clustering algorithm, you finally get a result which says: this row belongs to this cluster, this row belongs to this cluster. But what does that mean? (Okay, this is taking unusually long. Ah, I needed to have closed Excel first.) Now, k-means takes as input the number of clusters. Some algorithms take the number of clusters as input; others decide their own count of clusters in some optimal fashion. HDBSCAN is an example of the latter, whereas k-means is an example of the former. There are a variety of other parameters, but really, all it asks for is two things: give me an array of numbers and tell me how many clusters you want, and I'll get you the result. That's fairly straightforward.

So let's take the data, which in this case is effectively all of the columns that we have. Give me all the information about the flats, excluding the first column, because the first column is really just the ID of the flat and has no physical significance, so let's get rid of it. Given this data, we are now going to cluster into three clusters. Three is a random pick; pick four, pick five, it doesn't really matter. And we fit it to the data. The result is an array containing the numbers zero, one and two, one for every single row of the data. Let's take a look at that: kmeans.labels_, converted to a list, is a fairly long list of zeros, ones and twos. And here's the stage where you don't really understand what's happening. You've got the result. Let's save it as an extra column in the data and open it to see what the clustering algorithm has really computed. It's just added this one column called cluster, which contains zeros, ones and twos. Now is where our work starts. Machine learning has done its job; let the humans do their part.
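Put together, the machine's part is a few lines. A sketch; note that in practice you would usually standardise the features first, since resale price otherwise dominates the distance computation:

```python
from sklearn.cluster import KMeans

# Everything except the first column, which is just the flat's ID and
# has no physical significance. This assumes the remaining columns
# are all numeric, as in the talk's dump.
X = data.iloc[:, 1:]

# Three clusters is an arbitrary pick; four or five would do as well.
kmeans = KMeans(n_clusters=3).fit(X)

data["cluster"] = kmeans.labels_  # one label (0, 1 or 2) per row
data.to_csv("flats-with-clusters.csv", index=False)
```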
Let's see how clusters zero, one and two differ from each other. I'm going to do this in Excel and ask: is the resale price for some clusters very different from the resale price of other clusters? If I take the average of the resale price across these clusters and put it into a form I can read a little better, it tells me that cluster zero on average resells at about 700K, whereas cluster one is only at 300K and cluster two is at 470K, so that's roughly midway. Fine. That means cluster one has the least expensive flats. That's one way I can label these clusters: I don't have to call them cluster zero, one and two; I can call them expensive, cheap and mid-range. What else could I look at? Maybe the floor area of the flats. Okay, so now I can start labeling these as the small cheap flats, the mid-sized medium flats, and the large expensive flats. That's a correlation you'd intuitively expect: the larger the flat, the more expensive it is. But what about the rate? Is the price per unit of area actually different? I can compute that as the per-square-meter cost: this cluster is at almost $6,000 per square meter, and this one is a lot less.

So not only does cluster zero have flats that are more expensive, they're also more expensive on a per-square-meter basis, whereas these are less expensive even per square meter, and these are somewhere in between. Now, that's helpful, but I'm having to do a fair bit of work, and the reason I was able to do it fast is because I've done this before, obviously, in preparation for the talk, and I know where I'm going with it. But it's difficult to explore, and it would help to be able to visualize this. A simple way of visualizing it in Excel would be to put in a data bar like this and say, ah, okay, these are the less expensive ones, and so on.

Let's walk through an interface that we built that allows us to do this; it's in an open-source repository, and I'll point you to it. Let's start with Singapore's flat prices. That's the same data we had before, and all this is doing is putting simple color coding on it to show that there are some relatively higher values and some relatively lower values. I'm going to select all of the features that we used earlier, choose k-means, and ask for three clusters again. So far I'm doing exactly what I did before, just with a GUI; that's not the important part.

Now is where the interesting part begins. What this shows me is a summary of the exact same calculation we made earlier. If I look at, for example, the rate, the price per square meter: cluster one (the clusters are numbered differently this time) is at 6,300 per square meter, so those are the really expensive flats, whereas these are the really inexpensive ones. Cluster two has flats that are pretty old, cluster zero has flats that are midway, and cluster one has the newest flats. The newest flats also tend to be on relatively higher storeys and to have a medium number of rooms. So this gives me pretty much all the information I need to extract what each of these clusters means. It's gone the opposite way: from it being very difficult to find what I need to say about a cluster, I've gone to having everything I could possibly know about the cluster. Still not helpful: that's too much information for me to parse. What I'd like to do is ask it: look, if you were a human, what are the one or two points you would tell me about each cluster?

Let's do that. I'm going to move this slider, which effectively picks the most unusual or significant parts of the summary, and I'll tell you how I got there. What it's telling me is that the most definitive property is that cluster one has flats with a much higher average storey than any other cluster; these are the flats near the top. That's the most distinctive thing about any of these clusters. If I had to pick one more thing, I'd say these are the relatively newer flats. One more, and I'd also say these are the most expensive flats. So if I had to pick a definition, I'd just say the expensive, high-storey flats; that's my label for cluster one. Let me try labeling cluster two now. These are the old flats that are relatively small, in terms of both number of rooms and area, so I just say the old, small flats. That gives me a label I can defend, because the lease year is 1981, compared to the others, which are relatively later, and the number of rooms is small, the area much smaller. And if I had to label cluster zero: it's not just the rest of the clusters, it's the really large flats. The large flats are not the most expensive, nor the least expensive; they're not the oldest, they're not the newest, but they are the largest. So the real estate agent now has three kinds of flats to sell: large flats, expensive high-storey flats, and old, small flats. Very different kinds of behavioral labeling.
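The summary table all of this labeling reads off is essentially one groupby. A sketch, continuing with illustrative column names:

```python
# Per-cluster averages: the same numbers the pivot table and the GUI
# summary showed. Column names here are illustrative.
data["rate_per_sqm"] = data["resale_price"] / data["floor_area_sqm"]

summary = data.groupby("cluster")[["resale_price", "floor_area_sqm",
                                   "lease_commence_year", "storey",
                                   "rate_per_sqm"]].mean()
print(summary.round(0))  # e.g. ~700K vs ~300K vs ~470K resale price
```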
Now, the way we got to this was by looking at how a human understands the differences between each of these parameters. For example, take storeys and rooms. Five, four, three: that's uniformly graded; it's not as if any one of them is an outlier. But between six, six and 15, two of them are practically the same and one is definitely an outlier. That's why it stands out to a human, and the slider is simply mimicking this behavior, spotting the outliers successively so we can go down the list and label the clusters. Note that we're using a layman's definition of an outlier. What we found is that a statistical definition of an outlier doesn't correlate with the layman's definition, and the layman's definition is what we want. We just want to see whether there's something unusual about the data: is there a data point so far away from the vast majority of the others that it matters? And the process a human seems to follow is to sort all the values and look, on the histogram or the chart, for a huge dip. If there's a huge dip, they say, oh, that looks very different from the rest, particularly if the dip is towards either end. This is based on a few hundred observations of people who were asked how exactly they see the difference. Would this apply in every situation? I think it varies on a case-by-case basis, and we really have to look at how people would interpret a particular data set.
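That dip-spotting rule can be written down directly. A rough sketch; the 0.6 threshold is an illustrative choice, not the exact rule from those observations:

```python
def most_unusual(values, min_gap_share=0.6):
    """The layman's outlier test: sort the values and look for one
    big dip, particularly towards either end. Returns the index of
    the value that stands apart, or None."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    spread = values[order[-1]] - values[order[0]]
    if spread == 0:
        return None
    gaps = [values[order[i + 1]] - values[order[i]]
            for i in range(len(order) - 1)]
    if gaps[0] >= min_gap_share * spread:
        return order[0]        # the lowest value stands apart
    if gaps[-1] >= min_gap_share * spread:
        return order[-1]       # the highest value stands apart
    return None

print(most_unusual([5, 4, 3]))   # None: evenly graded, no outlier
print(most_unusual([6, 6, 15]))  # 2: the 15 stands well apart
```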
Another part of what we look at is the scatterplot matrix. We take the scatterplot matrix of all of the parameters and ask: on area versus rate, say, where exactly do the different clusters lie? We have cluster zero in blue, one in orange and two in green, and it's clear here that on a rate-versus-area basis, for instance, cluster one is the expensive, mid-area flats, or on a lease-year-versus-rate basis, cluster one is the expensive, relatively newer flats. There are other ways of looking at this, but I find it particularly useful for spotting correlations. In the earlier computation in Excel, we had the total resale price and we had the number of square meters. The trouble is that when you plot those directly against each other, you get something that lies more or less along a straight line, because as area increases, the cost of a flat generally increases. So whenever I spot something lying along a straight line like this, say area versus rooms, I say: now it's time to take a ratio. We probably don't want area and rooms as independent parameters; we want the area per room. That helps identify features that are related to each other and effectively does a certain amount of dimensionality reduction. I'm going to skip the rest of this in the interest of time.

Let's take another data set. This is from the Health Promotion Board, with whom we were working to see what our citizens' behaviors look like. Each row is one citizen: demographic information like gender and age, height-weight ratios, what wearables they're using, how many challenges they've taken part in, how much they've walked, what kinds of diet they're consuming, and a variety of other parameters. Can we take the bulk of this data, run a clustering algorithm, and interpret what we find from the clusters that we get?

It turns out that the citizens of Singapore can be divided into four groups, and I'm positive I'm going to offend some people in this audience, but let's go. The distinctive feature of cluster one is that they seem to be extraordinarily active: 2,000 in terms of active time (I forget what the units are) in comparison with 126 or 105 or 55 for most of the other clusters. We'll come back to this. Cluster two is characterized by having the lowest active distance per unit of active time; in other words, they are the people walking the slowest, so I can certainly use the adjective "slow" for them. What else can I say about them? If I keep going, there's not too much that's interesting, but they aren't really taking part in challenges and they haven't really walked much. So this is a group that, for simplicity's sake, we could just call lazy. These are the people on whom you need to spend marketing dollars saying: look, go out, start walking, do some stuff. Cluster three has a very healthy height-to-weight ratio and doesn't require much by way of diet, so they're naturally fit. These people are not using wearables, and if I go further, there's not much more I can say, other than that they're slightly older than the others and moderately inactive. These, I guess, I could just call the naturally fit people. What about cluster zero? More female than male, they collect a lot of health points, and they have redeemed about 20% of them; almost no other cluster has managed that many, except cluster three. This was a bit of a revelation when HPB saw it. They said, wait a second, hold on: it's humanly impossible to get to 7,800 points in this time period. So we're going to label these the cheaters. You can't get that many points by actually collecting them; you've got to have obtained them in some way that's not normally allowed. They're gaming the points and redeeming them, so just stay away from marketing to that group.

So what we have here is an approach to labeling clusters in a fairly simple way that you could do manually: take each group, figure out what's common within that group and different from the other groups, and try to put a label on that segment. This is a ground-up way of visualizing one particular machine learning technique, clustering. For other kinds of problems, classification, forecasting, each would have its own set of techniques. But what I'm hoping this gives you is a certain confidence that you could apply a visual technique, built simply from the ground up, without having to go into the details of a high-end library.
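That manual loop, finding what's common within a group and different from the rest, can itself be roughed out in a few lines. A stand-in sketch using the flat columns from earlier; the actual tool uses the dip heuristic above rather than z-scores:

```python
# For each cluster, which features sit furthest from the overall
# mean, measured in standard deviations? The top one or two suggest
# a label. Column names are the illustrative ones from earlier.
features = ["resale_price", "floor_area_sqm", "lease_commence_year",
            "storey", "rate_per_sqm"]
overall_mean = data[features].mean()
overall_std = data[features].std()

for label, group in data.groupby("cluster"):
    deviation = (group[features].mean() - overall_mean) / overall_std
    top = deviation.abs().sort_values(ascending=False).head(2)
    print(f"cluster {label}: most distinctive: {list(top.index)}")
```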
The code for this is on github.com/gramener/gramex. We built Gramex as a library on top of which it's possible to build pretty much any kind of visual application. This particular application, called Cluster, is just one application that sits on top of Gramex, which is an open-source data visualization platform. The way the code is structured is to prefer configuration over code: the intent is to avoid writing code as far as possible. There is a place for code, but by and large, let's tweak things through configuration instead. To give you an example of how that works: this is running on my machine, and it has been configured through parameters like you see on the right-hand side. Let's take the flat prices example; that's a little smaller in terms of the number of parameters. Here's an example: we have the rate computed as a formula, the resale price divided by the floor area. I could instead, or in addition, have clustered by the resale price itself, whose value comes directly from the price column, and by the area, which we already have. So I can simply take that parameter, and that gives me the resale price as one of the features I can cluster by. I can choose different clustering algorithms, I can choose which visuals to represent the result in, and so on. The premise is that by default, people shouldn't have to code, but when it becomes necessary, there are plugins that enable it.

So what you saw was an application built on top of Gramex. Gramex itself is primarily configuration-based, into which you can put whatever code you want. It's largely Python-based on the server side and JavaScript-based on the client side. When you try it out, you'll find that approximately 80% of the first-cut version of what you create will be configuration, and you won't need much code beyond the front-end displays you create. Over time, as your front end becomes more and more sophisticated, you'll find that you're writing a lot of code and you'll want to simplify it. There is a set of supporting libraries for interaction associated with Gramex, and the visualizations are largely part of that library, but this talk is not really about that. Do give it a spin; there are several other demos at gramener.com/demo that will give you a feel for the kind of thing Gramex can create.

I'm going to end with one example that we used to visualize a forecasting problem. This was an airline. They said, our cargo is delayed and we want to know why. So we built a model which showed where exactly the delay is. Evening shift versus morning shift versus night shift: the thickness represents the volume of cargo, and the volume is fairly similar across each of these shifts. Then by weekday: Fridays are bad, Saturdays are good. The color represents the speed at which the cargo moves. By product category, it turns out that fragile items, represented by ZDH, are pretty slow. And then there's the breakdown of these. Why are mornings slow? Mainly because Friday mornings and Thursday mornings are pulling them down, while Saturday mornings are great. Why are Friday mornings bad? Mainly because fragile items are extremely slow on Fridays, but not on Saturdays, where they move really fast. Fragile items shipped in mostly empty containers, part-shipped, tend to move slowly, but if they're shipped at 60 to 80% capacity, they move pretty well. Given all this, the question was: can we identify the underlying factors that make an impact?
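Each level of that drill-down is plain aggregation. A sketch with entirely hypothetical file and column names, since the airline's actual schema wasn't shown:

```python
import pandas as pd

# Hypothetical schema: one row per shipment, with shift, weekday,
# product category and time taken to move.
cargo = pd.read_csv("cargo-shipments.csv")

# Top level: average time to move (the colour) by shift and weekday;
# row counts stand in for volume (the thickness).
print(cargo.pivot_table(index="shift", columns="weekday",
                        values="hours_to_move", aggfunc="mean").round(1))
print(cargo.groupby(["shift", "weekday"]).size())

# One level down: why are Friday mornings slow? Break that cell out
# by product category.
fri_am = cargo[(cargo["shift"] == "morning") & (cargo["weekday"] == "Fri")]
print(fri_am.groupby("product_category")["hours_to_move"].mean())
```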
And one of the factors, it turned out, was the total number of trained staff. Now, the next question: can we find out the impact of that and simulate it? What if, instead of the five trained staff at this airport, I had six trained staff, or seven, or eight? What kind of delay would I actually be having at this particular airport? Or, if I had a budget cut and only three people, what kind of service level can I really commit to? This is what the airline started using as a planning tool, one that allowed them to identify the right kind of staffing to ask for and to actually apply at their set of airports. We moved from a disaster, where we had a sophisticated model that worked but couldn't get through because nobody understood it, to a model that people were using not only to understand what was happening, but as a negotiation tool: this is what's going to happen when we make these changes.

So, hopefully, a few things you will have gotten out of this. Black-box models are increasingly accurate, and we therefore need ways of understanding and interpreting them. Interactive visualizations and visual summaries are definitely a good way of doing that, but make sure you provide an interactive layer on top that's customized to your needs; that seems to be the way to go these days when it comes to visualizing and understanding models in general. There are several libraries: TensorBoard is certainly the most popular, and there are ways of building these yourself. Gramex is one of those. Do give them a spin. All the best.

We have time for some questions. Somebody?

This is the coolest thing I've seen all day, so thank you for that. When you're thinking about how to visualize a black-box model, how do you approach it? Because seeing it on the screen, it seems intuitive, and you can say, oh yeah, of course you would do it like that. But when you're staring at a neural net, it is not at all obvious how you're going to explain it, let alone visually. So how do you begin to approach that?

The person who builds the model should not be the person who visualizes the model; make sure that happens. The person who visualizes the model should be the person who has the business problem. If these two conditions are met, then you can hope that they have the experience to know what to ask for, and that's tricky. It may happen, it may not happen, but it definitely does not happen if the person who's modeling it is also the person trying to communicate it, or if it's not somebody on the business side. These are prerequisites.

Thank you. Further questions? Then let's thank the speaker again.