Hello, fellow R practitioners. My name is Zubin. We've got the next 15 minutes of your time, and we're going to introduce you to a package we have on CRAN called muHVT; HVT stands for Hierarchical Voronoi Tessellations. We'll show you how we use it to create a sensing apparatus on top of complex systems. The agenda is as follows: I'll spend five minutes giving you a big zoom-out on why we're doing this. Then a gentleman named Sangit will take you through how we represent these complex systems so we can monitor them and sense their movement. And then a gentleman named Shubra will give you a live demo. The big picture that spawned a lot of this capability is the idea of enterprises moving into prescriptive analytics. You have this DIPP framework: descriptive, inquisitive, predictive, and then prescriptive. That gives a nice paradigm for a lot of the analytics that Fortune 500 companies are doing today. Think of them as musical instruments playing the music of data science: they're all interrelated and they're all very important. However, we're seeing clients starting to ask us more difficult questions. Beyond predictive, they're asking questions like: we have no data and we want to introduce a new product, or we want to change price in a very competitive environment, so when a competitor does this, this, and this, what do we think our outcome will be? Now we're evaluating policy changes; we're getting into classic prescriptive analytics, where simulation is the workhorse. So given that environment, you have to think about what kind of system you're actually intervening in.
So, for example, here is a decision maker who has to make a decision: go left or go right. They make the decision, and now they feel much better. However, over time, this decision could turn out to have been incorrect and come back to bite them. We call these unanticipated consequences. This really happens when you're dealing with systems using methods that aren't designed for that type of system, so you're not anticipating or learning about these kinds of feedback effects. And there can be lags here of one year, two years, three years before you see the negative externalities from the decision. So then you ask: what's going on here, what's causing this? A lot of data scientists today are very comfortable, and I think that's very deceiving. What you really have to do is differentiate between two types of systems: complicated and complex. Most people assume, with the methods they're using today, that they're dealing with a complicated system. Think of it as an assembly line: you have components that interact in somewhat predictable ways, and a lot of our correlation-based techniques work pretty well there, and even help predict these systems. But when you get into prescriptive analytics, you're actually making policy changes and moving levers. That assumes causality: I'm moving this lever, and I'm assuming my left-hand side, my y variables, are going to be affected. So it's a different game, especially when you're dealing with a complex system. Let me define a complex system on the next slide.
I feel most of the systems we work with are most likely complex systems, and understanding that really changes your mindset about the types of techniques you should be using and the paradigm for going after that system. Examples of complex systems are everywhere: we see them in Bangalore traffic, in ants self-organizing to build bridges, in birds flocking when we look at the sky. These are all complex systems: there are agents, there is feedback, and there is no central coordinator. Bottom line, a lot of the interventions we're doing in prescriptive analytics are actually happening in complex systems, especially when dealing with supply chains, customer behavior, and things like that. And there's a whole science to this. A gentleman named Brian Castellani has mapped out this field, and we can get a lot of inspiration from him on how to go about monitoring and, if possible, controlling a complex system; I'd recommend you check out some of his work. Some of the big areas here are dynamical systems theory, networks, and agent-based modeling. The reason I'm bringing this up is that the big aha for us was realizing that, bottom line, you're dealing with an organism. So how do you go about monitoring, intervening, and doing descriptive, inquisitive, and predictive analytics on an organism? Again, we get inspiration from robotics: sense, plan, and act. The bottom line is this: you have to get really good at sensing. That is the number one thing you need to be doing when you realize you have a complex system, because a lot of times you have unanticipated consequences.
It becomes very important to look for what are called change points, outliers, novelties, and anomalies. There's a whole class of methodologies around that; Shannon surprise, really, is the information I want. We call that intelligent reporting. Here's your reporting layer: how do I make my reporting more intelligent, showing me exceptions and novelties, where before it was just plain-Jane business intelligence reporting? So, really, the bottom line is you have to get good at sensing. You have to create the sensing apparatus to sense this complex system. You have to understand states and trajectories, and how to monitor and model all the cases we're describing. That's the bottom line: get really, really good at sensing, build the sensing apparatus, and then you can move on to thinking and acting. That's our game plan here, and that's really the genesis behind the muHVT package. Now I'll introduce Sangit, who will take you through our point of view in a little more detail. Thank you. Thank you, Zubin, for putting emphasis on the need for complexity science in enterprises. Now we'll move forward with our agenda: how do we represent a complex system? But before we move ahead with that, let's talk about why we need it across multiple domains and what the major classes of problems have been in the past. One of them is organized simplicity, which involves only a few variables, like the relation of pressure and temperature in thermodynamics, or population versus time. These types of problems were the focus of the 19th and 20th centuries in disciplines like physics, chemistry, and biology. Then there's disorganized complexity, which involves billions or even trillions of variables. The key lies in the assumption that there's little interaction among the variables.
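To make the "Shannon surprise" idea concrete, here is a minimal, hypothetical sketch (not the muHVT package's API) of how a reporting layer could score how surprising the current state of a system is, given how often each state occurred historically: rare or unseen states carry high information content and are exactly the exceptions an intelligent report should surface.

```python
import numpy as np

def state_surprise(history, current):
    """Shannon surprise (-log2 p) of the current state, given how often
    each state appeared in the historical window. Unseen states get a
    small pseudo-probability so they score as maximally surprising."""
    states, counts = np.unique(history, return_counts=True)
    probs = dict(zip(states, counts / counts.sum()))
    p = probs.get(current, 1.0 / (len(history) + 1))
    return -np.log2(p)

# A system that has sat in state "A" nine times and "B" once:
history = ["A"] * 9 + ["B"]
print(state_surprise(history, "A"))  # common state: low surprise
print(state_surprise(history, "B"))  # rare state: higher surprise
print(state_surprise(history, "C"))  # never-seen state: highest surprise
```

A reporting layer could then flag only observations whose surprise exceeds a threshold, instead of dumping every metric.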
This is what allows us to draw meaningful interpretations from them. And finally, there are the problems of organized complexity, which involve a moderate to large number of variables with strong non-linear interactions. This means the variables cannot be meaningfully interpreted in isolation. These are problems which involve dealing simultaneously with a sizable number of factors that are interrelated into an organic whole, and this is what emergence is all about. This organic whole refers to the dynamic, emergent behavior of the system over time, which was nicely put down by Warren Weaver in 1948, who also worked on information theory alongside Claude Shannon, of Shannon entropy fame. Now that we've talked about emergence, let me tell you that multiple behaviors emerge in a complex system. These behaviors are divided into discrete cells based on structural similarity, which ends up giving a representation of the entire complex system. This process can be repeated in a hierarchical manner to get a more microscopic view into the sub-behaviors of the system, and this is what our muHVT package on CRAN achieves, using techniques from unsupervised learning, computational geometry, and multi-dimensional scaling. To give you a more intuitive idea of this, my colleague Shubra will take over and give you a very nice demo that we have built in-house. Thank you. Hello everyone. I'm going to kick off with a Shiny application that we built on top of the muHVT package at Mu Sigma Labs. Let's build your intuition about what this thing does. We'll start with a torus. A torus is a three-dimensional object, and the dataset shown here has three columns: x, y, and z. Let's say you knew nothing about the dataset; you have three columns and you need to learn about it. What you can do is run it through the application. The plot here shows you what the projection looks like.
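The "repeated in a hierarchical manner" idea can be sketched with recursive clustering: quantize the data into a few cells, then quantize the points inside each cell again to get the next, more microscopic level. This is an illustrative stand-in using scikit-learn's k-means, not the muHVT implementation itself:

```python
import numpy as np
from sklearn.cluster import KMeans

def hierarchical_quantize(X, n_cells=3, depth=2, seed=0):
    """Recursively quantize X: split it into n_cells clusters, then split
    each cluster's points again, one level per unit of depth. Returns a
    tree of centroids with point counts."""
    if depth == 0 or len(X) < n_cells:
        return None
    km = KMeans(n_clusters=n_cells, n_init=10, random_state=seed).fit(X)
    children = []
    for c in range(n_cells):
        sub = X[km.labels_ == c]
        children.append({
            "centroid": km.cluster_centers_[c],
            "n_points": len(sub),
            "child": hierarchical_quantize(sub, n_cells, depth - 1, seed),
        })
    return children

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))            # stand-in for raw 3-D data
tree = hierarchical_quantize(X, n_cells=3, depth=2)
print(len(tree))                          # 3 level-1 cells
print(len(tree[0]["child"]))              # each split again into 3 level-2 cells
```

Each level-2 cell describes a sub-behavior of its parent cell, which is what lets you zoom into the system.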
So you can see here how we're able to recreate the shape pretty well while significantly preserving the structure. These cells here form a tessellation, and these points are known as centroids. You can use this information in a couple of ways. One is that we're able to compress the high-dimensional data, and we can visualize it. Every single data point in the raw data maps into one of these areas. The areas can be thought of as a spatial map where each region, or state, is represented by its capital, the centroid. Let me show you another dataset. This is a computer dataset which has 10 columns and around 6,250 entries. This is the projection of the dataset, and this is the heat map of a feature overlaid on top of the tessellation; here we're overlaying the price feature. You can see that points that are closer together are very similar, and points that are far away are different. The points in the bottom-right corner are marked in blue, which corresponds to a high price, and the points in the top-left corner are yellowish, signifying low prices. Let's move on to the steps involved in the algorithm. We begin by scaling the dataset. The distance computation in any quantization algorithm weighs each dimension equally, so to avoid distortion in the relative nearness of observations caused by the units of a dimension, we unit-normalize each dimension individually. Then we move ahead to vector-quantize the dataset. In the top-right image you can see a number line representing numbers from 1 to 7. Instead of representing every number, we choose a central value to represent each chunk: for example, numbers between 0 and 1.9 are represented as 1, those between 2 and 3.9 are represented as 2, and so on. In terms of a dataset with multiple columns, this algorithm allows scattered data points to organize together.
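These two pre-processing steps, unit normalization and quantization, can be sketched directly. The snippet below is an illustrative sketch of the ideas described above (min-max scaling per column, and the number-line chunking example), not code from the muHVT package:

```python
import numpy as np

def unit_normalize(X):
    """Scale each column to [0, 1] so no single dimension dominates
    the distance computation during quantization."""
    X = np.asarray(X, dtype=float)
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.where(mx > mn, mx - mn, 1.0)

def quantize_1d(values, bin_width=2.0):
    """Replace each value by its chunk index, as in the number-line
    example: 0-1.9 -> 1, 2-3.9 -> 2, 4-5.9 -> 3, and so on."""
    return np.floor(np.asarray(values) / bin_width).astype(int) + 1

# Two columns on very different scales become comparable after scaling:
X = [[1.0, 100.0], [2.0, 200.0], [3.0, 400.0]]
print(unit_normalize(X))
print(quantize_1d([0.5, 1.9, 2.0, 3.9, 7.0]))  # [1 1 2 2 4]
```

With multiple columns, the same idea generalizes from fixed 1-D bins to centroids learned by vector quantization (e.g. k-means).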
Further, using vector quantization, we partition the observations in the raw data into k segments based on similarity, where each observation is mapped to a stable centroid in an n-dimensional space. The number of centroids is user-defined and can be set according to need. Now we have the coordinates for these k clusters in an n-dimensional space. We use Sammon mapping to project these points into a space of lower dimensionality while preserving the structure. Sammon mapping aims to minimize an error function that takes into consideration the distance between points in the original space and the corresponding distance after projection. The minimization of this function is performed using iterative methods. Now that we have the centroids mapped to a 2-D space, we proceed with the visualization using a Voronoi tessellation. The spatial mapping shown here is nature-inspired; we took inspiration from leaves. If you look at the image on the right, the entire leaf area is divided into chunks by the larger veins, and those chunks are divided further by the smaller veins. Similarly, we have this multi-level spatial visualization in our tool. There are a lot of interesting things you can do with this tool: you can use it for monitoring customer segments, for generating recommendations, or for macro and micro segmentation, all unified together. It's just a very powerful tool, and I can't do it justice here; there's a lot more this application can do. Now Sangit will showcase how this application can be leveraged for panel data. Thank you, Shubra, for that intuitive introduction to the muHVT package using the wonderful torus demo. Now we will represent a complex system, which in our case is the financial market, and monitor it over time. For this demo, we will be using pair data from the US equity market, which works on the principle of statistical arbitrage. So without any further ado, let's dive into the demo.
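The pipeline just described, quantize to k centroids, project to 2-D while preserving pairwise distances, then tessellate, can be sketched end to end. This is an illustrative stand-in, not the muHVT code: scikit-learn's metric MDS is used in place of a full Sammon mapping (both minimize a pairwise-distance error iteratively), and Sammon's error function is computed explicitly to check that structure was preserved:

```python
import numpy as np
from scipy.spatial import Voronoi
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans
from sklearn.manifold import MDS

def sammon_stress(X_high, X_low):
    """Sammon's error: distance-weighted mismatch between pairwise
    distances before and after projection (smaller = better preserved)."""
    d_orig, d_proj = pdist(X_high), pdist(X_low)
    return np.sum((d_orig - d_proj) ** 2 / d_orig) / d_orig.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                       # stand-in raw 5-D data

# Step 1: vector quantization -> k centroids in n-D space.
centroids = KMeans(n_clusters=15, n_init=10, random_state=0).fit(X).cluster_centers_

# Step 2: iterative projection of the centroids to 2-D.
pts_2d = MDS(n_components=2, random_state=0).fit_transform(centroids)
print("Sammon stress:", round(sammon_stress(centroids, pts_2d), 3))

# Step 3: Voronoi tessellation of the 2-D points, one cell per centroid.
vor = Voronoi(pts_2d)
print("cells:", len(vor.point_region))
```

Repeating steps 1 to 3 inside each cell produces the multi-level, leaf-vein-like map described above.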
So here, as you can see, we start off with some data pre-processing: ignoring unwanted columns, selecting the dependent y variable, and selecting a unique identifier column made up of the event time, which is at a 5-second tick level, along with the name of our pair. Then we select our time-series column and its format, followed by the panel column, which is in fact our pair name. Now we are going to take a deep dive into a second level to introduce more granularity into our data. What we have here is the explorer tab, which is built using D3 on top of Shiny, and it uses our muHVT model object. Let me explain what we are trying to do here. First, we select the dependent variable to see where our profitable regions are. Second, I select a pair to monitor over time. As you can see, the total time window is a day at a 5-second level, ranging from 9 a.m. to 3 p.m. EST. Now that I have selected a pair here, SPY and SDS, let's monitor it. As you can see, cells are getting highlighted in certain regions. This shows that our selected pair is moving between those regions, and there is a start and there is an end. The path the pair traced can tell us how it entered profitable regions, and sometimes even non-profitable ones. Let's take a peek at the cell our pair ended in, followed by all the other pairs that showed similar behaviour. So now I have selected the region where my pair ended, and as you can see, these are the observations of all the other pairs present in that region. And here is the centroid table, which basically gives the type and the information for all the pairs in that particular region. And this plot, keep in mind, is very important, as it tells us about the major contributor to the behaviour of our pair: as you can see, the z-score and the p-value between the two stocks that make up our pair are driving this behaviour.
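The monitoring idea, tracing one pair's tick-level path through the cells of a map built from all pairs' history, reduces to nearest-centroid assignment over time. The sketch below is hypothetical (synthetic features standing in for quantities like z-score and spread, and scikit-learn k-means standing in for the muHVT model object), but shows the mechanics:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Hypothetical historical feature vectors for many pairs define the map.
history = rng.normal(size=(2000, 3))
km = KMeans(n_clusters=20, n_init=10, random_state=1).fit(history)

# One pair's intraday 5-second ticks, traced through the map:
pair_ticks = rng.normal(size=(12, 3))
trajectory = km.predict(pair_ticks)          # cell visited at each tick
print("path through cells:", trajectory.tolist())
print("start:", trajectory[0], "end:", trajectory[-1])

# Historical observations (other pairs) sharing the cell the pair ended in:
end_cell = trajectory[-1]
neighbours = np.where(km.labels_ == end_cell)[0]
print("observations in the end cell:", len(neighbours))
```

Highlighting the cells in `trajectory` on the tessellation gives exactly the start-to-end path shown in the demo, and `neighbours` gives the peers that exhibited similar behaviour.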
muHVT can be used in a wide spectrum of problems, from discovering groups of similar customers based on spending patterns and understanding the price sensitivity of customers, to detecting anomalies in sensor-driven industries, where it can help find faulty turbines or oil leaks, and many other monitoring use cases. We are not going to cover all of them here, but you can always refer to the vignette. On a closing note, as you can see, so far we are only monitoring a complex system using historic trade data. But think about the possibilities once we can predict the next step for a particular pair, something we are currently researching. And remember, always go through the documentation. It really helps. Thank you.