I'm gonna just give a few introductory comments and then hand most of the time over to Shoda. But I wanted to start out by talking a little bit more about what we do specifically at GX Labs. So we're part of a bank called State Street. It's what's known as a custody bank. That's not well known to people outside the finance industry, but whenever you do a trade, all the information and the transactions need to be settled, and custody banks typically do that. We also have a large arm that a lot of people know about called Global Advisors, which focuses on investment management, and they sell a range of products including something called an ETF, an exchange-traded fund. Now at GX Labs, we're focused at the intersection of data science and portfolio risk. So big institutional investors are trying to look across their whole portfolios of many different types of asset classes and figure out what that risk is. This helps them in allocation decisions, and it helps them in risk management. So that's broadly what we try to do. The last time I was here, I talked a little bit about the theory and methodology behind what it is that we're trying to get at. Today we're gonna take a slightly different tack, and a lot of this is to hopefully spark some interest on the part of some of you in the audience around the technology and some of the approaches that we're taking to get at this risk problem. Now we use Monte Carlo simulation, we use stochastic calculus, we use a number of statistical tools that are common in other disciplines, and we've found cross-disciplinary discussion very fruitful. And then there's a new area which we hope to start, we haven't started this yet, but we'll probably start in the academic arena, and that's around machine learning.
So how can we use some of these new techniques to do things like identify systemic risk regimes in the economy generally, or to identify the macroeconomic regime that we might be in, or even to do things like fraud detection on a trading floor? So there's a number of interesting applications coming out of the machine learning and what's often called the deep learning community that we find quite interesting. So that's just a quick introduction. I'm gonna turn the time over to Shoda, he's gonna present some of our thinking and development specifically in the technology architecture, and then hopefully we'll have enough time for some interesting Q and A. So hi, I'm Shoda. I will be presenting the overall project that we have at GX Labs. This is a multifaceted project; it really involves putting together machinery that can deal with a lot of the issues that institutional asset managers, and people that manage money generally, are facing today. And I'm gonna start off with a very quick sort of background and motivation. The financial motivation, or the business motivation, behind this is actually quite interesting, because with most of the developed economies having created welfare states after World War II, and with the aging population that you see, there are these huge piles of pension money that need to be somehow managed, and that money needs to be matched to the liabilities of the pensions that you've promised you're gonna pay out. So this creates an asset-liability matching or management problem. And this is gonna get worse as the demographic profile gets worse. Another motivation is the fact that for the last 30 years, interest rates have been going down, and now they might come back up; in fact, the Fed started raising them in December of last year. So we are entering into regimes that we haven't really experienced for a long time.
And this means that as interest rates go up, and there's a lot of volatility in the world, the world looks very different today than it did 30 years ago. Thirty years ago, most of the world economy was these OECD economies, right? They're developed market economies. But with the rise of China, and with a lot of things happening in the emerging markets, again, there's not a lot of data where we can rely on past experience to understand how to manage our portfolios. So these types of things are motivating the project overall at GX Labs. So we have clients and we help our clients think about these issues. And instead of just talking about one asset class, so just equities or fixed income, we help clients look at a range of asset classes and put them together in a way that matches what we call their risk appetite: their tolerance for taking on certain risks in order to match those pension liabilities, to deliver the desired payouts while minimizing the risks. But the issue has been that most of quantitative finance has been predicated on a lot of strong assumptions around things such as normality of distributions. If you look at the equations, they're mostly just linear regressions. And that's really in part due to the fact that the machinery available to deal with these things arose from the 1950s through the 1990s. The other thing is data limitations. The best data you had was mostly US equity data, so a large portion of the papers that you read are based mostly on US equity data. And these things informed the limitations around the traditional approaches to modeling these types of risks and asset returns. And you go through this and you realize that back then people had mainframes, or worse, in the 50s.
So if you match up when, for example, Harry Markowitz was coming up with modern portfolio theory in the 50s, or Black, Scholes, and Merton were coming up with their work in the 70s, basically these are the types of computers that they had to deal with. Very limited data, very limited computing capability. So part of the project that we have here at GX Labs is basically to take advantage of all the new data that's becoming more readily available, as well as to take advantage of computing power that's much cheaper and faster and better these days. And so if you look at things that we characterize as financial assets, these could be stocks or bonds, FX, commodities, what have you. I'm just gonna generally call them assets. If you look at asset returns, a lot of this literature is predicated, for simplifying reasons, on their having a normal or log-normal return distribution. So the normal distribution is this blue one. Now this is just a stylized picture, but if you look at just equity returns, you'll find that they often have long tails, and you could try to fit some distribution on it that gives you those long tails, but do we have enough data to really characterize what they look like? Is it really IID? So these are questions that pop up, and as you look further and further, you get into issues that, if we understand them better, let us characterize the asset returns better. So for example, over what periods? There might be momentum over very short periods of time, but over longer periods is it really sort of a Brownian motion, with an absence of autocorrelations? Do we get heavy tails, conditional heavy tails? So even if we correct for regimes, we may get heavy, long tails even corrected for that. We might get evidence of things like volatility clustering or intermittency, which is just bursts of stuff happening discontinuously.
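To make the fat-tail point concrete, here's a small Python sketch (illustrative numbers only, not real market data, and not our production code) comparing the tail weight of a normal distribution against a heavy-tailed Student-t:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Illustrative only: Gaussian "returns" vs. heavy-tailed Student-t returns,
# both scaled to roughly 1% volatility. Real markets supply the data here.
normal = rng.normal(0.0, 0.01, n)
t6 = rng.standard_t(df=6, size=n) * 0.01 / np.sqrt(6 / 4)  # Var of t(6) is 6/4

def excess_kurtosis(x):
    """Fourth standardized moment minus 3: zero for a normal distribution,
    positive for fat-tailed ones."""
    z = (x - x.mean()) / x.std()
    return (z**4).mean() - 3.0

print(excess_kurtosis(normal))  # near 0: thin tails
print(excess_kurtosis(t6))      # clearly positive: fat tails
```

The question in the talk is whether real return data has enough observations to pin down the tail shape at all; the sketch only shows how differently the two tails score on one simple statistic.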
So how do you begin to understand, model, and incorporate that, the behavior of these asset returns, in the construction of portfolios? And you can't really start to do that until you have the appropriate data. And then coming to our rescue is what I'm generally gonna call high performance computing. So people talk about machine learning, people talk about all sorts of things, but I'm just gonna group everything together under high performance computing. So it's all this stuff we love, Spark and just really fast parallelized computers. And a lot of people talk about Moore's Law. So this is a set of slides I've grabbed from a general presentation, but I think Moore's Law everyone knows. So I thought this was a little bit easier to understand: the cost of computing power equal to an iPad 2. So if you buy an iPad 2, and these are just orders of magnitude, in 2010 dollars it's about, say, $100, but back in 1950 it would have cost a trillion dollars for the same computing power. So this is enabling new types of things where we're not forced to stick to those linear regression equations. Now we're still gonna use a lot of linear math, linear algebra, to delve into some of these topics, because that's the machinery we have readily available. But with this new machinery, we're gonna start to explore other types of relationships that can be a little bit different. And as you probably know, as a catalyst for changing approaches, we have just a lot more data. So images, sounds, moving images, streaming, all sorts of stuff, mobile. So everyone's on mobile and that's what they use. That's generating a lot of data. Internet of things. So that's probably not quite here yet, but people talk about it a lot. But devices embedded in every single thing that's doing something, so that you can get the data. Part of the limitation of asset management and economic modeling has been that things like GDP are just a very gross measure of what's going on.
It's a lagging indicator and there's huge estimation error in it. What's the total economic activity in Brazil? Anyone's guess. But with these types of things, if you can start to measure the granular activity of every single transaction that's occurring in the economy, you start to get a better picture of what's going on. So there's the data and the new types of databases. Whether it's a graph database or these NoSQL databases, there are new types of databases that are a much better fit for exploring the types of issues we have. The infrastructure and culture. So cloud-based computing: things like Amazon Web Services, Azure, Google Cloud Platform. These are things that allow anyone to just open an account and spin up a computer and start computing very quickly. This is changing a lot of the mindset around how we can explore these issues, as well as the culture of open source, these communities that allow us to bootstrap very quickly around very difficult problems. And I can't emphasize that enough, because coming from a mainstream financial institution, everyone's used to this stuff on the right here. It's like, I'm an IT guy at XYZ Bank. I know Oracle, I work with Microsoft. And then here are these other people from some alien planet that are working on things like Spark, using Mesos and NVIDIA GPUs. So it's a totally different world. And there's a huge gap, and the people over here don't even know how to hire the people who have the right expertise. And I'm making a very gross generalization, but for the most part it's true. And so this gets you into a sort of technological barrier: how do you get the right human resources in these mainstream large banks to actually appropriate this type of technology?
And we're facing a very big problem even in our institution, where basically we're having huge hurdles due to compliance or what have you, because this stuff doesn't fit the mold of everything that's been developed on the basis of the older technology. And not to mention the culture. So open source and everything around it is actually not very well looked upon in a very compliance-heavy, risk-averse environment. And banking is obviously very heavily regulated and will only get more so. And so people get very scared of, oh, it's open source, you guys are gonna talk to some community. So if you wanna use GitHub, it's kind of frowned upon, right? And that slows you down, it slows down innovation. So that's a material issue. And then of course the computation, I'm not gonna talk about it much, but basically you have new, faster GPUs and things like that, as well as the rise of machine learning. So everything from deep neural nets to more traditional statistical approaches to doing machine learning. And what's interesting is that most of the time people have been using these regressions, which have been developed and which is great, but there are new ways to learn, and perhaps, how do we deal with unsupervised data? And these are topics that are very, very ill explored. And one of the interesting things that I noticed is that most of finance, and most of the people that know about things like machine learning, wanna look for alpha. Alpha seeking is basically finding extra money that shouldn't be there if the markets are truly efficient. And so that's what hedge funds do. They look for alpha, they make money. And what we're trying to do is slightly different. We're trying to actually put together different asset classes and different securities to change the shape of the profile, or the distribution, of your future returns. The reason we wanna do that is we wanna help our clients, these institutions that need to manage pension money.
If you're a sovereign fund, you have a lot of oil; you need to convert that oil money into money that's a little bit more sustainable and diversified than just oil. So you're gonna invest in different asset classes. So how do you put that all together, in a world that's extremely volatile, unpredictable, and very different from, say, 30 years ago, and make sense of it? This is really the problem that we're dealing with. So what we're trying to do is put all of these technologies together to do portfolio analysis. So I'm gonna get into this discussion around how you simulate a portfolio using a high performance computing environment. A lot of the techniques that we're gonna be talking about are actually pretty old-school stuff. They've been around for a while, but we're gonna boost them up by putting them on this high performance computing environment. So a lot of the things that you'll see will be familiar, and they'll be fairly standard techniques. We will be using things like Monte Carlo simulations and factor modeling. If you're in finance, these are standard bread-and-butter approaches. But let me posit the issue. So here's your normal distribution. I told you equities have these long-tail distributions. And then when you put together not just equities but also fixed income, which is bonds, commodities, what have you, so different asset classes. When you have a multi-asset class portfolio, which, if you're a serious institutional investor, you're gonna have, you're not just gonna have a stock portfolio. So when you put these together, you get these skewed, long-tail distributions. And how do you characterize that? Does that distribution have a name? We don't know. So we're gonna have some sort of generative process that's less parametric in order to understand what these things look like. And to do that, we're gonna use a generative process that involves a factor model.
So we're gonna look at everything that's happened in terms of the returns of all the securities we're interested in. And we're gonna do some dimensionality reduction using a factor model. And we're gonna use that as a driver to figure out how they move together, and then simulate, in a generative process, a return distribution out in the future. And we're gonna look at that characteristic to derive what the risk is, what the anticipated tail risk is, maybe even the returns. So just to get you familiar with a little bit of the terminology. Sorry, it's a little bit wordy, but just look at the picture. So you have this distribution. In finance, typically, a lot of people worry about risk, and people talk a lot about risk and return. And when most people talk about risk, they're really just referring to the one-standard-deviation width of this distribution. So they'll typically call it just volatility. So if you go buy a mutual fund, or if you buy some stocks, basically people talk about risk and then return, and the risk is this volatility. And then the issue is that in, for example, a pension fund, again, you're worried that you're gonna suffer some catastrophic loss. That might be very, very rare. So these are what we characterize as tail losses. And what's the tail? Well, the tail is a little bit of a subjective notion. Can I handle a really big hit, or a really, really, really big hit? So it's a little bit subjective, and you can set the start of the tail to be whatever tolerance you want. So let's say that I'm willing to take on a loss event that's gonna hit me with some impact of X dollars, right? Once every 33 years. Then you've quantified your risk tolerance around the tail. And so this yellowish region here is defining your tail loss. And some of you may have heard of the notion of value at risk. So the value at risk is sort of how much you lose at the tail there.
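These tail measures, the value at risk and the average loss conditional on landing beyond it, can be read straight off a simulated return sample. A minimal numpy sketch, with a made-up heavy-tailed sample standing in for real Monte Carlo output:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for Monte Carlo output: a million simulated one-period returns.
returns = rng.standard_t(df=5, size=1_000_000) * 0.01

alpha = 0.03  # tail probability -- roughly a "once every 33 periods" tolerance

# Value at Risk: the loss at the alpha-quantile of the return distribution.
var = -np.quantile(returns, alpha)

# Expected Tail Loss (expected shortfall): the average loss conditional
# on landing in the tail beyond the VaR point.
etl = -returns[returns <= -var].mean()

print(f"VaR({alpha:.0%}) = {var:.4f}")
print(f"ETL({alpha:.0%}) = {etl:.4f}")
```

By construction the ETL is always at least as severe as the VaR at the same tolerance, which is why it's the more conservative number to manage to.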
What we tend to focus on is the average loss you will suffer conditional on the fact that you've hit the tail. So, you know, the world has gone into a bad state, and then, conditional on that, what's your expected value of losses? So that's the expected tail loss, or ETL, right? So we're gonna focus on that. And it's often quite common to think about these things in banking, but in asset management, it's actually quite new. People don't think about this because they're so focused on volatility. And also, again, most of asset management's been predicated on thinking about things in terms of just equity. So in equity, you don't worry so much about these very, very long tails. And so as you get into the multi-asset class context, this becomes much more pertinent. And so in order to generate observations in a simulation where you're hitting the tail, you're hitting states of the world in your simulation where you're hitting the tail, right? These are rare events. So if I'm looking at this one-in-33-year event, then in order to get, say, 10,000 dots in here in the tail region, I need to run the simulation a million times, right? And that may or may not sound like a lot, but typically if you look at equity simulations, people have run them 1,000 or 10,000 times, right? And those are the tools, yeah. Yeah, so you can do importance sampling. So that's a way to speed things up and to get more resolution in the tail. So I'm just talking very broadly here, right? So yeah, we basically need to do a lot of iterations, more than have been built into typical risk management tools that you can buy off the shelf. And so computing speed becomes fairly key, right? So what are we gonna do? What exactly is this process? This portfolio simulation process is actually fairly simple. I characterize it in two big chunks, right?
Chunk one is this factor model, and a factor model is just, basically, taking all the market price returns, so you have your equity returns or your bond returns, and you break them down into things that are simpler. So instead of looking at, say, 100,000 by 100,000 securities and looking at how they all move together at once, you're gonna do some dimensionality reduction and describe them in terms of, say, 10 factors, right? And the 10 factors explain most of what we call the systematic drivers or systematic risk factors. And then whatever you can't explain using those is the residual; it's the idiosyncratic component that's specific to the security, right? Oh, I'm not familiar with that, but basically, right now, this particular machinery that we're building is looking at straightforward PCA, but we're gonna do a little bit of engineering around the parameters on the edges, and once we have a little bit more time and resources, we'll look at different methodologies. And I think, obviously, PCA is not the only way to do this. Lisa's been working on this; she has something to say. Yeah, I think there's a zoo of possibilities of what to do around low-rank matrix approximation. So, I mean, PCA is kind of a starting point, and there are different varieties of PCA you can do, but obviously, even within PCA you could do sparse PCA, kernel PCA, CUR decomposition, whatever you want. So, I think the advantage of building out the machinery on this technical sandbox is the ability for us to test this and play around with it very quickly, which wasn't really possible before, right? So, we do this, and then what we're doing is getting the coefficients, or what we call the factor sensitivities or loadings. So, if I have a stock, right? The stock returns can be explained by how sensitive I am to the 10 systematic factors plus whatever's left over.
And the coefficients for those 10 systematic factors are my factor loadings, or those betas, okay? And that defines my portfolio. I'm gonna stick that into my Monte Carlo simulation, and I'm gonna have some random generative process. So, I have a pseudo-random number generator that's drawing random factor outcomes that represent states of the world, or states of the factors. And I'm gonna do that to generate a portfolio value conditional on that state of the world, conditional on those factor outcomes. And then I can repeat that, say, a million times to get a portfolio distribution. So, again, fairly straightforward and, you know, not trivial, but not tremendously difficult either. But there are a lot of details to get right around the factor modeling and the simulations, even around the random number generator, getting the periodicity, yes. Well, okay, so how do you know this is working? So, you need to test both the factor model, right, and whether the factors are giving you performance that makes sense, and there's a certain number of statistics. So, okay, let me, so you're gonna test this, and then you're gonna test the whole thing as well, meaning the output out here. So, first things first, you're gonna test the factor model. So, you wanna make sure that, let's say you've estimated your factor model, right? Then you could come up with a portfolio and look at it out of sample. So, let's say you create an equity portfolio and you see whether that's explaining the risks without, you know, things like bias, or that you're getting things right most of the time. So, there are certain statistics that we can focus on, such as Q.
Well, the other thing is that if the world were static forever, meaning that we don't go through regime shifts and things like that, it would be easier to see and easier to test in the way that you say, right? But the problem is that, for example, rates haven't come up in the last 30 years, and suddenly you have this regime where the Fed is raising interest rates. So, no one's ever seen that before in the data that we've seen, for example, right? So, how valid is that in that context? So, you get into very difficult practical issues around back testing this. The other thing is that you wanna focus a little bit around the tail events, and the tail events happen very, very rarely. So, it gets very, very difficult to back test out of sample as well. Right. Yeah, so this is just presenting the overall picture. I'll actually get into the mechanics of it a little bit. So, yeah, a typical factor model. So, you know, again, mainstream finance is basically predicated on very simple linear regression. So, things like this single-factor model are sort of the canonical version, where basically whatever security you have can be explained by this thing called the market and how sensitive this thing is to the market, plus some residual. And then you have multi-factor models that give you more than just one market; you have different explanatory factors, and that's where the factor model comes from. So, for example, in equities, if you go to any financial website, people will talk about, oh, it's the value or growth or the momentum factor, the size factor; these are the types of things that are purported to explain equity returns, right? And in fixed income, you have other factors such as level, slope, and curvature.
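The single-factor model just mentioned is, concretely, an OLS regression of a security's returns on the market's. A toy Python sketch with synthetic data (the "true" beta of 1.3 is fabricated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Synthetic data: market returns, plus a stock built to have beta = 1.3.
market = rng.normal(0.0005, 0.01, n)
stock = 1.3 * market + rng.normal(0.0, 0.005, n)  # beta * market + residual

# OLS estimate of beta: cov(stock, market) / var(market).
beta = np.cov(stock, market)[0, 1] / np.var(market, ddof=1)
alpha = stock.mean() - beta * market.mean()  # the intercept, i.e. "alpha"

print(round(beta, 2))  # recovers something close to the planted 1.3
```

Multi-factor models generalize this to a regression on several explanatory series at once; the estimation mechanics are the same.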
Now, as a practical matter, part of the problem is that the equity people have their own factor models and the fixed income people have their own factor models, and they run their own portfolios. But at the top of the house, right, at these big institutions, there are very few places that bring them all together in a coherent framework so that you can look at equities, fixed income, credit, all sorts of asset classes, and all the risks and drivers in a coherent way. So that's one of the things that we're working on: a multi-asset class approach that can explain a lot of these things, equity, fixed income, in a multi-asset class context. And part of the problem of these traditional regression models, or fundamental factor models, is that you get issues of collinearity. So there's some overlap between the factors. You typically have fairly low explanatory power. And this depends on how you construct it, but it's often not reflective of the macroeconomic regime. So we need to adjust for that. So again, what we do is a fairly straightforward PCA. This is still research in progress. We're starting with a very basic PCA approach, but one of the things we're starting with is this high performance computing, and we can actually run this very quickly, so we can adjust the parameters and do backtesting and all sorts of fun things to get comfort around both the dataset that we're looking at and the results. So we can change the cutoff for the number of principal components that we consider to be systematic. We can change the sampling frequency. So do you take the sampling weekly, daily, monthly? We can change the look-back window, the estimation period. We can change the temporal weighting. So, you know, do you have a half-life that's about 50% of this window, or is it gonna be 80%? And then you have some cross-sectional weighting.
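Two of those knobs, the temporal half-life and the cross-sectional weighting, are easy to sketch in code. These helpers are hypothetical, not our production implementation:

```python
import numpy as np

def halflife_weights(window, halflife):
    """Exponential temporal weights for a look-back window, oldest first:
    an observation `halflife` periods older counts half as much.
    Normalized to sum to one."""
    ages = np.arange(window - 1, -1, -1)  # age 0 = most recent observation
    w = 0.5 ** (ages / halflife)
    return w / w.sum()

# Half-life at 50% of a one-year (roughly 260 trading days) window.
w = halflife_weights(window=260, halflife=130)

# Cross-sectional weighting: e.g. square root of (made-up) market caps,
# so bigger, more liquid names count more but don't dominate outright.
caps = np.array([500e9, 50e9, 5e9])
xw = np.sqrt(caps)
xw /= xw.sum()

print(w[-1] / w[0])  # most recent vs. oldest observation: about 2^2 = 4x
print(xw)
```

Both weight vectors would be applied to the return matrix before the PCA step, which is exactly the kind of parameter the sandbox lets us vary and backtest quickly.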
So, you know, maybe instead of just equally weighting all of the securities before you run the PCA, you may want to weight it such that the more liquid names, whatever you consider to be the more liquid names, are counted more. We seem to be converging on the idea that for many of our clients' portfolios, the square root of market cap, for example, might be an appropriate cross-sectional weighting. And we also might want to do some regime conditioning. So like I said, the world changes fairly quickly, and you go through these different regimes, volatility clustering. So we may want to identify regimes where the rules change and condition the behavior based on the regimes that you can identify. So some of the desirable model features, I'm just gonna read this: accurate, stable, transparent, factor-based risk forecasts. So this is actually just a slide from our colleague, Lisa, who just spoke. And we want it to be multi-asset class, with consistent treatment at different time horizons, both short-term and long-term. So there's a difference between people that are managing money every day, and they want to make a quick buck every day, or every week, versus pension funds that are looking 10, 20, 30 years out. Consistent treatment of assets and liabilities, and more focus on the generation of an accurate tail risk picture, because that's what will kill you. So these are the types of things that we want to build into the model. So, pardon? When you say attribution problem. Oh, I see. Right. Attribution, the way we've used it here, is not a forward-looking idea but a backward-looking idea. So I've realized some performance over some period. Okay, and given that performance, and given the assets that I've invested in, can I attribute this to, say, the factors that I've identified?
So I can say most of the return was caused by these 10 factors, and then there's some portion that I can't explain with those 10 factors. So I'm looking for a causal effect. So that's a backward look. It's a slightly different problem than what I've been talking about, but it does enter into the picture, because you have to create reports for shareholders, or whoever it is, stakeholders, to explain why you've done what you've done. Yeah. Yeah, yeah. Exactly. So in fact, this is a point where I find it's even difficult to explain, even inside our company, because people mix up this attribution problem with the forward-looking problem, and I think that there's a lot of, I don't wanna say confusion, but there's a lot of explaining to do to a lot of people to explain the nuance there. Yes. So again, just a rehash. So in our latent factor estimation, we have some temporal weighting, some look-back window, and this can be weighted in different ways across time and cross-sectionally as well, and then we'll do some dimensionality reduction. So right now, using PCA, we can infuse it with some regime information. So for example, in equities, there's a thing called the VIX. The VIX is just a forward-looking implied volatility, and it's implied from the market. So the markets are telling you: if this is the correct level of, say, the S&P, then over the next three months, the implied volatility in the markets is gonna be this level, called the VIX. And so we can condition on the VIX, and then maybe the returns look a little bit more normal than if they weren't conditioned, and then we get the latent factors. So right now we're starting off just very simply trying to characterize equities, fixed income, and credit, which is a slightly different animal.
Rates and credit are often packaged together, because a bond has both a rate component, which is basically the discount factor, so the time value of money, which is a rates issue, and then the fact that someone cannot pay you back if they default, which is a credit problem. So those are packaged in bonds, but we need to address both. And then there's the notion of foreign exchange movements. So if I'm a dollar investor, and I'm sitting here investing dollars in, say, a Japanese bond, the relative movements between the dollar and the yen are gonna impact my returns. So I need to think about that. There are a number of other things that we'd wanna cover, but in order to create a multi-asset class portfolio, you'd wanna address these in a fairly coherent manner. So I don't think we have an empirical answer for that yet. I think the regimes can shift very, very quickly. So for example, you can use the VIX as almost like a state variable, and then say, okay, if the VIX, or if the change in VIX, is above a certain point, just condition on that, and that's one way to do it. But you also find broader themes and broader changes, which can be a little bit slower in changing, and so that's where I think some of the machine learning can come into play in detecting some of these regimes, because part of the problem is that it's hard for me as a human, with limited experience, to identify and pick the right indicator for what the regime is. So can we let the data tell us how these things cluster better? And I think that's an open question. So at this point I've generated what we call the factor returns, right? And these are telling you how the factors move around. And then I go back and I have a second process where I regress all the securities that I'm interested in back onto the factor returns to see their sensitivities to the factor returns, right?
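The two estimation steps just described, PCA to get latent factor returns, then a regression of every security onto those factor returns for its loadings, can be sketched like this (random numbers stand in for a real, weighted return history):

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_sec, k = 500, 200, 10  # history length, securities, factors kept

# Stand-in return history; in practice this is real, weighted market data.
R = rng.normal(0.0, 0.01, (n_obs, n_sec))

# Step 1: latent factors via PCA (SVD of the demeaned return matrix).
Rc = R - R.mean(axis=0)
U, s, Vt = np.linalg.svd(Rc, full_matrices=False)
factor_returns = U[:, :k] * s[:k]  # historical factor returns, (n_obs, k)

# Step 2: regress every security onto the factor returns; the coefficients
# are the factor loadings (betas), and the residual is the idiosyncratic part.
B, *_ = np.linalg.lstsq(factor_returns, Rc, rcond=None)  # shape (k, n_sec)

# Fraction of total variance the k systematic factors explain.
explained = 1 - ((Rc - factor_returns @ B) ** 2).sum() / (Rc**2).sum()
print(B.shape, round(explained, 3))
```

With pure noise in place of real data the explained fraction is small; on real returns the leading components typically capture much more, which is part of what the cutoff parameter controls.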
So now we get into the simulation bit. So this little representation I have here, just imagine one position, right? One stock or one bond, and I'm trying to value it, but it's a conditional valuation dependent on a state of the world, and the state of the world is represented by these random factor draws. And the blue bars there are the factor loadings, or the sensitivities of this particular security to those factors. So I'm gonna do a random factor draw, right? And that represents a state of the world. And based on my sensitivity to those factors, there's some transformation that happens to do a valuation and convert that into, say, a price or value for that particular stock or bond, right? So that's what we call the conditional valuation, okay? And that's for one position, but I need to do a valuation for all my different positions. So I'm gonna have different pricing models or different valuation models for every single type of asset I have. So stocks are gonna have their own valuation models, bonds are gonna have different ones, et cetera, et cetera. And part of the challenge for us as a business is actually to cover all the different types of things there are. So there can be derivatives, which are relatively simple, but you get into things like real estate and more complex securitizations. So we have to create valuation models for each one of these. And we do it for the entire portfolio to get one portfolio value conditional on one state of the world. And then of course you're gonna draw different states of the world. So you're gonna repeat this many, many times, say a million times, in order to generate this overall portfolio distribution. And again, depending on how we condition it or not, it's gonna give you different shapes. I think the point is that we really don't know what the right answer is. We need to do a lot of back testing.
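As a toy version of that loop, assuming a purely linear valuation model (the real system uses a different pricing model per asset class), one full simulation pass might look like this:

```python
import numpy as np

def simulate_portfolio(loadings, idio_vol, factor_cov, n_sims=100_000, seed=0):
    """Monte Carlo sketch of the conditional valuation loop.
    Each simulation draws one random factor state of the world, values every
    position conditional on that state via a simple linear factor model (a
    stand-in for the real per-asset-class valuation models), and sums to one
    portfolio value. loadings: N positions x k factors."""
    rng = np.random.default_rng(seed)
    N, k = loadings.shape
    # One row = one state of the world: a joint draw of the k factors
    factor_draws = rng.multivariate_normal(np.zeros(k), factor_cov, size=n_sims)
    idio = rng.normal(0.0, idio_vol, size=(n_sims, N))       # idiosyncratic noise
    position_returns = factor_draws @ loadings.T + idio      # n_sims x N valuations
    return position_returns.sum(axis=1)                      # portfolio P&L per state
```

A million simulations over 100,000 positions is the same code with bigger arrays, which is exactly where the memory and storage questions come in.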
It's hard to do back testing, but having, again, the machinery in the sandbox that allows us to change the parameters very quickly and run the simulations more quickly allows us to have a very empirical approach, which really hasn't been common enough, I think, in finance. So a lot of people just write equations on the board, make assumptions, and then follow that logic through. This machinery allows us to look at the data and then transform it into this distribution in a way that we can explore without waiting months. But you still get into issues where you have to deal with a lot of numbers, right? So if you have a portfolio that's fairly large, let's say 100,000 positions, and you have on the order of 10 explanatory factors, and then you're doing a million simulations, you get into manipulating a fairly large number of things. So how do you do that? How do you get that efficiently in and out of memory? How do you store that efficiently? Do we store this for every single state of every single security? So these are the types of practical issues we get into in the implementation. And once you have the distribution, like I said, there's a number of typical risk metrics: volatility, which I just talked about before, the value at risk, which is the loss at the tail, or the expected tail loss, which is just the expectation value of the portion in the tail. So you can look at these types of things. And there's a lot of education that needs to happen around what the appropriate risk measure is, depending on what your goal is. So how fast can we speed this up? Just a proof of concept. We did something very, very simple: we took 100,000 bonds and we did a million simulations. And we took one CPU and one GPU, right? Single threaded, and we saw how long it took. And on the CPU, you know, it would have taken 900 hours.
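The risk metrics mentioned, volatility, value at risk, and expected tail loss, can all be read straight off the simulated P&L distribution. A minimal sketch (the 99% confidence level is an illustrative choice):

```python
import numpy as np

def risk_metrics(pnl, alpha=0.99):
    """Standard risk measures computed from a simulated P&L distribution.
    alpha is the confidence level; 0.99 is an illustrative choice."""
    var = -np.quantile(pnl, 1 - alpha)   # value at risk: the loss at the tail
    tail = pnl[pnl <= -var]              # the outcomes beyond the VaR point
    etl = -tail.mean()                   # expected tail loss: mean loss in the tail
    return {"vol": pnl.std(), "VaR": var, "ETL": etl}
```

Which of these is the appropriate measure depends on the goal, which is the education point above.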
And then on the GPU, so this Tesla here, it took 3.6 hours. So a 250 times speedup relative to the CPU. So this is just the type of thing that allows us to have some confidence that we could do these things much more quickly. And we're right now in the process of building out an infrastructure that allows us to do this. So this is actually just a diagram of the architecture. We have our usual data ETL, so all the market feeds, Bloomberg, Reuters, all of these types of things coming in on the left. And then we have a distributed data storage. So I think we're building out four Cassandra nodes. Was it four? Three, yeah, to start with. To house all of this. And then we're gonna stream or put that into Spark, and that will be our main computation machinery. We're gonna have the simulator, the factor modeling, all of that on Spark. And we're gonna hopefully have some GPU acceleration. And that GPU will also provide us with a facility to do more exploration around machine learning and deep learning if we want to, and things like that. So this is the type of thing that we're building out today. We're also creating a user interface. So in finance, most of the reports you get are these ugly Excel sheets and little bar charts. And we've focused a little bit around how to create better visualizations. These are screenshots, but I can show you just a little bit. So these are the types of tools that finance people use, except typically they're a lot uglier, I think. But we focused a little bit on the aesthetic component. So you have different portfolio sets. I can choose, let's say, the short-term portfolio. And so this portfolio is a little bit stale. It's as of December 2014. It's about a billion dollar portfolio with 17,000 positions. I can look at it. Okay, so this is just a cross-sectional exploration. So it looks like it's about 60% fixed income, 40% equities, mostly a US portfolio. This is by dollar value.
But if I change the axes, so I make bubble size, for example, the volatility, then back then, Greece pops up as a risky name. So these are some of the visualizations. I can go into looking at return distributions. And I've run a portfolio. This is the baseline here. And compared it to a situation where, let's say, there's a 30% drop in equities. How does the distribution move? Or if there's a 2% rise in interest rates, what happens? So we can start to explore these things very quickly. And then you can take a lot of dense data and represent it fairly efficiently. So things like, this is a risk-return graph. So all of these dots are individual positions. And here I've actually only represented the top 5,000. So I could just look at the top 250, right? Focus on those, and then I can go to the top 1,000, top 5,000, right? And I can look at, say, just the equities. What does that look like? Right, and then I can color in by, say, volatility on a contribution-to-the-portfolio basis, right? So who is this name here? It's Hewlett Packard. This big red dot, who is that? It's Apple, right? So Apple's probably a good name, but I just have too much of it. So these are things that are allowing us to do data exploration on the fly, in ways that are hopefully a little bit more compelling than just Excel reports. And this is actually quite a big problem in finance, I think: you sit in a risk committee, there's a bunch of guys, typically they're guys, and they come with these reports, and no one really knows how to understand the 30 charts that are on them. And so this is much more interactive, and you could say, okay, what if the oil price drops? Right, so I think it's kind of fun, but it's actually quite important, and germane to making better risk management practice in finance. So, sorry for the interlude. So just to give you an example. We've constructed proxy portfolios, so these are sort of toy portfolios.
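The "volatility on a contribution-to-the-portfolio basis" coloring comes from the standard Euler decomposition of portfolio volatility into per-position contributions; a sketch of that calculation (the weights and covariance passed in are hypothetical inputs):

```python
import numpy as np

def risk_contributions(weights, cov):
    """Decompose portfolio volatility into per-position contributions,
    the quantity behind the color/size encoding in the risk-return view.
    weights: N position weights; cov: N x N position covariance matrix."""
    port_var = weights @ cov @ weights
    port_vol = np.sqrt(port_var)
    marginal = cov @ weights / port_vol   # marginal risk: d(vol)/d(w_i)
    contrib = weights * marginal          # Euler shares; these sum to port_vol
    return contrib, port_vol
```

A name like Apple being "a good name, but I just have too much of it" shows up here as a large contribution driven by the weight, not the asset's own volatility.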
I can give you some more details, but just at a high level, we've created five portfolios that get bigger and bigger. So the smallest one, portfolio one, is just the equities minus all financial firms and utilities. So financials and utilities have their own characteristics; they're not typical. So we've excluded them from the first portfolio. Then you add those financials and utilities back, so all equities. The third one is all equities and OECD bonds. So OECD bonds means, basically, developed market government bonds. And then four is all equities and all sovereign bonds. And then five is all equities, all sovereign bonds, and corporate credit. So you put in the default risk. And so I'm not gonna go through the details here, but those are the five things. And we can run measures of expected return, volatility, expected tail loss. What I wanna focus on is the distribution. So this is portfolio one, as of June 2008. And if you remember 2008, that was a marquee year in finance, because everything stopped working and people lost a lot of money. Okay. But this is before the collapse of Lehman Brothers and things like that, before getting into the bad zone. But this is the equities, ex utilities, ex financials return distribution, right? If I add in just all the equities, that's the green line. Now if I add in bonds, so just OECD bonds, that's the purple line. If I add in all equities and all sovereign bonds, that's the blue line. And then if I add in corporate credit, it's that red line. Okay, so what do I see? I see the distribution just sort of tightening up, right? So it looks like diversification is working, right?
And again, it's created out of this generative process from the simulation, but I get to see, in a multi-asset class context for the first time, something that's coherent across all the different asset classes. I see them together, and I see that somehow diversification is helping my portfolio out. And then what happens as I go from June into, say, December, right? You'll notice that, boom, the expected return goes down, the distributions are much wider, you've probably lost a lot of money. Because in September, Lehman went broke, and here we are. But if you had had a more diversified portfolio, you would have had a tighter distribution, lower risk, and lower expected losses. So we're running a one-year time horizon. Okay. Pardon? Yeah, yeah, if you were confident that this worked. Yeah. But again, you'll see it's a bit zigzaggy, so we haven't run as many simulations as we would have liked. So probably the confidence intervals aren't as tight as we want, but once we have Spark up and running, we could run millions of simulations, change all the parameterizations on the factor model, and rerun them. I mean, that's really what we're going for. So we're not ideological about the particular settings; we just want to have this playground where we can test this out and then use that to explore the type of portfolio that we're managing. So that's really the project so far, but there's a number of open questions. How do you measure the performance? How much data do we need to actually characterize these things to a level that we're confident or comfortable with? How do we add in more asset classes, and things like that? And then that's enough to do a business around this, but we want to go even further, right? So what are the other possibilities?
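The tightening of the distribution as asset classes are added is the usual diversification arithmetic; a toy illustration with a constant pairwise correlation (the 20% volatility and 0.3 correlation are made-up numbers, not properties of the five portfolios):

```python
import numpy as np

def equal_weight_vol(n_assets, asset_vol=0.2, corr=0.3):
    """Volatility of an equal-weighted portfolio of n_assets assets with a
    common pairwise correlation. Illustrates why the simulated distribution
    tightens as more (imperfectly correlated) assets are added; the numbers
    are illustrative only."""
    w = np.full(n_assets, 1.0 / n_assets)
    cov = np.full((n_assets, n_assets), corr * asset_vol**2)
    np.fill_diagonal(cov, asset_vol**2)          # variances on the diagonal
    return float(np.sqrt(w @ cov @ w))
```

As n grows, the idiosyncratic term averages away and the volatility falls toward the correlated floor, asset_vol times the square root of the correlation, which is the "tightening up" visible across the five distributions.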
So this is big data and HPC in finance, so high performance computing in finance. So this is sort of a marketing slide, but people like it, because if you think about the possibilities, like I said, going back to the notion of getting granular economic transactions, you can track oil tankers. So there's these hedge funds out there that are measuring the water displacement of oil tankers as they leave the Strait of Hormuz, the Persian Gulf, and then pass through the Strait of Malacca. And of course the displacement is less as they pass through the Strait of Malacca, because they've sold a lot of their oil. And so if you could track all these oil tankers in real time, it's actually a fairly good proxy for how much economic activity there is, right? So fairly clever. You know, you could get data from the web, e-commerce sites, social networks, social graphs, a bunch of stuff, these IoT sensors, if you can somehow talk everyone into... Yeah. That's right. Yeah, so there's a lot of these different sources of data. So until even fairly recently, five years ago, all you could buy from Reuters was financial statements and equity prices and things like that. And now you can source these different types of data. And hopefully it has information content that's quite different from, orthogonal to, everything else that you've been looking at. So you can look at the smart grid. The reason I had all those windmills in the very first slide was because I was thinking of windmills and smart grids and things like that. But if you can harness all that data together and somehow coherently put that together... there's already certain advances we've made. So you mentioned that Bloomberg's already selling this data. At State Street, there's this effort called the... It's in conjunction with MIT.
And it's something called PriceStats. And what it does is it scrapes the web. I think there's about 22 countries. And it scrapes the web for prices, online prices. So you go to Alibaba or eBay or what have you. And then that gives you a daily, high-frequency inflation index. So the normal inflation numbers are announced monthly or quarterly or whatever by the central banks, typically. So that's this sort of blue line, the CPI there. And then you see that PriceStats is sometimes a leading indicator, right? So you can use these types of things. If you manage to strike partnerships with providers, you can get transaction information from all sorts of vendors and transaction partners. So this is just another example. But the point is that there's just all sorts of data that's becoming available, or sometimes not available. But most of mainstream finance is not yet looking at these things. They're talking about it. But there's a lot of questions about how to actually integrate this into a coherent implementation to help your own portfolios out, or your risk management out. And again, part of the divide is the fact that a lot of the people that have the knowledge of how to deal with this, so people with expertise in Spark and things like that, they're not hired into the mainstream finance IT departments or the risk management departments. So using such data, we can potentially construct granular, high-frequency macroeconomic indicators. And once you have that sort of data, then it becomes interesting to throw some, say, machine learning, deep learning, whatever pattern recognition algorithms may make sense, at it, to do things like regime detection, right? How do these things cluster? And you can take the dimensionality much higher and do it in finite time. So I think that's quite interesting. And what we ultimately wanna do as a business, not the only thing, but just imagine that you could do all this very quickly. You can then take this simulation out.
So you have some macroeconomic scenario that you're forecasting, and then you could do a path-dependent, multi-period Monte Carlo simulation, right? So then you could take it out further. And remember, this again gets into a huge computational problem, because I'm keeping track of, say, a trillion numbers in just one period of simulation, right? But from each one of these points, I'm generating yet another million states of the world, another million points in the distribution. And so I get this explosion of data that I need to keep track of, even just simulated data. So this is where some of the high-performance computing infrastructure gets very interesting. And then on top of that, imagine that I'm running a pension fund, so it's not just the portfolio values that I need to keep track of. I'm gonna have two distributions to keep track of: the distribution of my portfolio values and also my liabilities. And the liabilities are often driven by the same macroeconomic factors as the portfolio. So how do I keep track of that? And how much money do I need to keep so that I can retire safely when I'm, like, whatever age, 80 probably, when they pay out my pension? So these are the types of problems that we would like to approach. Another interesting potential application of machine learning, or some of the new techniques that we're exploring, is something that we're calling reverse stress testing. So the Fed is always pushing the banks to do things called stress tests: if the world goes down, what happens to your portfolio? We can actually run that process in reverse. So starting with today's distribution, the current portfolio distribution, you pick your tail interval: what are the factor realizations that can potentially land you in a bad state of the world? And then you search through this grid of factor realization space.
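A rough sketch of that reverse search, plus the clustering of tail states that comes next: select the simulations that land the portfolio in the tail, then group their factor realizations. Plain k-means here is a stand-in for whatever clustering technique ends up making sense, and the tail percentile and cluster count are illustrative:

```python
import numpy as np

def reverse_stress(factor_draws, pnl, tail_pct=1.0, n_clusters=3, seed=0):
    """Reverse stress test sketch: keep the factor realizations behind the
    worst tail_pct percent of simulated portfolio outcomes, then cluster
    them with a simple k-means to find recurring 'bad' factor configurations.
    tail_pct and n_clusters are illustrative choices."""
    cutoff = np.percentile(pnl, tail_pct)          # P&L level defining the tail
    bad = factor_draws[pnl <= cutoff]              # factor states in the tail
    rng = np.random.default_rng(seed)
    centers = bad[rng.choice(len(bad), n_clusters, replace=False)].copy()
    for _ in range(20):                            # a few Lloyd's k-means steps
        labels = ((bad[:, None, :] - centers) ** 2).sum(-1).argmin(1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = bad[labels == j].mean(axis=0)
    return centers                                 # candidate "bad" factor regimes
```

Mapping those cluster centers back to macro variables like oil prices is the hard, non-linear part discussed next.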
And then maybe you can apply some clustering and figure out that, statistically, when a lot of these red things cluster together, it looks like, boom, on the macroeconomic variable side, oil prices came down. So this is actually not a trivial problem. There's a lot of issues that are slippery around trying to map these macro variables to the factors, the PCA factors that we've derived. And those relationships are non-linear, and you don't know if they're stable. So there's a lot of issues. We'd love your help in thinking about this. But overall, the point is that there's a huge opportunity in finance, afforded both by the explosion in data and the availability of computing power, along with some of the new techniques that we're developing, to actually solve these multi-asset class, long-period problems with a much more empirical approach than was possible before. So that's it.