Hello, all right. Technical issues solved? Maybe, maybe not; we'll see. If not, I'll just wing it. All right guys, hello. My name is Gene Xter, and I'm going to talk to you about something called alternative data, alternative data on Wall Street.

Before I start, I'd like to say thank you to the organizers and the sponsors of this event, because I think that without such events the entire data ecosystem would not be growing at the rate it is today. So thank you so much, organizers. Let's give them a hand.

Okay, so Zyna mentioned monetizing data, and this is exactly what we're going to talk about right now: how to take data and make money out of it. You know, GPUs, CPUs, at the end there's money, right? I have actually spent about nine years in this world of alternative data. In a lot of industries that would make me just a rookie, but in this industry I'm considered a veteran, which tells you how new this industry is. So there's a lot of opportunity and a lot of learning. Let's first learn about what it is.

Let's start with an investor. You are an investor, and you're investing in, let's say, a hotel company. You have data that comes from things like Bloomberg, and this data is available to everybody. What am I talking about? It's stock prices, it's company earnings, it's all the regular stuff that investors have used for many, many years. So alternative data is really defined as all the data sets that feed that investment process that are not standard, that do not come from Bloomberg, that are something different.
It's a bit of a self-defeating definition, because one day it is going to be the standard. But today it's not, so for now it's alternative, and we'll talk about how that may change.

Okay, so I think the best way to talk about alternative data is to start with an example. Let's pick on the lodging industry, hotels, as I mentioned. How many of you have stayed in a hotel in the last six months? Raise your hand. Okay, great. When you're in a hotel, have you ever wondered how many other people are staying in this hotel tonight? Well, investors wonder this all the time. If I'm an investor in a hotel, I care about a lot of things, but I care about two things more than all the other stuff. (Okay, we'll figure this out. I'll talk through this.) One of them is the vacancy rate. The other is the room rate. Put these two things together, the vacancy rate and the room rate, and what do you have? You have the hotel's revenues, how much money they make.

The vacancy rate, again, is how many rooms are empty on a certain night, or over a certain week, month, or year, as a percentage of total rooms. Now, the companies usually report this once a quarter. They say, you know, this quarter across our 10 properties or 100 properties, our vacancy rate was 80%.

Now let me ask the audience. Think about this, and raise your hand or shout out if you think you have an answer. What could be a smarter, more insightful way to find out the vacancy of a hotel on a given night, without having to rely on the company's information? When you were at the hotel, how do you think you could get that percentage, how many rooms are full and how many are empty, in a new way? Any ideas? Yes? Parking lot, excellent: the number of cars in the parking lot. Anybody else? Number of what? Number of keys hanging. Okay. How about this? Okay, good.
How about this: the number of lights that are on outside at a certain time, say between 8 p.m. and 11 p.m. Now this is a perfect example of alternative data. In fact, I've been involved in a project, believe it or not, where a team went around and stuck cameras across the street from a hotel, took a picture of the hotel every night, and used an algorithm to count the number of lights that were on. Maybe a light is on for 10 minutes, maybe it's a cleaner, so you need some rule that says: count it as a yes if we see a certain amount of activity. And from that you get the vacancy rates. Okay, that's one example.

The other part of the story, like I mentioned earlier, is the room rates, how much they're charging per room. That one is really easy: you just go on their website and check every single night how much the room costs. So every day you go to their website, either manually or, probably, automatically, and you get the room rate information, and you get the vacancy rate information. I heard somebody mention you could get that from the website too. Yes, sometimes you can figure out the vacancy rate from the prices and how they change.

These two things together form a piece of information that helps a hotel investor figure out how much money the hotel is bringing in every day. That information could be very valuable for investing in the company, because you could know something before the rest of the street, before they announce their quarterly earnings. This is the type of data that is alternative data.
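To make the hotel arithmetic concrete, here is a minimal Python sketch of how the two signals could be combined into a nightly revenue estimate. The class name, field names, and numbers are hypothetical illustrations, not taken from the project described above, and the sketch assumes that a lit window is a reasonable proxy for an occupied room.

```python
from dataclasses import dataclass


@dataclass
class NightlyObservation:
    """One night of observations for a single property (hypothetical schema)."""
    total_rooms: int    # number of rooms in the property, assumed known
    lit_rooms: int      # rooms counted as occupied from the lights-on signal
    posted_rate: float  # nightly room rate collected from the hotel website


def vacancy_rate(obs: NightlyObservation) -> float:
    """Share of rooms that appear empty, as a fraction of total rooms."""
    return 1.0 - obs.lit_rooms / obs.total_rooms


def estimated_room_revenue(obs: NightlyObservation) -> float:
    """Occupied rooms times the posted rate: a rough nightly revenue estimate."""
    occupied_rooms = obs.total_rooms * (1.0 - vacancy_rate(obs))
    return occupied_rooms * obs.posted_rate


# Made-up example: 200 rooms, 150 windows lit, rooms posted at 120 per night.
night = NightlyObservation(total_rooms=200, lit_rooms=150, posted_rate=120.0)
print(vacancy_rate(night), estimated_room_revenue(night))
```

Summing such nightly estimates over a quarter would give a figure to compare against the company's reported numbers. A real model would also need to handle discounts, differing room types, and noise in the lights signal.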
Okay, let me talk about some other examples of alternative data. Online: you know those toolbars you install in your browser? Well, most of you here probably don't install toolbars, but a lot of people do. Guess what: those toolbars of course collect information, and that information is sold to Wall Street. So, online activity, URL click counts.

Satellite imagery data. Somebody mentioned parking lots. There are satellite imagery companies that sell that data, and it's quite a good party conversation and sometimes it's powerful, but counting the number of cars in parking lots is actually not the best way to figure out how much revenue a company made. For some things it's good, though: mall analysis and other stuff.

Other examples: any scraped data, any web-harvested data, would be considered alternative data. So harvesting eBay, harvesting Craigslist. Are we doing...? Yes? No? All right.

Some other examples of alternative data. Consumer credit card transactions, that's one. Obscure city records: a lot of municipalities in the US and across the world have very, very useful data, but it's very hard to aggregate and very hard to collect. Some companies collect that data, scan the records, and figure out permit activity. For example, in the US, every modification to a building, whether commercial or residential, has to have a permit. These permits are collected across the US at the municipality level. Now, the US is far behind other countries in terms of record-keeping. A lot of times these are handwritten notes from some contractor saying, oh, we're going to do a roof replacement here, we're going to do an air conditioning build-out here. Anyway, these are all examples of alternative data sets. That's right. It's better this way.
Okay, so now, what is in this alternative data ecosystem? How is it structured? Who are the players? I think of it in terms of three buckets. Bucket number one is the sources, the raw data sources. These are the people who actually own the data. So, the supply chain: you have the people who own the data. Then there are the intermediaries, people who maybe collect this data, analyze it, and create an analytics layer on top of it. And there are the consumers of the data, and these are typically the funds: hedge funds, mutual funds, endowments, et cetera, any investor using this data to make better investment decisions.

Okay, so the first bucket is the data sources. Part of the reason this conference is a good fit for this topic is that I think some interesting new providers of data are in this room today. As you're looking at this presentation, if you think, hey, maybe I have some data that would be interesting to Wall Street investors, talk to me afterwards. I'm actually doing a Q&A session at, I think... we'll figure it out.

So anyway, what are the ways to collect this data? Direct data gathering: this is primary research or web harvesting; that would all be in this bucket. Data vendors: usually this is a company somewhere in the supply chain, and what differentiates alternative data vendors from traditional data vendors, I would say, is that alternative data vendors are not in the business of selling data. This is key. They are in some other business, and their data is exhaust. It's an exhaust of their operation, but it just happens to be very valuable in a way
they probably never imagined.

Okay, and then there's just downloading the data. This opportunity, I think, is glossed over by a lot of people, because they think, if I can just download it, it's probably not valuable. Well, actually it is, and this is a growing data source that is becoming quite powerful.

Okay, so let's talk about direct data gathering. We have web harvesting and primary research. Primary research, surveys and so on, that's old school. It's still relevant, but it's a very developed industry: Nielsen, IRI, NPD, all these companies.

And then web harvesting. Web harvesting is interesting. Let me see a show of hands: how many people here have ever worked on or run a web scraper or crawler? That's amazing, like 50 or 60 percent. Okay, so if you've ever run a web scraper or a crawler, you might have some data that is valuable to a Wall Street firm.

Web crawlers, I think, are the most deceiving technical activity one can engage in. Why? Because writing a simple web crawler is almost the second thing you do after a hello world. It's so simple. But maintaining a set of web crawlers that is consistent, that is accurate, that changes when the website changes, is extremely difficult. So I usually advise my clients not to build their own web crawlers but to go out and buy or outsource their web crawling activity, for various reasons. One:
It's faster to scale. A lot of web crawling firms have back data; if you start crawling today, obviously you're only going to have data starting today. There's also a compliance aspect: a lot of the risk inherent in web scraping is actually undertaken by the party doing the scraping. So if you're a fund, if you're a client, and you're not too sure about this world and you haven't investigated all the laws, you may actually get yourself into a world of trouble if you start pinging a website and you ignore a cease and desist. Professionals may have better experience dealing with it, and they're the ones who are going to have to answer if anybody comes and starts asking, hey, why are you crawling my site? One of the advantages of doing it in-house is that you get control, and Wall Street firms love control. But overall I advise outsourcing.

Now, there are some new techniques for web crawling that I think are quite interesting. We'll just gloss over this. It has to do with looking at the XPaths and the DOM tree and basically saying: if the end node of the DOM tree changed from one day to another because the website changed, can we get a new XPath that we think maps to the same node as before? There are companies and technologies that do this kind of thing these days, even though, in my experience working with a lot of web crawling firms out there, few actually use this technology. Mostly it still comes down to manually changing the XPath or the regular expression when a site changes, getting alerted in the middle of the night: oh, your Craigslist scraper broke, get up and change it.

Okay, so primary research.
I've mentioned this. There are companies that have been doing this for years. But what's interesting, at least to me, is that when you start thinking about the world in terms of alternative data, in terms of, hey, where can I get this extra insight that nobody's thought of, you start noticing patterns that nobody noticed before, and they're right there in front of you.

For example, receipt numbers. Have you ever looked at your receipt and looked at the number? Well, for most companies these numbers are sequential. So if you buy something at a store today and come back and buy something a month later, you look at the difference in the invoice numbers. That's the number of items they sold that month. That data is quite valuable too. Or you start looking at serial numbers; maybe they're sequential as well. You start looking at classified ads, or Google Trends, or all kinds of detail about the everyday world. There are all these stories about how the Allies used serial numbers to guess how many tanks the Nazis produced, or how the Walmart founder used to fly in his little plane to figure out where to put new Walmart stores by seeing how many cars were in parking lots. It's this kind of thinking that really uncovers this hidden world of data that could be monetized.

All right, now, these happen to be some of the most popular slides in these kinds of presentations, so if you're interested, get your phone out. There are two of these. This is what I was talking about, the wasted opportunity. There are tons of free data sets out there, and they're growing every day. I won't say much about them except for this: typically they're most valuable when enhancing a primary alternative data set.
So let's say you have something like consumer credit card transaction data. Well, maybe you can enhance it with one of these data sets to add demographic or geo or all kinds of other information. But actually there are times when these data sets are valuable all on their own. You can build a trading strategy from these data sets alone, and there are companies doing that. In fact, just yesterday my old employer, Steve Cohen, funded a company at 250 million dollars that basically lets users build their own strategies on top of these data sets. So there's value there. And there are some more; I'm sure this will be available somewhere. More data sets.

Okay, high-opportunity data sets. Let's talk about the current landscape first, and then we'll talk about high opportunity. Most alternative data focuses on sales, typically in the US: consumer-related sales transactions in the US. Is it valuable to have another data set that focuses on consumer-related sales transactions in the US? Yes, it is, but every new data set is less and less valuable. Take Target in the US. If you look at how accurately the analysts have predicted Target's revenues over the last 8 to 12 quarters, guess what the average error in the sell side's predictions was? One percent, on average. So beating that is really difficult, and even if you beat it, nobody cares, because Target's stock doesn't move on revenue surprises. There are still US-based consumer company revenue surprise data sets that are valuable and will continue to be valuable. But where are the new opportunities?
I put them into three broad buckets. One, international data sets. While US revenues are mattering less and less for investors, international revenues are mattering more and more. China online revenues matter a lot: JD.com, all this kind of stuff. It's all very US focused; even Europe has 10 times less alternative data coverage than the US.

Then, insights into margins. By the nature of these data sets, most of them are going to focus on revenues, on sales. But if you can tell me something about the costs of these companies so I can get at their profit, that's really interesting. Tell me about their labor costs. How can you tell me about their labor costs? Well, maybe you can go on LinkedIn, scrape LinkedIn, look at who they're hiring, and try to run predictive models on these people's profiles to figure out how much they actually get paid, so you can tell me the labor costs as they change week to week. Or maybe you can look at the parking lots of these companies and figure out where the cars are coming from. If you have car tracking information, you can see the neighborhoods the cars are coming from, look at the average income in those neighborhoods, and say, well, if more cars are coming to this company's parking lot from higher-income neighborhoods, that means they're probably spending more on labor. Resource costs, all kinds of other costs.

And then, of course, B2B data sets. Anything that has to do with B2B is extremely valuable all across the world. Why? Because most of these data sets focus on consumer-related companies, because you're tracking consumer-related activities.

Okay, how do we evaluate a data set? You think you have a data set, and you think it's valuable.
Let's talk about how valuable it is. One of the things we look for is scarcity. That's self-explanatory. Granularity at the time level: is your data set annual, is it quarterly, is it monthly, is it daily? We're looking for data sets that are daily, maybe monthly. Anything quarterly or annual is going to be maybe an additive data set to a primary data set, something to enhance or to triangulate a primary data set off of, but it's not going to derive value on its own.

How structured is it? This is actually a very interesting question, because on one hand, the more structured it is, the easier it is to get to the insights. On the other hand, the less structured it is, the more likely that if you get to the insight, you're the only fund, or one of the few, that's going to have that insight. We also call that kind of valuable insight alpha. So there is a balance. You want a data set that is just structured enough that you can do a couple of months of work, get the alpha, and be quite confident that other funds don't get that same alpha. On the other hand, if a data set is extremely unstructured and it takes a year of R&D to get somewhere, well, typically these funds are not that patient.

And then coverage, of course. Coverage can be thought of in terms of geography, but really, from a hedge fund perspective, coverage is how many stocks it covers, how many sectors, and how many asset classes.

Okay, evaluating vendors. If I am advising a fund, how do I help them think through what's a valuable vendor and what's not a valuable vendor?
So, basically, you have these companies again that monetize their exhaust data. We can go through a few more examples. Say you have a cardboard manufacturer. They manufacture cardboard, but their cardboard is used to package computers: Dell, HP. You can figure out from their activities what the sales of Dell and HP are. You could maybe figure out some macro indicators from trucking activity. Ports: you can look at ports and sort of figure out how many goods are coming in and out of a certain country. In the US, if a package, or what we call a crate, is delivered to the US, everything inside is actually publicly available data. So every item inside that crate is available to mine. It's extremely noisy, and companies actually use fake buyer names. Take Walmart: they're not going to say Walmart when they're buying stuff from China and India; they're going to use another name. But you can try to figure it out. You can do machine learning to figure out who's who, and then figure out which retailer is getting what product.

Then you have intermediaries. These are companies like 1010data, 7Park, and others. These are all companies that try to get these data sets and sell the insights, because the fact is that this kind of activity is really, really expensive. Very few funds have the money and the patience to invest the millions of dollars required to buy these data sets and R&D them. So for the smaller funds that don't have the appetite for this, there's an entire ecosystem of intermediaries that help them do it. Now, of course, the upside if you're buying from these intermediaries is that you don't have to buy these data sets yourself and you don't have to R&D them yourself.
Of course, the downside is that you get the same thing everybody else gets. You don't have your own unique view on the markets.

But it's still very small. One of the questions I get asked a lot is: well, I thought every fund is doing this; I thought this is how hedge funds work. No, actually very few funds are doing this. Maybe it's still 5% of the funds out there. In terms of assets under management, the amount of AUM using any of these methods is tiny. It's a tiny sliver. People worry about the alpha in these data sets being arbitraged away: if I have it and somebody else has it, isn't it less valuable? In theory, yes. But because there's so little AUM riding on top of this data, it's going to take years before your activities in the market using these data sets actually drive the markets. It depends on the stock, but overall this is still a very, very small percentage of the overall population.

Okay, so I'm sure as I was talking about all this, a lot of you have been thinking: is this stuff legal? This is ringing some bells. The answer is, of course it is, but you have to be really careful and conscious of compliance, and that's why we're going to talk about it a little bit today. In a way, these kinds of strategies... you look at SAC, Steve Cohen. He built out the alternative data team after he was dragged through the mud by the Southern District of New York. Basically, the data team was an antidote to all the insider trading allegations that had been challenging his firm. Why is it an antidote?
Well, because if you have a data stream, you can track your insight, your investment, back to the data it came from, and if that data has been validated by compliance, you're in the green. Whereas if you have an investment that maybe was sourced through a conversation, well, who knows, right? That's a much more difficult thing to prove to a judge and a jury.

So from a technical perspective, the smart way of doing this is as follows. You have the data from the vendors, raw or aggregated. This box can be either a fund or an intermediary, anybody that's really using the data set. When the data comes into this box, it comes into a restricted environment, restricted in terms of access, so very few people have access to it. In this environment you do your PII scrubbing and make sure the data is legit, that it doesn't contain anything that may set off some bells, and then you push it to production, to the rest of the organization.
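The restricted-environment step can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names in `PII_FIELDS`, the record layout, and the function names are hypothetical, and a real compliance process would cover far more than dropping a few columns and masking e-mail addresses.

```python
import re

# Hypothetical list of fields that must never leave the restricted environment.
PII_FIELDS = {"name", "email", "phone", "card_number"}

# Simple pattern for e-mail addresses embedded in free-text fields.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub_record(record: dict) -> dict:
    """Return a copy of the record with PII fields dropped and e-mails masked."""
    clean = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            continue  # drop the field entirely
        if isinstance(value, str):
            value = EMAIL_PATTERN.sub("[REDACTED]", value)
        clean[key] = value
    return clean


def promote_to_production(raw_records: list) -> list:
    """Scrub every record before it is handed to the rest of the organization."""
    return [scrub_record(record) for record in raw_records]


# Made-up vendor record arriving in the restricted environment.
raw = [{"name": "Jane Doe", "merchant": "Target", "amount": 12.5,
        "note": "receipt sent to jane@example.com"}]
print(promote_to_production(raw))
```

The design point is that only the scrubbed output is visible outside the restricted environment, so an investment insight can later be traced back to data that compliance has already reviewed.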
That's the simple version of this. This is a huge gray area. There are very few laws in any country addressing this, but what matters, from what I've seen personally, is intent. Intent matters. If you try in your organization to be above board, it's going to matter. That intent is going to hold its own regardless of whether it worked or not: that you're trying to be compliant rather than just having the Wild Wild West and saying, well, there are no laws and regulations, let's just shoot from the hip without best practices. Best practices matter. Even if there is no master document, even if there are no guidelines out there to tell you how to do it, trying is better than not trying. There are some frameworks out there, NIST 122, a tantalizing read; I highly suggest you guys go read it.

So one of the areas that really sparks debate in terms of compliance is web harvesting. Can you go out, browse the web, crawl it, and download the data from a website, and is that legal? The answer is, well, it depends. Overall, the courts have ruled that as long as you're not misrepresenting yourself, as long as you're not clicking on "I agree not to download anything" and you don't have to log in, if you can just go into a site and maybe there's a terms and conditions link somewhere at the bottom that you would have to click on to read, that is not enforceable. Just because there's a link somewhere on the site for terms and conditions does not mean that you agreed to the terms and conditions contract when you used the site. If you explicitly agreed, that's one thing. If it's just somewhere in the corner, the courts have ruled that that does not constitute a contract. Most of these cases... let's look at the cases here.
Most of these cases... so this is an infographic of all the cases that have to do with web crawling so far, over time. If a case is above this dotted line, it was favorable for the web crawler, and if it is below, it was favorable for the website that was suing. Overall, you can see that most companies are in the green. The ones in the red mostly ignored a cease and desist multiple times. So the advice is: you should not ignore a cease and desist. Other than that, tread lightly and follow best practices. But overall you can collect information if you do it carefully, with respect. So, again, stay ahead of laws and cases, anything coming down the pipeline, and respect the terms of service and clickwrap.

Let me talk about clickwrap versus browsewrap. A browsewrap is basically where the terms are somewhere on the site and you don't have to click on them; they are just meant to cover your browsing. A clickwrap is where you cannot use the website unless you click on "I agree" or you log in. These are very different, and they should be thought of in very different terms. If you click on something, that could be a contract. If you did not click on anything, that is typically not a contract.

Okay, let's get to the good stuff. How do you actually make money? How do funds make money from these alternative data sets?
The number one, most common way is revenue surprise estimates. You have a couple of components here. One, you have the revenues of a company, let's say Target or Lululemon. Then you have Wall Street analysts who cover this company and predict its revenues for the next quarter, the next year, et cetera. And the surprise is the difference between the first two. Whatever the company actually prints, whatever they report, is going to be slightly different from what Wall Street predicted. Sometimes it's very different, and that surprise can be monetized. If the company has a positive surprise, meaning they did better than expectations, the stock typically goes up; on a negative surprise, the stock goes down. If you have a data set that tells you ahead of the tape what is happening to the company, if you have an insight, an analysis, primary research, secondary research, alternative data, you can be better than the street. It depends, but the street is typically not that great at predicting revenues, especially of companies whose revenues are not that stable. So you can see how that could make an investor money.

Operating GAAP measures: same stuff. We're talking about income, about costs, about all the stuff that companies print. If you have an estimate before they release, you can potentially make money.

Non-GAAP measures: what are these? These are other operational metrics that are not covered by GAAP. GAAP is an accounting standard. These are things like, for Netflix, subscriber growth, churn, flow share. What are people doing after they subscribe to Netflix? If they also start subscribing to Hulu, do they drop out?
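The revenue surprise idea described above can be written down in a few lines. This is a minimal sketch, not a trading rule from the talk: the function names are hypothetical, and the 1% threshold is an illustrative assumption rather than a recommendation.

```python
def revenue_surprise(reported: float, consensus: float) -> float:
    """Relative surprise: how far reported revenue landed from the consensus."""
    return (reported - consensus) / consensus


def trade_signal(data_estimate: float, consensus: float,
                 threshold: float = 0.01) -> str:
    """Turn a data-driven revenue estimate into a directional view.

    The estimate stands in for the not-yet-reported revenue; a view is taken
    only when the expected surprise exceeds the (assumed) threshold.
    """
    expected = revenue_surprise(data_estimate, consensus)
    if expected > threshold:
        return "long"
    if expected < -threshold:
        return "short"
    return "no trade"


# Made-up numbers: consensus of 100, alternative data suggests 103.
print(trade_signal(data_estimate=103.0, consensus=100.0))
```

Whether a given surprise moves the stock depends on the company, so in practice the threshold would differ from name to name.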
So knowing all these parameters will help a smart investor potentially make money. Fully automated quant strategies. Non-equity asset classes: most of this stuff still deals with equity, some macro, but that's about it. You see very little of these alternative data strategies used for, let's say, any kind of credit-related securities, like debt securities. And strategic investments, thought leadership. So there are lots of ways that investors could monetize alternative data, but actually the first one here, revenue surprise estimates, is probably about 90% of the game today. That's going to change. It's going to be 50% soon, but today it's 90%.

One of the things I typically advise my clients on is that revenue surprise estimates are valuable sometimes in generating P&L and actually making money off the security, but the real way to use alternative data is through thematic investments. What this is: you create a debate between one view and another view, a long and a short, and use alternative data to settle the debate. Let me give you an example.

One of the debates last year was around smartwatches. Are smartwatch sales going to impact luxury watch sales: Rolex, Omega, et cetera? One side of the debate says no, this is a completely different category. A luxury watch is a piece of jewelry. A $500 or $800 smartwatch is like a phone; it doesn't affect people who go in to show off status on their wrist. It's a totally different category, so they have nothing to do with each other. The other side of the debate says you only have one wrist, so of course it's going to impact luxury watch sales. And the truth is that nobody really knows the answer. Apple doesn't know the answer. Rolex doesn't know the answer. And this store, Tourneau: Tourneau is a luxury watch retailer; they only sell luxury watches.
They don't know the answer. The people that know the answer are the people that have all these different data components.

So here's a type of analysis: looking at two cohorts. One is Apple customers, because if you buy an Apple Watch you're going to be an Apple customer; I don't think you can buy an Apple Watch without having some other Apple product. So you look at Apple customers and non-Apple customers, and you look at their purchases at the store called Tourneau, which again only sells luxury watches. You look at the sales for the two cohorts around the time of all these milestones: Apple announces the Apple Watch, pre-orders begin, and then actual sales. These are year-over-year sales trends. On the top one, you can see that when pre-orders begin, the drop-off is huge: a 30% year-over-year drop-off from these people. So this basically answers the question. Yes, absolutely, luxury watch sales are going to be impacted by smartwatches. And if you dig a little further, you can even figure out how much: you could know that every unit of a smartwatch sold is going to impact the luxury watch industry by X.

So this is the kind of analysis that's very useful to investors. Movado, which is a luxury watch company too, is a pure-play name; basically all they do is luxury watches. They were really hit by this news. If you did this analysis at that time, you would have made a boatload of money, because at that time nobody really knew the answer. At first, luxury watch companies were saying, no, we're not going to be impacted. Nobody really knew the answer except for people who turned to data.

Okay, let's talk about the workflow and process a little bit. So this is what a typical
structure, the flow, of a company that's consuming alternative data looks like. You've got all these different sources: third-party raw data, data vendors. And then inside you've got some different groups. You've got the data acquisition group. These are people that are going around the world, going to conferences, finding data sets, talking to potential vendors, or maybe figuring out what website to scrape, etc. These are the people that are sourcing the data. They work very closely with the data analysts, because to know what's a valuable data set you must know how it's going to be used. And of course you have to have a stack, a high-performance stack, because these data sets are sometimes enormous and sometimes tiny, and the one constant about them is that they're all very different. Sometimes they come as JSON files, sometimes CSV, sometimes a direct link to Redshift, sometimes just documents. So you have to have a system that can deal with heterogeneous data sets. That is the key.

Then you have this R&D quant. This is a very interesting position, where you try to create strategies on top of these alternative data sets. You may not be monetizing them, because to be successful at monetizing alternative data you really need two parts: you need these experts in the alternative data world, but you also need traditional investors who know how to integrate the insights from the alternative data process into the investment process. Because usually alternative data insights by themselves are not tradable. Typically, not always. This stuff goes to sector research and visualization teams, and then eventually it goes to long/short teams and traders. So these folks on the right-hand side are the folks actually trading on the insights from alternative data sets.

In terms of the conceptual process: first you acquire the data sets. Then you have to normalize the data sets, and this is sort of an
overarching category, but a lot of times this is where the real 80 to 90% of your work is going to be spent: in normalization and modeling. Because a lot of times these data sets do not represent the whole picture. They represent just a slice of the picture, and to know how to get from the slice to the whole pie, you have to model, you have to make some assumptions. This is where these other data sets come in useful, these external data sets that I talked about earlier. Maybe you have online clickstream data from a million or five million users and you can see their online shopping activity, but what we really want is all of their shopping activity. Well, how do we get from their online shopping activity to their total shopping activity? You have to model it. You have to figure out how all this stuff relates to each other. Sometimes it's impossible. Sometimes, let's say, you have U.S. data and you need to figure out international growth. Netflix subscribers in the U.S.: how does that relate to Netflix subscribers worldwide? It may not. You may need to combine different data sets. This stuff is so early that the combination of data sets is actually something that is very rarely done, almost never. I'm sure a lot of people in this room are thinking, this doesn't make any sense, why is this not done? But it's because it's very resource-intensive, it's very new, and this is the state of the industry today.

And then eventually you get to GAAP and operational metrics, modeling revenues. Then you get your quant signals, you get your thesis insights that I've just talked about, and then eventually it goes to reporting before you publish it.

I won't spend too much time here. I'm sure that you guys could speak to this much better than I could.
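
To make the slice-to-whole step from the normalization discussion concrete, here is a minimal sketch of scaling a clickstream panel up to total shopping activity. It is not the speaker's model. The `PanelSample` type, the `estimate_total_spend` function, and every number are hypothetical, and the two assumptions it leans on (the panel spends like the wider customer base, and a known share of spending happens online) are exactly the kind of modeling assumptions the talk says you have to make and validate against external data sets.

```python
from dataclasses import dataclass


@dataclass
class PanelSample:
    """One period of panel data plus the modeling assumptions (all hypothetical)."""
    panel_online_spend: float  # online spend observed across the panel
    panel_users: int           # number of users in the panel
    population_users: int      # assumed size of the full customer base
    online_share: float        # assumed fraction of total spend that happens online


def estimate_total_spend(sample: PanelSample) -> float:
    """Scale panel online spend up to an estimate of total spend, online plus offline."""
    if sample.panel_users <= 0:
        raise ValueError("panel_users must be positive")
    if not 0 < sample.online_share <= 1:
        raise ValueError("online_share must be in (0, 1]")
    # Step 1: average online spend per panel member.
    per_user_online = sample.panel_online_spend / sample.panel_users
    # Step 2: extrapolate to the whole customer base (assumes a representative panel).
    population_online = per_user_online * sample.population_users
    # Step 3: gross up from online-only to all channels using the assumed online share.
    return population_online / sample.online_share


# Illustrative numbers only.
sample = PanelSample(
    panel_online_spend=2_000_000.0,
    panel_users=1_000_000,
    population_users=50_000_000,
    online_share=0.25,
)
print(f"Estimated total spend: {estimate_total_spend(sample):,.0f}")
```

In practice each of the two assumptions would itself be a model, which is where most of the normalization effort described above goes.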
This is very similar to the Berkeley stack. This is your typical investment research alternative data stack. Elasticsearch is interesting there, you know, for indexing documents. But I'm sure this is going to be very familiar to all of you here.

Okay, I think we're coming to the end here. Now, I mentioned that quant strategies are typically not used on alternative data. It's too high-dimensional, it's too difficult. But there are a few exceptions. Sometimes we can actually make use of these data sets to generate money in automated systems. So let's talk about that.

Let's say we have revenue surprise estimates for a bunch of companies. We think we know what their revenues are going to be, and we can compare that to Wall Street estimates, so we can figure out what the surprise is going to be. You really need three scores. You need to know what the surprise estimate is as a percentage of total revenue. You need to know what the error is on our own surprise estimate. And the third part is how sensitive the stock is going to be to a revenue surprise.

Even if God Almighty came down from heaven and told you the revenues of the top 1,000 companies traded in the U.S., I'll tell you this: the correlation between the revenue surprise as a percentage of revenue and the amount the stock moves in the week around announcement time, as a percentage, you know, as performance, is 0.19. Okay? It's very low. So even if God Almighty told you the revenues of these companies, would you be able to make money?
Yes, but it would be much less than you think, because revenues are just part of the game. There are many other things that drive a company. So you need to know, for a given company, if we think there's going to be a surprise, how sensitive, how much is the market going to care about that surprise? And I can tell you this: the one insight here is looking at sell-side notes, the actual written research about a company. You just look for the word "revenue" or "sales". If those words bubble up towards the top of the documents over time (not the number of occurrences, but how close to the top of the document those words are), that will tell you that this company will be more sensitive to a revenue surprise this quarter than it was last quarter.

So let's say we have these three scores, and we have the desired trading window. What do we want? We want a company that is going to have a high revenue surprise, so our estimate is very different from what the market, the sell side, predicts. We want to be very sure of our estimate. And we want the market to care, so we want our expected sensitivity to be high. If we have all three of those things, whether the surprise is high or low, positive or negative, we take our position. The size of the position is relative to these three scores, and the direction of the position is obviously the direction of the surprise. And as an output we have our positions.

I'll wrap up with this. My prediction is that in the next three to six years, this idea of using non-traditional data is going to be so prevalent that we'll have to figure out a new word. From "alternative", we'll have to change the word. So it's sort of an arms race.
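
As a brief aside on the three-score strategy described a moment ago, the sizing logic can be sketched roughly as follows. This is an illustration under stated assumptions, not the speaker's actual system: the talk only says that position size is relative to the three scores and that direction follows the sign of the surprise. The multiplicative formula, the 0-to-1 scaling of confidence and sensitivity, the `max_weight` cap, and all names (`RevenueSignal`, `surprise_pct`, `position_weight`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RevenueSignal:
    """Inputs for one company; every field here is an illustrative assumption."""
    ticker: str
    our_estimate: float   # our revenue estimate from alternative data
    consensus: float      # sell-side consensus revenue estimate
    confidence: float     # 0..1, higher means lower error on our own estimate
    sensitivity: float    # 0..1, how much the market is expected to care


def surprise_pct(signal: RevenueSignal) -> float:
    """Expected revenue surprise as a signed fraction of consensus revenue."""
    if signal.consensus <= 0:
        raise ValueError("consensus must be positive")
    return (signal.our_estimate - signal.consensus) / signal.consensus


def position_weight(signal: RevenueSignal, max_weight: float = 0.05) -> float:
    """Signed portfolio weight: positive means long, negative means short.

    Assumed rule: multiply the three scores, then cap the absolute size.
    """
    raw = surprise_pct(signal) * signal.confidence * signal.sensitivity
    return max(-max_weight, min(max_weight, raw))


# Made-up tickers and numbers, for illustration only.
signals = [
    RevenueSignal("AAA", our_estimate=110.0, consensus=100.0, confidence=0.5, sensitivity=0.5),
    RevenueSignal("BBB", our_estimate=90.0, consensus=100.0, confidence=1.0, sensitivity=1.0),
]
for s in signals:
    print(s.ticker, round(position_weight(s), 4))
```

A multiplicative rule means a weak score on any one of the three shrinks the position toward zero, which matches the requirement that all three conditions hold before taking a position.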
It doesn't even matter anymore if these strategies make money on their own. Just like no hedge fund would go into business without having a Bloomberg terminal, which by itself may not necessarily make you money, you know for a fact that if you don't have it, you're going to lose money. So just like in any arms race, the participants have no choice but to partake. Over time this is actually going to increase the transparency of the markets. It's going to help investors really focus in on what matters, which is the underlying ground truth in the world around us. Because of that, I think this whole idea of alternative data is actually bringing some refreshing changes to the world of investing, because it's helping us focus not on what the other guy is doing but on what's going on in reality. That is ultimately what investors are paid for: figuring out ahead of everybody else what's going on in reality and reporting it to the rest of the world through signals. As they buy and sell companies and help value those assets, they're helping the rest of us, they're helping the rest of society, optimize resource allocation. And now they're going to be doing it a lot more efficiently with alternative data. So that's it. I don't know how much time we have for questions. Okay, great. So, questions?

Hi Gene, can you give us an example of where alternative data has been used for non-equity classes? You said you see that growing to 50%, where it's hardly anything today. So, an example, and why do you have that intuition that it's going to grow?
Sure. So, I mean, some non-equity classes act like equity, right? In a way, debt acts like a put option. In that sense, if you have operational information about a company that's maybe close to going bankrupt, you want to know if they're going to be able to pay their creditors. I mean, this is a cheap answer, because I'm just saying we're going to use it in the same way that it's used for equity. Here's a more interesting, different way that these things are used. If you are a venture capital fund and you'd like to get more information about the market that you're investing in, having alternative data will give you that confidence in your investment in the portfolio company. In a way, it's due diligence. It's market research. So for a PE fund or a venture capital fund, these data sets can have insights that traditional data sets cannot. Ashutosh?

So Gene, I had a question: what are your thoughts about the risk of insider trading using this data, using this approach?

Right. So, you know, insider trading is basically defined as when you, as an investor, can trade on the insight that you have without having to do work on it, without having experience, without having to basically put in the work. It's defined that way. If you have some piece of information where you have to put in a bunch of work to make a trading decision, then typically it's not insider information. If it's some piece of information where you don't have to do any work, where an idiot can take that information and make an investment decision, that can border on insider information. These data sets are extremely expensive to analyze. They take a huge amount of work to go from, you know, here's your terabytes of data.
Tell me what companies to buy and sell. So it doesn't really fit that criterion. In a way, alternative data sets are an antidote to insider trading. Most insider trading today is, almost by definition, things that people say to each other. It's not things that come from data sets. If it comes from a data set, it's not likely to be insider trading. Now, of course, with these issues it's a gray area, and I guarantee you that in the next five to ten years there is going to be government oversight over what's considered insider trading and what's not. But if you think about it, the very nature of capital markets is for analysts to get paid for doing a huge amount of work to transform some pieces of information into a valuation, and then give that information back to the market. And that's exactly what's happening here. It's working as it is supposed to. You have these difficult-to-find, difficult-to-analyze data sets. People are putting tons of effort into it, very smart people; you have to have tons of education. And in the end you are figuring out some nugget of information that nobody else can ever tell you, and you're putting that information back into the market.

So, data for countries like China and India, right? If you were to, say, look for data, crime data for example, you don't get a lot of information, at least from Southeast Asian countries. So do you typically tend to use primary research for information like that?

Right, so the question is about information from, you know, non-U.S., non-Europe, rest of world. Like I mentioned before, this is a huge opportunity, because there simply is a scarcity of data available in big-market countries: India, China, Brazil. These are big markets, but there's very little data. So, you know, in terms of what's currently used,
I mean, there are some startups that are working on this. There are companies such as Nielsen and IRI that tried to take their methodology international. But that's the whole point of me being here: to try to grow this idea of alternative data and try to find non-traditional data sets that could be added into the investment process.

And do you expect these startups to make this data available in an open-source manner?

Typically not, because they want to monetize that data. Do we have questions in the balcony?

Okay, I have a question here. Say, for example, I'm planning to open an e-commerce platform, a business in consumer products. How do I identify the alternative data vendors? How do I get the alternative data?

How do you identify the current data vendors?

Yes, particular to my business.

Well, that's a challenge. The very knowledge of these data vendors is valuable, so there are very few places out there that have a list of alternative data vendors. One place you can start is a company called Eagle Alpha. They are putting together a list of data sources. You have to subscribe to their product to get that list, but they have a list of about 150 alternative data vendors. Some of them are more valuable, some less, but there is no easy way to find this data.

Okay, so for example, I'm planning to open my firm locally, regional, not international at that point in time. I need data from the region where I can sell my business.

I mean, data markets are one place, but these things typically haven't taken off. The quality of data on data markets is typically still... well, there are some entrants such as Quandl that try to buy data and resell it. So try quandl.com.

Thank you.

Yeah, so, news sentiment analysis, is that what you're talking about? Yeah, news sentiment analysis is still an opportunity.
I think there are some companies, like RavenPack and others, that try to get sentiment out of news. In my experience, for one reason or another, a lot of these news sentiment companies are using fairly standard algorithms to extract sentiment, and they refuse to get more context out of the news sources, like directionality and more sophisticated ways of analyzing. They're still sort of counting the scores on the words and adding them up, which these days doesn't have that much value anymore. So yes, there's opportunity. It's a very heavy investment, though. You have to have 20 or 30 million dollars to start a company to do good news analytics, because it's very resource-heavy. But yes, there is opportunity.

Hi, actually you partially answered my question. Basically, if I go out, I go to the traders and I tell them that there's something new. Right now, do you see any kind of standardization happening? Because from what I heard from you, there are different vendors out there, so they have different strategies, and the way we took the lodging example, every sector is different, you know, there is no consistency when it goes down to the traders to decide upon the investment. Like, we have S&P ratings, you know, A++, so they understand it. Is that something you see standardizing in the future?

Yeah, so there's no consistency. I agree with you. There's no consistency because of the high dimensionality of these data sets. This is not like, you know, even if you look at price data from exchanges, you have tons of feature engineering on top of just this one time series.
Well, many time series, but one type of time series. Here, the very nature of these data sets is that the dimensionality is exponentially greater than that of the traditional data sets, and the lack of standardization is not going to go away. In fact, it's part of the appeal. As I mentioned, it's mostly fundamental funds, not quant funds, that are using these data sets. The reason is that even to interpret the insights from these data sets requires the most powerful computer on earth, right here. Because of the high dimensionality, it's going to be a while for quant funds, even though some are starting to do it, like Two Sigma, D. E. Shaw, etc. But, you know, it's going to be a while until it's all standardized.

One question: most of this lecture was focused on earnings surprise. What would be alternative data if you want to predict mergers and acquisitions, or the success of a merger and acquisition?

Yeah, yeah. I mean, I've seen data sets that try to predict that. Very interesting stuff. I remember a fund that was looking at where the airplanes were flying, like private jets in the U.S., looking at the tail number and seeing the destination of a particular CEO's airplane. So, you know, if a carrier company, or some company that has to do with TMT, is going to Kansas City, maybe they're going to be bought by Sprint. So I've seen stuff like this done. Yeah, absolutely.
There's opportunity there. You do have to be very careful about being on the right side of the law. But typically, again, if you're not getting a data set that tells you this company is going to buy this company, if you get something where you still have to put in a lot of work and there's a lot of chance involved, then you're going to be okay.

We just have time for one more question.

Hi Gene, this is more of a thought experiment in machine intelligence, less of a question. Currently, alternative data comes in when an analyst realizes that there might be a more effective way of answering a question, one which is not available through the data sources at hand. So do you think that any time soon we would have a machine autonomously deciding that it needs data from alternative sources and being able to automatically talk to another machine?

No, I don't think it's going to happen anytime soon. Why? Because valuable data is valuable, okay? The more valuable it is, the more likely there are going to be walls built around these data sets, rather than these data sets putting themselves out there and saying, okay, use me if you want to, and pay me if you think I make money. It's not going to happen. The more valuable data sets are going to be more exclusive. They're going to be available to a smaller contingent of funds. It's still any fund that wants to buy them, right, but they're going to be expensive. They're not going to put a REST API out so any investor can utilize it. That may happen in 20 years, but it's actually going to be more isolated first before it becomes more democratized. People talk about democratization of data. I see it the other way around: more valuable data sets are becoming more exclusive.

Yeah, which is where I had a follow-up question.
So tomorrow, or say in 20 years, if we do reach the scenario in which such automation exists: are there currently initiatives working on putting some kind of pricing or valuation on the exchange of data that might happen?

Yeah, I mean, Quandl again. Quandl is one of these places where, if you have a data set, you can try to resell it, and there are some market mechanisms around pricing. But basically, no, that's still in the future.

Thank you. Thank you very much.