on the campus of MIT in Cambridge, Massachusetts. It's theCUBE, covering the MIT Chief Data Officer and Information Quality Symposium. Now, here are your hosts, Stu Miniman and George Gilbert.
Welcome back to theCUBE here at the MIT CDOIQ Symposium. Happy to have you on the program. Welcome back, Steve Todd. And for the first time, Barry Rudolph, who's the CEO of Velocidata. Steve from EMC and Barry from Velocidata were both part of a data risk, valuation, and data insurance panel this morning with Dr. Jim Short, who we're gonna have on later in the program. Steve, we got to introduce you yesterday to our audience, so I wanna give Barry the opportunity. Tell us a little bit about your background, what Velocidata does, and what brings you to an event like this.
Well, my background's actually been in data for most of my career, primarily around the infrastructure. So moving it, storing it, protecting it, computing on it, and a long career with IBM in that space. And now I'm the CEO of a startup company called Velocidata that's doing stream computing, stream analytics. So the notion is, if I can operate on the data inherently while it's in motion, not having to land it, so as not to create any latency at all for a whole set of applications, that's an interesting technique and capability. That's what we do at Velocidata.
All right, so Barry, for those of us that come from the infrastructure side of the house, data was almost treated like a widget that sat on the infrastructure; a whole industry spun up around creating storage, networking it, maybe sharing it. But the value of data is something we've only talked about more recently. Can you talk a little bit about that transition, what interests you about it, and your viewpoint on it?
It's a very interesting kind of notion. In some contexts we've been doing this for years. All the things you just mentioned, if you think about a fundamental decision: what's the value of performance? What's the value of availability? What should I back up? What should I have from a disaster recovery standpoint? All of those infrastructure decisions that we've made for a long time inferentially talk about the value of the data. I'm not going to back up data of no value. I'm not going to put it on high-performance EMC storage, or whoever's, right? So what's fascinating about this, though, is trying to come up with some topologies and ways to actually think about it more directly, not just purely heuristically inside storage systems like we've been doing for years, or in distributed systems, but trying to attach real business value to it and align with that business value. So it's been an interesting set of activities and very enjoyable so far.
Yeah, I guess, Steve, what I'm curious about is, when I think about what we discussed there, a lot of times it was understanding the value of data to say, okay, how can I justify getting the performance or backing things up or paying for something else, as opposed to, how do I run my business? How do I get more value out of data? A little bit of a bit flip, if you will, in how I look at data.
Right, I think both Barry and I, in developing storage systems, most of the time were not aware of how the data itself was getting consumed by different lines of business. And we're finding out that that's critical, right? As you understand the relevance of that data to different use cases in the business, the value goes up. So we're in need of some frameworks that span the application and business uses of the data as well as the storage paradigm.
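Barry's description of operating on data while it's in motion, without landing it first, is the core idea of stream processing: keep only a small amount of running state and compute results as events arrive, rather than storing the stream and querying it later. Below is a minimal sketch of that pattern in Python; the event shape, window size, and alert threshold are illustrative assumptions, not details of Velocidata's actual product.

```python
from collections import deque
from statistics import mean
from typing import Iterable, Iterator

def rolling_latency_alerts(events: Iterable[dict],
                           window: int = 3,
                           threshold_ms: float = 250.0) -> Iterator[dict]:
    """Operate on data in motion: keep only a small rolling window in memory
    and emit results as events arrive, without landing the stream to storage."""
    recent: deque[float] = deque(maxlen=window)    # bounded state, not a database
    for event in events:                           # e.g. ticks, call records, sensor readings
        recent.append(event["latency_ms"])
        avg = mean(recent)
        if avg > threshold_ms:                     # act while the data is still moving
            yield {"ts": event["ts"], "rolling_avg_ms": round(avg, 1)}

# In practice `stream` could be a Kafka consumer or a socket; a list stands in here.
stream = [{"ts": t, "latency_ms": ms} for t, ms in enumerate([120, 180, 300, 420, 500])]
print(list(rolling_latency_alerts(stream)))
# [{'ts': 3, 'rolling_avg_ms': 300.0}, {'ts': 4, 'rolling_avg_ms': 406.7}]
```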
So keying off of that, we have these performance and user experience tools measuring performance and responsiveness, like at either end of the bell curve, what's the fastest or the slowest? But in something like high-frequency trading, if you're a couple of microseconds or milliseconds faster than the next guy, that makes a difference in terms of your knowledge of the real price, and you can arbitrage that.
Yeah, to Steve's point, there's this connection between the data and its characteristics: the time value of that data, which may be a short duration, kind of a latency discussion around something like high-frequency trading or financial markets, or it may be a longer horizon, like what's the value of storing and keeping data for a long period of time because it may have future analytic value. And so there's a dimension of time, and there's a dimension of knowledge of the linkage to the business model and the business processes that Steve suggested. Historically in I/O subsystems, all we know is how frequently you touch this block of data or set of bytes. There's no real knowledge at all beyond that, and so caching algorithms and locality were all kind of based on that. There's a big move now, and what we're trying to develop some tools around is creating, as Steve said, those linkages to the business value: the applications the data touches, and the value of those applications to the business.
Well, so let me follow up on the notion that some people are saying all the value is in the fastest response to the data, in squeezing all the latency out of it. But there's also value in putting a large amount of data back in the cloud, which might be comparatively slower to get to, but gives you so much more perspective. In the case you were just mentioning, history: like the famous meltdown, where we only ran our analysis back to maybe the '70s; I don't think it even went back to the '30s because we didn't have the data. But that has value from a different perspective. Any thoughts on how we can integrate those two?
Certainly. We're seeing most businesses deploying data lake type technologies, whether privately or in the cloud, combining streaming frameworks with historical analytics. So, for example, a Vodafone type of architecture where they can stream and analyze dropped calls, then look at the history of that particular cell phone user and take some sort of business action based on that. So I think there's a combination of streaming analytics that has a need for historical analytics as well.
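Steve's Vodafone example is the common "stream plus history" pattern: a real-time event triggers a lookup against a historical store, and the combination drives the business action. A minimal sketch follows; the subscriber history store, event fields, and churn threshold are hypothetical stand-ins, not details of Vodafone's actual system.

```python
from typing import Iterable

# Hypothetical historical store: dropped-call counts per subscriber,
# e.g. loaded from a data lake query; here just an in-memory dict.
history: dict[str, int] = {"sub-1001": 4, "sub-1002": 0}

def handle_dropped_calls(events: Iterable[dict],
                         churn_risk_threshold: int = 3) -> list[dict]:
    """Join each streaming dropped-call event with the subscriber's history
    and decide on a business action (e.g. a retention offer)."""
    actions = []
    for event in events:                       # streaming input, one event at a time
        sub = event["subscriber_id"]
        total = history.get(sub, 0) + 1
        history[sub] = total                   # fold the new event back into history
        if total >= churn_risk_threshold:      # history makes the stream actionable
            actions.append({"subscriber_id": sub, "action": "retention_offer"})
    return actions

print(handle_dropped_calls([{"subscriber_id": "sub-1001"}]))
# [{'subscriber_id': 'sub-1001', 'action': 'retention_offer'}]
```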
Part of the interesting linkage here, back to the fundamental construct, is to have an understanding of what data you want to store historically, so there's some notion of its fundamental linkages to the important applications and processes of your business. Meaning the questions: what are you gonna ask? What are you trying to answer? What might you want to answer in the future? And being anticipatory. And there are lots of examples, like geophysical data in oil and gas. If you historically walked into one of those companies, there'd be a huge ballroom filled with old tapes of all of the field data, because they recognized that having that historical information can help them close empirical models when they get denser, more granular information in the future. But that required them to know something about what question they want to answer and where there might be a value proposition.
So I'm curious, when you talk about the value of data, is there a time component to that? Because I think about it as there's kind of the streaming, real-time value of information, but then there's the data lake and the capacity. When we look at the storage industry, we said there's the performance aspect and there's the capacity, and there are very important uses for both of them, but different ones. How does time fit into that value of data?
One of the things we saw at the conference this week is Doug Laney from Gartner talking about their research on valuation. And one statement that he made is he actually has equations that take into account the stage or the life cycle of the data. So there are equations emerging in the industry where you can look at the age of the data, and that factors into an overall valuation score. But in general, old data that's not being used is classified as lower value, and that factors into the valuation that way.
I was just gonna say, again, I think there's an important linkage to the fundamental attributes that are, or may become, important to you. Classically, we think of older data as less valuable data, and we wanna move it through a tiered progression to lower and lower cost storage and eventually get rid of it based on policy, because it creates some liability. But there are certainly a lot of emerging applications around analytics where the data may become more valuable over time. And I think that's creating a whole new set of challenges that historically we really haven't spent a lot of time on. And one way to deal with that is to store everything, storage is free, so we'll just build bigger and bigger HDFS or Cassandra environments. And I think, in the limit, that's just not true. But there is a notion of creating these value linkages.
Storage isn't free? I thought Microsoft and Google were unlimited. EMC maybe not quite free yet, but...
One of the things we learned today in the panel on data insurance was Garin Payson of AIG gave an example of a company that sold coffee, and they kept all their records of who they sold to. Something that was 10 years old was hacked, and they had to pay out an insurance claim, right? And Garin was like, why did you keep that? Why did you need coffee receipts from 10 years ago? So there was a financial penalty that had to be paid because of that. And Garin's advice is: try and get rid of stuff that you don't think you need.
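Doug Laney's point above, that emerging valuation equations take the age or lifecycle stage of data into account, together with Barry's point that some data appreciates as new analytics find uses for it, can be made concrete with a toy formula. This is purely an illustrative sketch; the half-life, base value, and reuse bonus below are assumptions, not Gartner's published models.

```python
import math

def data_value_score(base_value: float,
                     age_days: float,
                     half_life_days: float = 365.0,
                     recent_reads: int = 0) -> float:
    """Toy valuation: value decays exponentially with age (half-life model),
    but recent reuse pulls it back up, reflecting data that appreciates
    because new analytics find it valuable again."""
    decay = 0.5 ** (age_days / half_life_days)       # older data scores lower...
    reuse_bonus = 1.0 + math.log1p(recent_reads)     # ...unless it is being reused
    return base_value * decay * reuse_bonus

# Ten-year-old records nobody reads score near zero (the coffee-receipt case);
# the same records being actively mined score noticeably higher.
print(round(data_value_score(100.0, age_days=3650), 2))                    # 0.1
print(round(data_value_score(100.0, age_days=3650, recent_reads=500), 2))  # 0.7
```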
What about the notion that you might separate the modeling, on not just recent data but very historical data, and then, instead of perhaps running the model yourself and even having to worry about securing that data, push it onto a provider like Bloomberg? It's your private model, no one else can see it, but rather than it being a Bloomberg proprietary function, it's your function, it's your model. Is that something that we could see others follow?
Certainly could happen. And I guess there'd be a whole set of questions relative to latency in particular markets; you're alluding to something like a high-frequency trading market, where proximity to data sources, coherence, and the amount of latency in the system would be really important. It certainly could be separated, infrastructurally and algorithmically; I think that certainly could be the case. Today, typically, it's not. Today there's actually a lot of in-house development of the infrastructure around these, primarily because there's such a tight linkage between latency and money. A microsecond trading advantage can mean tens or hundreds of millions of dollars. So it's a...
So in Tom Davenport's keynote this morning, he said chief data officers need to go on offense as well as defense. That brings to mind: I hear things like risk and insurance, and that sounds very defensive. Is there kind of a kinetic potential of data? Is there any way to understand, I've got data, and if I could leverage it or create new business models or do something with it, how my data can become even more valuable? Does that fit into the discussion?
We've heard all the CDOs this week talk about the balance between speed and quality and trying to find that fine line. So I think that's the answer, right? You have to play defense to some degree, and you have to play offense as far as trying to monetize. But if you need certain private data in order to do that monetization, and holding on to it could put you at risk, you need to bring in someone like a CDO to help answer that question.
Popping up a level, does data have more value, assuming you have a proprietary data set, if you can add contextual data from other sources? Does that make your data set more valuable? And if so, can you monetize it by licensing it to others who could also add external contextual data?
It's interesting, we had a little conversation about this in the breakout, about data that has value internally because it's linked to processes and other data sources that may be structured or unstructured in nature. And so there was kind of a question of, well, how would we value this externally? And I think you've just hit on a really good way to think about it. In my mind, it's no different than monetizing intellectual property or any other asset. Look for adjacencies, look for other companies, or even groups within your own enterprise, that can use that data and have similar processes they could link it to. So it becomes a valuable asset because of a similarity and maybe an adjacency, in some ways no different than selling patents or intellectual property in that context.
Under what circumstances, or if you were to provide advice for a viewer who's trying to figure out, when do I open up access to my data and monetize it? When should you think about it as proprietary, conferring an advantage on you by staying private? And when should you think about it as something to monetize freely because it gets more value when it's used?
I'm gonna give an answer again in this analogy to selling intellectual property. Again, data assets are corporate assets, just like patents and trading know-how and processes, et cetera. So I really think you can go through a very similar process. I have a set of assets. I'm now going to brainstorm around adjacent markets where that may be valuable. I'll define which of those markets are competitive or non-competitive, and those that are non-competitive don't offer any real threat.
And I can create terms and conditions around the use of that data that ensure they don't overlap with my market. Very similar to the way that you would sell a set of patents, for example, a family of patents.
I also think we're starting to see data brokers emerge. So if you have a private data set that you're considering monetizing, work with those data brokers to say, well, how unique is the data that I have, right? Is it a scarce resource? And then go through the process Barry described, which is: okay, if nobody else has this, what are the adjacent markets, and how could I leverage this data asset there?
Interesting. All right, great. I want to give each of you kind of a last word, if you've got a key takeaway or maybe one of the good questions that you got during your panel this morning. Barry, we'll start with you.
You know, I think one of the really interesting questions was, again, this notion of how do I assign value to something that is tightly coupled to internal processes? I'm not sure we have a great answer to that. My own opinion is that we can make this really complicated, like a lot of things in complex environments. But I think we may be overthinking it, and we should think about it more from the top down; perhaps more accuracy and less precision might be an important takeaway, from my view.
One takeaway that I've seen for sure is the importance of building up a metadata repository alongside your data lake, for example, that starts to contain valuation information, right? People are wondering, how do I get started with calculating value for my data? And we're starting to see repositories alongside data lake content that track things like data quality and use by lines of business, and that's probably the main takeaway as far as data value goes.
Barry Rudolph with Velocidata, Steve Todd with EMC, really appreciate the conversation. We know we'll be tracking this discussion of data valuation through this show and beyond. We'll be back with lots more coverage here from the MIT Chief Data Officer and Information Quality Symposium in Cambridge, Massachusetts. You're watching theCUBE.
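Steve's closing takeaway, a metadata repository alongside the data lake that tracks attributes like data quality and which lines of business use each data set, can be pictured as a small catalog of records per data set. The sketch below is a generic illustration only; the field names, the repository class, and the breach-risk query are assumptions, not any particular vendor's catalog schema.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """One entry in a metadata repository kept alongside the data lake."""
    dataset_id: str
    quality_score: float                       # e.g. 0..1 from profiling checks
    lines_of_business: list[str] = field(default_factory=list)
    contains_personal_data: bool = False       # feeds the risk/insurance view

class ValuationRepository:
    """Minimal catalog: register data sets and answer basic valuation questions."""
    def __init__(self) -> None:
        self._records: dict[str, DatasetRecord] = {}

    def register(self, record: DatasetRecord) -> None:
        self._records[record.dataset_id] = record

    def unused_personal_data(self) -> list[str]:
        """Data sets no line of business uses but that still carry breach risk,
        the 'why did you keep ten-year-old coffee receipts?' question."""
        return [r.dataset_id for r in self._records.values()
                if r.contains_personal_data and not r.lines_of_business]

repo = ValuationRepository()
repo.register(DatasetRecord("call-detail-records", 0.92, ["marketing", "network-ops"]))
repo.register(DatasetRecord("legacy-coffee-receipts", 0.40, [], contains_personal_data=True))
print(repo.unused_personal_data())   # ['legacy-coffee-receipts']
```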