From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Dave Vellante.

Hi everybody, welcome to a special SiliconANGLE theCUBE On the Ground. We're going to be talking about data capital with Paul Sonderegger, who is a big data strategist at Oracle and leads Oracle's data capital initiative. Paul, thanks for coming in. Welcome to theCUBE.

Thank you, Dave, it's good to be here.

So, data capital. It's a topic that's gaining a lot of momentum. People have talked about data value for years, but what is data capital?

Well, what we're saying with data capital is that data fulfills the literal economic textbook definition of capital. Capital is a produced good, as opposed to a natural resource: you have to invest to create it, and it is then a necessary input into some other good or service. So when we define data capital, we say that data capital is the recorded information necessary to produce a good or service, which is really boring. So let me give you an example. Picture a retailer. The retailer wants to go into a new market. To do that, it has to expand its inventory, extend its supply chain, buy property, all of these kinds of investments. If it lacks the financial capital to make those investments, it cannot go into that new region. By the same token, if this retailer wants to create a new dynamic pricing algorithm or a new recommendation engine, but lacks the data to feed those algorithms, it cannot create that capability. It cannot provide that service. Data is now a kind of capital.

And for years, data was viewed by a lot of organizations, particularly general counsel, as a liability. Then the big data meme took off and all of a sudden data becomes an asset. Are organizations viewing data as an asset?

A lot of organizations are starting to view data as an asset, even though they can't account for it that way.
By current accounting standards, companies are not allowed to treat the money they spend on developing information, on capturing data, as an asset. However, what you see with these online consumer services, all the ones that we know, Uber, Airbnb, Netflix, LinkedIn, is that these companies absolutely treat data as an asset. They treat it not just as a record of what happened, but as a raw material for creating new digital products and services.

You tweeted out an article recently on Uber. Uber lost about, what is it, $1.2 billion? In six months, at least. And the article calculated how much it actually paid for data. Basically, the conclusion was that it paid $1.2 billion for data, about $1.20 per ride record, which actually is not a bad deal when you think about it that way.

Well, that's the thing, it's not a bad deal when you consider that the big picture they have in view is the global market for personal transportation, which The Economist estimates at about $10 trillion annually. To go after a $10 trillion market, if you can build up a unique stock of data capital of a billion records at about $1.20 per record, that's probably a pretty good deal, yeah.

So money, obviously, is fungible. It's currency. Data is not a currency, but digital data is fungible, right? I mean, you can use data in a lot of different ways.

No, no, and this is actually a really important point. Data is actually not fungible. This is part of data's curious economic identity. Contrary to popular wisdom, data is not abundant. Data consists of countless unique observations, and two pieces of data are usually not fungible. You can't replace one with the other because they carry different information. They carry different semantics. So just to make it very concrete, one of the things we see now is that a huge use of data capital is in fraud detection.
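The back-of-envelope math behind that $1.20 figure can be sketched as follows. The inputs are the round numbers from the conversation (a $1.2 billion loss, roughly a billion ride records), treated as illustrative assumptions rather than audited figures:

```python
# Back-of-envelope sketch of the per-record figure discussed above.
# Inputs are the round numbers from the conversation, not audited figures.
loss_paid_for_data = 1.2e9    # dollars lost, treated here as the price paid for data
ride_records = 1.0e9          # approximate ride records captured in the period

cost_per_record = loss_paid_for_data / ride_records
print(f"${cost_per_record:.2f} per ride record")

# Compare that outlay to the roughly $10 trillion annual market for
# personal transportation cited from The Economist.
market = 10e12
share = loss_paid_for_data / market
print(f"Data spend as a share of the target market: {share:.4%}")
```

Seen this way, the entire data acquisition cost is about a hundredth of one percent of the market being pursued.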
One of our customers handles the fraud detection for person-to-person mobile payments. Say you go away for a weekend with a friend, you come back, you want to split the tab, and you want to make a payment directly to the other person. You do this through your phone. That account-to-account transfer gets checked for possible fraudulent activity in the moment, as it happens. There's a scoring algorithm that sniffs those transactions and gives each one a score to indicate whether it may be fraudulent or legitimate. Well, this company uses the information it captures about whether the algorithm caught all of the fraudulent transactions or missed some, and whether the algorithm mistakenly flagged legitimate transactions as fraudulent. They capture all of those false positives and false negatives, feed them back into the system, and improve the performance of the algorithm for the next go-around. Here's why this matters. The data created by that algorithm about its own performance is a proprietary asset. It is unique, and no other data will substitute for it. And in that way, it becomes the basis for a sustainable competitive advantage.

It's a great example. So the algorithm maybe is free, you can grab an algorithm, but it's how you apply it that is proprietary. Okay, so we've established that data is not fungible, but digital data doesn't necessarily have high asset specificity. Do you agree with that? In other words, I can use data in different ways if it's digital.

Yeah, absolutely. As a matter of fact, this is one of the other characteristics of data: it is non-rivalrous, as economists would call it. This means that two parties can use the same piece of data at the same time, which is not the case with, say, a tractor. One guy on a tractor means that nobody else can ride that tractor. Data's not like that. Data can be put to multiple uses simultaneously.
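The shape of that feedback loop can be sketched in a few lines. This is a minimal toy illustration, not the customer's actual system: the scoring rule, transaction amounts, and retuning factors are all hypothetical, but it shows the mechanism of feeding a scorer's own false positives and false negatives back in to improve its next pass.

```python
# Toy sketch of a fraud-scoring feedback loop: the scorer's own mistakes
# (false positives and false negatives) retune it for the next go-around.
# All values here are hypothetical illustrations.

def score(amount, threshold):
    """Toy risk score in [0, 1): larger transfers look riskier."""
    return amount / (amount + threshold)

def flag(amount, threshold, cutoff=0.5):
    """Flag a transfer as possibly fraudulent."""
    return score(amount, threshold) >= cutoff

# Labeled outcomes from a settled batch: (amount, was_actually_fraud)
batch = [(20, False), (60, False), (300, True), (900, True)]

threshold = 500.0
for _ in range(20):  # each pass replays the batch and retunes on the mistakes
    false_neg = sum(1 for amt, fraud in batch if fraud and not flag(amt, threshold))
    false_pos = sum(1 for amt, fraud in batch if not fraud and flag(amt, threshold))
    threshold *= 0.9 ** false_neg   # fraud slipped through: lower the bar
    threshold *= 1.1 ** false_pos   # legitimate users flagged: raise it

# The retuned threshold now separates the fraudulent from the legitimate amounts.
```

The performance data generated each pass (the error counts) is exactly the kind of byproduct the interview describes: no competitor running the same open algorithm on different data would end up with the same tuning.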
And what becomes very interesting is that different uses of data can command different prices. There's a project going on right now where Harvard Law School is scanning and digitizing the entire collection of U.S. case law. Now, this is the law, the law that we all as Americans are bound to, and yet it is locked up, in a way, in all of these 43,000 books. Well, Harvard and a startup called Ravel Law are working on scanning and digitizing this data, which can then be searched for free. You can search this entire body of case law for free. So you can go in and search "privacy," for example, and see all of the judgments that mention privacy over the entire history of U.S. case law. But if you want, for example, to analyze how current sitting judges rule on cases related to privacy, well, that's a service you would pay for from Ravel. Exact same data. Their algorithms work on the same body of data. You can search it for free, but the analysis that you might want on that same data, you can only get for a fee. So different uses of data can command different prices.

Some excellent examples there. What are the implications of all this for competitive strategy? How should companies apply this?

Well, when we think about competitive strategy with data capital, we think in terms of what we call the three principles of data capital. The first is that data comes from activity. The second is that data tends to make more data. And the third is that platforms tend to win. If we just run through them in turn: the first one, data comes from activity, means that in order to capture data, your company has to be part of the activity that produces it, at the time that activity happens.
And the competitive strategy implication here is that if your company is not part of that activity when it happens, your chance to capture its data is lost forever. This means that interactions with customers are critical targets to digitize and datafy before the competition gets in there and shuts you out.

The second principle, data tends to make more data, is what we were talking about with algorithms. Analytics are great. They're very important. Analytics provide information to people so that they can make better choices. But the real action is in algorithms. Here is where you're feeding your unique stock of data capital to algorithms that not only act on that data, but create data about their own performance, which then improves their future performance. And that data capital flywheel becomes a competitive advantage that's very hard to catch.

The third principle is that platforms tend to win. Platforms are common in information-intensive industries. We see them with credit cards, for example, and in financial services. A credit card is a payment platform between consumers on one side and merchants on the other. A video game console is a platform between developers on one side and gamers on the other. The thing about platform competition is that it tends to lead to a winner-take-all outcome. Not always, but that's how it tends to go. And with the digitization and datafication of more activities, platform competition is coming for industries that have never seen it before.

So platform beats product, but it's winner-take-all. Number two maybe breaks even, that's what we tend to witness, and number three loses money. Okay, and the first point you were making: you've got to be there when the transaction occurs. You've got to show up. The second one's interesting, data tends to make more data. You talked about algorithms and improving and fine-tuning that feedback loop.
I would imagine customers are challenged in terms of investments. Do they spend money on acquiring more data, or do they spend money on improving their algorithms? I mean, the answer has got to be both, but budgets are limited. How are customers dealing with that challenge?

Well, prioritization becomes really critical here. Not all data is created equal, but it's very difficult to know which data will be more valuable in the future. However, there are ways to improve your guess, and one of the best is to go after data that your competition could get as well. This is data that comes from activities with customers, data from activities with suppliers, with partners. Those are all places where the competition could also try to digitize and datafy those activities. So companies should really look outside their own four walls.

But the next part is figuring out what to do with it. This is where companies really need to take a page out of actual science as they approach data science. Science is all about argument. It's all about experimentation, testing, keeping the hypotheses that are proven and discarding the ones that are disproven. What this means is that companies need a data lab environment where they can cut the time, the cost, and the effort of forming and testing new hypotheses, of getting new answers to new questions from their data.

Okay, so data has value and you've got to prioritize. How do you actually value the data, so that I can prioritize and figure out what I should be focusing on in the lab and in production?

Yeah, well, the basic answer is to go where the money is. There are a couple of things you can do with data. One is that you can improve your operational effectiveness. Here you should look at your big cost areas and focus your limited data science and managerial resources on trying to figure out, hey, can we become more efficient in whatever your big cost driver is?
If it's shipping and logistics, if it's inventory management, if it's customer acquisition, if it's marketing and advertising. So that's one way to go. The next big thing you can do with data is try to create a new product or service, create new value in a way that generates revenue. Here there is a little caveat, which is that companies may also want to consider creating new capabilities, maybe enriching the customer experience, making connections across multiple channels, that they can't actually charge for, not today. But what they get is data that no one else has. Let's say they make an investment to bring together the in-store shopping experience with targeted emails, with communication through social feeds and through Twitter. Let's say they invest in tying that data together to get a richer picture of their consumers' behavior. They might not be able to charge for that today, but they may get insight into the way that shopping experience works that no one else can see, which then leads to a value-added service tomorrow. And I know it all sounds very speculative, but this is basically the nature of prototyping, of new product creation.

Well, Uber is overused as an example, but this is a good application of Uber, because essentially they pay for driver acquisition, which doesn't scale well, but they get data.

That's right.

Because they're there at the point of the transaction and the activity, and they've got data that nobody else has.

Yeah, that's exactly right. One way to think about it is that you're like a blackjack player counting cards. Every time you play a hand as a company, you get data, information, that may help you improve your future bets. This is why Vegas kicks out card counters: it's an advantage for the future.
But what we're talking about here, in digitizing activity with customers, is that every time you capture data about your interaction with those customers, you gain something simply for having carried out that activity.

So, thinking back to value for a minute, I can envision some kind of value flow methodology where you assess the data intensity of an activity, assign some kind of score or value to it, and then look at that in relation to other activities. Is that a viable approach?

It absolutely is. What companies need here is a new way to measure how much data they've got, how much they use, and then ascribe the value created by that data. On how much they've got: we always talk in terms of gigabytes and petabytes, but really we need finer measurements. Data is an observation about something in the real world, so companies should start to think about measuring their data in terms of observations, in terms of attribute-value pairs. Even thinking about the record captured per activity is not enough. Companies should think in terms of how many columns are in that record, how many attributes are captured in the observations we make from that activity.

The next issue is how much they use. Here companies need to look at how many of those observations are being touched, being tapped, by queries, whether those queries are automatically generated or run ad hoc by some data scientist rooting around for some new understanding. So there's a set of questions there: what percentage of the observations we possess are we actually using in queries of some kind?

And then the third piece: how much value do we create from it? This is a tough one, and it's really an estimation. Most likely what we need here is a new method for attributing the profitability of a particular business unit to its use of that data.
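The first two measures above, how much data you have and how much you use, can be sketched concretely. This is a hypothetical illustration of the idea: count observations as attribute-value pairs rather than bytes, then measure what fraction of them your queries actually touch. The table and query shapes are invented for the example:

```python
# Sketch of measuring data capital in attribute-value observations, and the
# fraction of them touched by queries. Records and queries are hypothetical.

records = [
    {"customer": "a17", "item": "boots", "price": 89.0,  "channel": "store"},
    {"customer": "b42", "item": "coat",  "price": 140.0, "channel": "web"},
    {"customer": "c09", "item": "hat",   "price": 25.0,  "channel": "web"},
]

# How much data we have: one observation per attribute-value pair,
# i.e. columns-per-record times records, not gigabytes.
total_observations = sum(len(r) for r in records)

# How much we use: attributes actually touched by the queries we run
# (say, a revenue-by-channel report reads only these two columns).
queried_attributes = {"price", "channel"}
used_observations = sum(1 for r in records for k in r if k in queried_attributes)

utilization = used_observations / total_observations
print(f"{total_observations} observations, {utilization:.0%} touched by queries")
```

A relative measure like this utilization ratio, applied consistently across activities, is what supports the prioritization discussed next; the third measure, value created, remains an estimation exercise.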
I realize this is an estimation, but there's a precedent for it in brand valuation. That is the coin of the realm when you're talking about putting a value on intangible assets.

Well, as long as you're consistently applying that methodology across your portfolio, then at least you've got a relative measure, and you get back to prioritization, which is a key factor here. Is there an underlying technical architecture that has to be in place to take advantage of all this data capital momentum?

There is, there is. Companies are moving toward a hybrid cloud big data architecture.

What does that mean?

It means that almost all the buzzwords are used up and we're going to need new ones. No, what it means is that companies are going to find themselves in a situation where some of their computing activities, storage, processing, application execution, analytics, will take place in a public cloud environment, and some will take place within their own data centers, reconfigured to act as private clouds. There are lots of potential reasons for this. Companies have to deal not only with existing regulations, which sometimes prevent them from putting data into a cloud, but also with regulatory arbitrage. Maybe the regulations will change, or maybe they've got agreements with partners, embodied in service-level agreements, that again require them to keep the data under their own observation. Even in that case, the business still wants to consume all of those computing resources inside the data center as if they were services, and the business doesn't care where they come from. This is one of the things Oracle is providing: an architecture for Oracle Public Cloud and private cloud in the data center that is the same on both sides of the wire, and in fact can even be purchased in the same way.
So even these Oracle Cloud machines in the customer's data center are purchased on a subscription basis, just as public cloud capabilities are. And the reason this is good is that it allows IT leaders to provide the business with computing and storage capabilities, as needed, that can be consumed as services regardless of where they come from.

Yeah, so you've got the data locality issue, which is a speed-of-light problem, you don't want to move data, and then you've got compliance and governance. And you're saying that hybrid approach allows you to have your cake and eat it too, essentially. Are there other benefits to taking this approach?

Well, one of the other pieces we should talk about here is the big data aspect. What that really means is that relational, Hadoop, NoSQL, and graph database repositories are all peers. They're all peers now. And this is Oracle's perspective. As I'm sure you know, Oracle makes a relational database. It's very popular.

I've heard that.

Yeah, we've been doing it for a while. We're pretty good at it. And Oracle's perspective on the future of data management is that Hadoop, NoSQL, graph, relational, all of these methods of data management, will be peers and act together in a single, high-performance enterprise system. Here's why. As our customers digitize and datafy more of their activities, more of the world, they're creating data that's born in shapes and formats that don't necessarily lend themselves to a relational representation. It's more convenient to hold it in a Hadoop file system, or in a great big key-value store like NoSQL. And yet they would like to use these data sources as if they were in the same system, and not really have to worry about where they are. We see this with telecom providers who want to combine call detail records with customer data in the data warehouse.
We see it with financial services companies who want to do a similar thing, combining research with portfolio investments, records of what their high-net-worth customers have invested, with transaction data from the equities markets. So we see this polyglot future, all of these different data management technologies, with their applications and analytics built on top, working together and existing in this hybrid cloud environment.

So that's different from the historical Oracle, at least the perceived messaging, right? A lot of people believe that Oracle sees its Oracle database as a hammer, and every opportunity is a nail. You're telling a completely different story now.

Well, it turns out there are many nails. So, you know, the hammer is still a good thing, but it turns out there are also brads and tacks, and Phillips and flathead screwdrivers too. This is just one of the consequences of our customers creating more kinds of data: images, audio, JSON, XML, spectrographic images from drones that are analyzing how much green is in a photograph, because that indicates the chlorophyll content. We know that our customers' ability to compete is based on how they create value from data capital. So Oracle is in the business of making the things that make data more valuable, and we want to reinvent enterprise computing as a set of services that are easier to buy and use.

And SQL is the lowest common denominator there, because of the skill sets that are available. Is that right?

Well, it's funny. It's not necessarily a lowest common denominator; it turns out it's just incredibly useful. SQL is not just a technology standard. In a manner of speaking, it's a thinking standard. SQL is based on literally hundreds of years of hard thinking about how to think straight. You can trace SQL back to predicate logic, which was one of the critical ideas in the renaissance of mathematics and logic in the 1800s.
So SQL embodies this way to think logically, to think about the attributes of things and their values, and to reason about them in an automated fashion. That is not going away. That, in fact, is going to become more powerful, more useful.

And that's why you're wedded to that way of thinking, is what you're saying.

That's exactly right. If you want to improve your operational effectiveness as a company, you're going to have to standardize some of your procedures and automate them. That means you're going to standardize the information component of those activities so you can automate them better. And you're going to want to ask questions about how it's going. SQL is incredibly useful for doing that.

So, we went way over our time. This is a very interesting discussion. But I have to ask you, what is it you do at Oracle? Do you work with customers to help them understand data strategies and catalyze new thinking? What's your day-to-day like?

Yeah, I do a lot of this, a lot of telling the story, because we're in a huge time of change. Every 20 years or so, the IT industry goes through an architectural shift, and that changes not just the technologies used to create value from data, but the very value created from data itself. It changes what you can do with information. So I spend a lot of time explaining these ideas of data capital, sitting down with executives at our customers, helping them understand how to look out at the world and see the data that is not there yet, and what that means for the way they compete. Then we talk through the competitive strategies that follow from that, and the technical architecture required to execute those strategies.

Excellent. Well, Paul, thanks very much for sharing your knowledge with our CUBE audience and coming into the SiliconANGLE Media studios here in Marlboro.

My pleasure, thanks for having me.

All right, you're welcome. Okay, thanks for watching, everybody.
This is theCUBE, SiliconANGLE Media's special on the ground production. We'll see you next time.