theCUBE presents On the Ground. Hello and welcome to a special On the Ground, CUBE coverage here at Oracle headquarters. I'm John Furrier, the host of theCUBE, here with Chris Lynskey, the vice president of product management for Oracle Big Data. Welcome to On the Ground. Good to see you. Thanks John, nice to meet you. So let's talk about Big Data and the concepts going on now for analytics. What is going on in your mind around Big Data? And some of the ideas customers are kicking around, because the number one thing we hear is: I've got to store the data, it's an Oracle database, system of record. But now other databases are popping up, different types of databases. You've got graph databases, you've got unstructured databases. Do I run Oracle for all those? When do I use Oracle? When don't I use Oracle? So the first question is, what are some of the obstacles facing companies? Is it integration? Is it the choice? What's going on? There's a lot. I mean, there's a lot of interest in the market around Big Data. But the companies that are actually using it in a productized fashion to build competitive insight are fewer than you would think, because of some of these obstacles. So we look at it in a few different ways, and we try to tackle the obstacles at Oracle in each of these categories. One of the first big questions to solve is what you raised: how do I manage the data? I've got a lot of gravity in my data warehouse and in my databases, but now I've got all this new content coming in. It might be social media, it might be log data, things you're not sure of the value of. So it may not make sense to store it in that enterprise data warehouse. That's really where customers are looking at alternative technologies like Big Data, like Hadoop, to give you both that cost savings, but also to give you kind of specialized access, whether you're doing, like you said, spatial queries or graph queries.
Oracle can give you the right engine for the right job, but what's also important in that data management layer is doing it in a way that breeds simplicity of ownership. If the cost of ownership is too expensive, no one's going to do it. So we also have an initiative called Big Data SQL that lets you use that common Oracle database as your front end, but then query back to Hadoop, query back to a spatial or graph engine. You can leave that data where it makes the most sense. I mean, SQL on Hadoop, for instance, has proven that SQL is the language most people query in. So that's out there, that's settled. But it doesn't mean you run relational databases all the time; it's how people interface into other databases. Yeah. Is that a pretext to what's really happening? Is interfacing to other data sets really more important than actually having whole new systems? Because that seems to be... It's a bit of both. The way I look at it is, some companies look at Hadoop as just another data source, right? I've got some blog data, some social data. Let me put it in a place that's cost effective to store. And they are using your database as a front end, makes sense. Other customers look at Hadoop and Big Data more as a data platform, where they want to use that cluster, that compute environment, to do more than just query things and build a chart. And that's where you see some new technologies coming out. At Oracle, we call it our data factory. That's around how I can use all of that compute power to actually do data integration, right? How can I keep up with the one hour of ETL window I'm given a night to deal with all these new sources? So we see people adopting Hadoop for that ETL work. That's a tough window, one hour's a tough window, if you're Wall Street backing up. Talk about data lab. What is this concept that you've been kicking around called data lab? What does that mean? So I think that's the third pillar.
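The Big Data SQL idea described above, one SQL front end that federates queries out to data left in place elsewhere, can be illustrated with a toy analogy. This is not Oracle's implementation; it uses SQLite's `ATTACH` to stand in for the external (Hadoop) engine, and all table names and data are hypothetical:

```python
import sqlite3

# Analogy only: one SQL front end (the "warehouse") federates a query over
# data that stays in a second, attached store (standing in for Hadoop).
conn = sqlite3.connect(":memory:")  # plays the role of the warehouse front end
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])

# Attach a second database standing in for the external data: the data
# "stays" there; the front end just pushes the query out to it.
conn.execute("ATTACH DATABASE ':memory:' AS lake")
conn.execute("CREATE TABLE lake.weblogs (customer_id INTEGER, url TEXT)")
conn.executemany("INSERT INTO lake.weblogs VALUES (?, ?)",
                 [(1, "/pricing"), (1, "/docs"), (2, "/pricing")])

# One SQL statement joins warehouse data with "external" data in place.
rows = conn.execute("""
    SELECT c.name, COUNT(*) AS hits
    FROM customers c JOIN lake.weblogs w ON w.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Acme', 2), ('Globex', 1)]
```

The point of the design is the one the interview makes: the analyst writes ordinary SQL against one front end and never has to know, or move, where the log data physically lives.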
We talked about data management, giving you the right engine. We talked about data factory, giving you that integration capability. But why go through all that effort if not to start driving innovation? And that's what we think about as the data lab. It's a place where you can experiment with advanced analytics. It's a place where you can experiment with data mashups and new data combinations. And you do it in a cost-effective way, in a way that breeds this notion of agility. You mentioned the phrase system of record before. That's a great description for the warehouse. You're not going to change your revenue definition or your customer dimension in the warehouse. That's what everyone uses. But Hadoop, people look at it as a system of innovation. It sits alongside the warehouse. You can put a lot of that same data in there. Often you'll put data that never made it into the warehouse. So you get that big data variety, and then you can use that to come up with new ideas. So that's really the essence of the lab: bringing in more data sets, trying more combinations of data, and then also seeing if you can move beyond just descriptive and diagnostic analytics into predictive. That's a big area. Let me see if I have this right. Factory is all the ingestion. Data lab, you're kind of, I'll say, a sandbox, my word. So system of record is the most important data. That's a customer name, a key variable in the company's business model. So that's where all the hardcore data is. Social media data might be, say, geo data in a retail store that says I'm going to buy something, or it has local presence, has my name, which is in the system of record. So that data is in a different database, and it has to go over there and get to the center area. That's hard. Man, that's actually a hard problem. It is.
I mean, but that's a realistic thing people want to tackle: these small pieces of data, small data, that mean something to the system of record, or some engagement data cross-connected to the system of record. Do you guys solve that problem? This is kind of what people are looking for, right? I mean, we do. I mean, what's interesting is that's an age-old problem. I mean, we had it with data warehousing, and we have it even more now with all the big data sources. And I think the opportunity here is to decide who should solve that problem. Is it a scarce ETL developer that you have in IT? They have limited cycles. That's true. Do I have a data scientist? People actually use data scientists to do this sort of data integration work. It's hard to come up with a new predictive model if the data sets don't match up. And it's unfortunate because that's the PhD guy. And that's menial labor to a large degree. It's hard to find PhDs too. It is. I like to call them unicorns. You hear about them, you never really see them. And you definitely don't want the scientist doing that menial labor. The joke we tell is that the data scientist has been turned into a data janitor because of all these tasks that get put on their shoulders. So we think at Oracle, that's an opportunity. With this combination of data management, data factory, and data lab on top, you can actually push that work out to your business analyst teams. They can collaborate with IT. They can collaborate with your data scientists if you have them. But the spirit of the lab is not... So making the analysts and the business folks, making them like data scientists. Exactly. As functional as data scientists without having them being... One of the phrases in the industry is citizen data scientist. And I manage a product called Oracle Big Data Discovery. And that is really our goal: can we build these very intuitive UIs that help these analysts produce more output, like a data scientist?
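The "data sets don't match up" integration work described above, linking small engagement records back to a system-of-record customer table, can be sketched in a few lines. This is the janitorial step the interview says eats data scientists' time; all names, fields, and data here are invented for illustration:

```python
# Hypothetical sketch: join engagement events (e.g. store check-ins) back to
# a system-of-record customer table by normalizing names so the keys match.
customers = [  # system of record: authoritative customer data
    {"customer_id": 101, "name": "Jane Doe"},
    {"customer_id": 102, "name": "John Smith"},
]
events = [  # small engagement data from another system; keys are messy
    {"name": "  jane doe ", "store": "SF-12", "action": "checkin"},
    {"name": "JOHN SMITH", "store": "NY-03", "action": "checkin"},
]

def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so both systems agree on the key."""
    return " ".join(name.lower().split())

by_name = {normalize(c["name"]): c["customer_id"] for c in customers}
linked = [
    {"customer_id": by_name[normalize(e["name"])], **e}
    for e in events
    if normalize(e["name"]) in by_name
]
print([r["customer_id"] for r in linked])  # [101, 102]
```

Real-world matching is far messier (fuzzy names, missing keys), which is exactly why pushing this work to tooling and analysts, rather than PhDs, is the pitch being made.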
So what's the architecture to make that happen? Because I think that's right on the money. I think that's a great solution. And again, the example I use is just a small piece of data. But that's a database problem. So by abstracting out to another level with software, you can let people wire their own solutions together. I get that. How do you guys do that from an architectural standpoint? What do you say to customers? How do I do this? What's the playbook? It's a good question, because at its core, there's no reason to go about solving this problem unless it works at big data scale. If you can't analyze petabytes, terabytes of content, you would use a regular BI solution. There's no reason to move over to big data. So a key aspect of the architecture is scale. But also, if you're going to support these analysts, they're not happy if they click on the screen and then wait five minutes for something to come back. So interactive performance is critical too for this user base. Because of that, in products like BDD, and really across a lot of our different initiatives, Apache Spark has become a key piece of our architecture. And that's something you might not expect from Oracle, that we're moving into open source, adopting a lot of those technologies. But we really do see the value of Spark. So I asked Neil Mendelson this same question today, where he sees the market going. So I want to ask you the same question from a slightly different tack. What's the next big thing? Because we are on the front end of this really pioneering analytics mindset. Horizontally scalable data sets, software value propositions applied to data as currency, if you will, or soon data will be on the balance sheet. Certainly the analysts at Wikibon are saying that someday it should be an asset. Data capital is a phrase you guys use. Data capital, love that. And so that is a trend. That's rhetoric now, but it could be right around the corner.
But that's where it's going. What's the next big thing to get us there? I think the first hurdle was just making sense of big data. It took organizations a couple of years just to get their heads around that and to build an architecture that will scale, so people will adopt the system. I think the opportunity now, at least as we see it in our analytic portfolio, is you've got these users on the system, you've got these Hadoop clusters in place, what can you do with that power? And we think the big opportunity, especially as we create these citizen data scientists, is machine learning. How can we embed, especially, the Spark machine learning libraries into our products more natively, such that you don't have to have the PhD at the outset? You can use that compute power and the Spark open source libraries to help bootstrap that process. So do you guys solve what I call the data swamp problem? I'm going to explain it in more color. Most people are dumping everything into what they call a data lake, just storing all the data: we'll get to it later. Mostly it's Hadoop, a bunch of batch data, because they don't know what to do with it yet. Right. So it just sits there, it gets dirty, and it turns into a swamp. And that's the joke, data swamp. Ironically, we're looking at the lake here at Oracle headquarters. Pristine. Pristine, the water's flowing up through the fountain. It's beautiful. This is a big problem, because data that's idle, that's not being used, that's not being intelligently acted upon, can turn into a swamp, and it's only valuable when needed. Meaning if something's happening in real time, you go to the data lake and pull out a piece of data, to your earlier reference, and act on it in real time. That's important. So you never know the potential energy of that data, and its value. It could be perfectly useless one minute, extremely valuable the next.
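The kind of predictive model the Spark machine learning libraries mentioned above would train at cluster scale can be shown in miniature. This is not Spark MLlib itself, just a stand-in: a tiny pure-Python nearest-centroid classifier, with made-up feature data (say, visit frequency and spend) and hypothetical "churn"/"stay" labels:

```python
# Stand-in sketch for the kind of model an ML library bootstraps for you:
# a nearest-centroid classifier trained on invented two-feature data.
from math import dist  # Euclidean distance, Python 3.8+

# Training data: (feature vector, label). Features and labels are made up.
train = [((1.0, 0.5), "churn"), ((1.2, 0.4), "churn"),
         ((4.0, 3.5), "stay"), ((4.2, 3.8), "stay")]

def centroids(samples):
    """Average the feature vectors for each label."""
    sums, counts = {}, {}
    for x, label in samples:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lbl: tuple(v / counts[lbl] for v in acc) for lbl, acc in sums.items()}

def predict(model, x):
    """Assign x to the label whose centroid is nearest."""
    return min(model, key=lambda lbl: dist(model[lbl], x))

model = centroids(train)
print(predict(model, (1.1, 0.6)))  # churn
print(predict(model, (3.9, 3.6)))  # stay
```

The "embed it natively" argument in the interview is that an analyst should be able to trigger exactly this train-then-predict loop from a UI, with the library choosing sensible defaults, rather than needing a PhD to write it.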
Is your value proposition with the big data appliance that the analysts are supposed to connect to those lakes and bring them back? Is that the whole thing, do you guys solve the data lake problem? Yeah, there are two pieces. One is giving you the infrastructure. And for that, we have our big data cloud service or big data appliance, because lots of people think big data is just commodity hardware. As you move into analytics and do more in-memory, you're going to want that extra capacity. So that's one piece, making sure you've got the horsepower. But then you need those tools on top, right? And that's where our Big Data Discovery product focuses. And to your point, what we've done is actually integrate the things those analysts need when they're in that discovery moment. First thing they need, like you said: I never knew I needed this data set before, it just came to me. So we give you almost a shopping experience for data. You can go in, type in keywords. I want to look for social media log data. And we actually search into Hadoop and index all that content. So it's just like you were on a website. So you're kind of keeping the lake moving and clean, you're indexing it so you can surface data at any given time. That's the first piece. The second piece, though, is again in your discovery process, you have to recognize this is the first time people will be working with this data. And that's where a lot of these data scientists shine, because they know all the techniques for how do I interrogate it, what's important, what's not. And that's what we build into our product now. So the analysts can just look at a very visual screen and it helps them figure out where to focus. Is it worth me spending time? It's almost like this bot craze that's going on. You guys are abstracting away the scientists' knowledge into software and providing almost the interface. That's the hope. I mean, if you can get a data scientist, trust me, keep them. They're very valuable. Catch that unicorn. Yes.
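The "shopping experience for data", indexing lake content so analysts can find data sets by keyword, boils down to an inverted index over data set metadata. A minimal sketch of the idea (not the BDD implementation; all data set names and descriptions are invented):

```python
# Minimal sketch of keyword search over a data lake's catalog: build an
# inverted index mapping terms to data set names. All entries are invented.
from collections import defaultdict

datasets = {  # hypothetical data sets and their descriptive metadata
    "weblogs_2015": "social media clickstream log data from web servers",
    "pos_daily": "retail point of sale transactions by store",
    "tweets_raw": "social media posts with geo tags",
}

index = defaultdict(set)
for name, description in datasets.items():
    for term in description.lower().split():
        index[term].add(name)

def search(*terms):
    """Return data sets whose metadata contains every search term."""
    hits = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*hits)) if hits else []

print(search("social", "media"))  # ['tweets_raw', 'weblogs_2015']
print(search("log"))              # ['weblogs_2015']
```

A real system indexes the content and profiles of files in Hadoop, not just hand-written descriptions, but the lookup the analyst experiences, type keywords, get candidate data sets back instantly, is this shape.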
It's true, though, there aren't enough PhDs or data scientists out there. Sure, there's new curriculum out there, but still, I mean, the idea is to scale up and make the normal person, the citizen, be the scientist. And also, I mean, it's funny, if you look at the advanced analytic tools and the data science tools out there, they're very dated. A lot of them were built 15, 20 years ago with the data miner or statistician in mind. There's now this new breed of data scientists who want more compelling interfaces, right? They expect more. Chris, final questions. Top three conversations you have with customers where they're most challenged. You've looked at the patterns, so apply all the big data techniques in your brain to the three top problems customers are trying to solve that you guys help with. Excellent. So the first one I would say, by far, and I wish it wasn't the case, is: help me justify building out my big data cluster. That's the first one. Lots of companies want to do more with big data, but they're struggling with the ROI. The ROI, the cost, really. Why should I make that investment? How do I justify it? And I really do think that cloud is going to change that picture dramatically, when I can shift from CAPEX to OPEX. So you're saying the cloud lowers the bar in terms of getting value generated? It does two things. It lowers the financial entry point and how much you have to justify upfront. And it lowers the IT skill set needed to manage those clusters in the data center. So two very big problems. Great, that's awesome. Second one. Now I've solved that. The second one is, okay, well, what do I do next? How do I find things? Really, where should I be looking? And that is where this data lab concept is meant to come into play. Some customers will have a perfect use case in mind. That's how they justified the project. They can go and execute that. But a lot of them, again, it's this notion of a data lake.
I need to pursue a range of experiments. Where do I start? And tools like Big Data Discovery help a lot there. So the data lab is just going to let you play with the data and get a feel for it. Yep. And do it in a way that breeds that experimentation. Not just visualize the data, but change it, reshape it, build new models, build new classifications. The last thing I'd say is, okay: did I get my ROI? Do I have a cluster? Yes. Did I figure out something that looks interesting? Yes. Now I have an idea. What do I do next? It's how do I connect my insights from Big Data back to the tools the business uses? So this is where the value of the data capital thing you're talking about comes in. The lab is essentially formulating the key connections for data pipes to connect in. Is that the best way to think about it? Yeah, you come up with new ideas, new data products. So you've operationalized it by the third step. Yes. And then how do you do that? In some cases, I just push the data, I move the data over to my data warehouse, which may make sense. But Oracle also has, I think I mentioned before, Big Data SQL as a product, which will let you keep that data in Hadoop and keep everything else in your data warehouse. And productization gets that much easier, because you don't have to worry about moving data. It helps a lot. Well, that highlights one of the things we hear all the time, which is skills. Yeah. And people know SQL. They do. Everyone does. Everyone does. Chris, thanks so much for spending the time here on the ground. Really appreciate chatting with you. This is theCUBE. Exclusive coverage on the ground at Oracle headquarters. I'm John Furrier. Thanks for watching.