Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks.

Welcome back to theCUBE. We are live at the DataWorks Summit on day two. We've had a great day and a half learning a lot about the next generation of big data, machine learning, artificial intelligence. I'm Lisa Martin and my co-host is George Gilbert. We are next joined by a CUBE alumnus, Ron Bodkin, the VP and General Manager of Artificial Intelligence for Teradata. Welcome back to theCUBE.

Well, thank you, Lisa. It's nice to be here.

Yeah, so talk to us about what you're doing right now. Your keynote is tomorrow.

Yeah.

What are you doing? What is Teradata doing to help customers leverage artificial intelligence?

Sure, yeah. So as you may know, I've been involved in the big data space for a long time, as the founding CEO of Think Big Analytics, when we were helping customers at the beginning of big data in the enterprise. And we're seeing a very similar trend in the space of artificial intelligence. The rapid advances in recent years in deep learning have opened up a lot of opportunity to really create value from all the data that customers have in their data ecosystems. So Teradata has a big role to play in having high-quality products: the Teradata database, and analytic ecosystem products such as Hadoop and QueryGrid for connecting these systems together. What we're seeing is that our customers are very excited by artificial intelligence, but what we're really focused on is how they get to the value. What can they do that's going to really get results? And we bring the perspective of having a strong solutions approach inside of Teradata. So we have Think Big Analytics consulting for data science, and we've now been building up experts in deep learning in that organization, working with customers.
We've got product functionality, so we're innovating on how we keep pushing the Teradata product family forward: functionality around streaming with Listener, and questions like how do you take GPUs and deploy them efficiently inside our customers' data centers? How can you take advantage of innovation in open source, with projects like TensorFlow and Keras becoming important for our customers, right? So a lot of customers are excited about use cases for artificial intelligence, and indeed tomorrow in the keynote I'm going to touch on a few of them, ranging from applications like preventative maintenance and anti-fraud in banking to e-commerce recommendations. We're seeing those as some of the use cases where customers are saying, hey, there's a lot of value in combining traditional machine learning, wide learning, with deep learning, right? Using neural nets to generalize.

Help us understand if there's an arc where there's a mix of what's repeatable and sort of packageable versus what's custom, and how that changes over time, or whether it's just by solution.

Yeah, that's a great question, right? I mean, I think there's a lot of infrastructure that any of these systems need to rest on. Having data infrastructure, having quality data that you can rely on, is foundational, and you need to get that installed and working well as a beginning point. Obviously that means repeatable products that manage data with SLAs, supporting not just production use but also letting data scientists analyze data in a lab and making that work well, right? So there's that foundational data layer. And then there's the whole integration of the data science into applications, which is critical, right? Analytics ops, agile ways of making it possible to take the data and build repeatable processes, right?
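The "wide learning with deep learning" combination Ron mentions can be sketched as a toy forward pass: a linear model over sparse crossed features (memorization) summed with a small neural net over dense features (generalization) before a shared sigmoid. This is a minimal NumPy illustration, not Teradata's or TensorFlow's actual implementation; all weights, sizes, and feature values here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Wide part: sparse crossed features feed a linear model (memorization).
wide_x = np.array([1.0, 0.0, 1.0])        # e.g. one-hot crossed features (made up)
wide_w = rng.normal(size=3)

# Deep part: dense features pass through a hidden layer (generalization).
deep_x = rng.normal(size=8)               # e.g. concatenated embeddings (made up)
W1 = rng.normal(size=(8, 4))
w2 = rng.normal(size=4)
hidden = np.maximum(0.0, deep_x @ W1)     # ReLU hidden layer

# The two signals are summed into one logit, then squashed to a probability.
logit = float(wide_x @ wide_w + hidden @ w2)
prob = 1.0 / (1.0 + np.exp(-logit))
print(prob)
```

In a real system both parts would be trained jointly, e.g. with Keras, so the wide side memorizes known feature interactions while the deep side generalizes to unseen ones.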
And those are very horizontal, right? There are some variations, but those work the same in a lot of use cases. At this stage, I'd say in deep learning, just like in machine learning generally, you still have a lot of horizontal infrastructure. You've got Spark, you've got TensorFlow, right? Those support use cases across many industries. But then you get to the next level, into specific problems, and there's a lot of nuance, right? What modeling techniques are going to work? What data sets matter? Okay, you've got time series data and a problem like fraud; what techniques are going to make that work well? In recommendations, you may have a long tail of items to think about recommending. How do you generalize across a long tail, where you can't learn much from people who use some relatively niche thing, go to an obscure website, or buy an obscure product? There's not enough data to say whether they're likely to buy something else or do something else. But how do you categorize them so you get statistical power and can make useful recommendations, right? Those are very specific things, where there's a lot of repeatability within a specific solution area.

So when you talk about the data assets, there might be some that are specific to a customer, and then I guess some third-party or syndicated sources. If you have an outcome in mind, but not every customer has the same inventory of data, how do you square that circle?

Yeah, that's a great question. I mean, I really think that's a lot of the opportunity in the enterprise in applying analytics, right? This whole DataWorks Summit is about the power of your data: what you can get by collecting your data in a well-managed ecosystem and creating value, right? So there's always a nuance. What's happening with your customers? What's your business process? What's special about how you interact? What's core to your business, right?
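The long-tail categorization Ron describes, backing off from a sparse item to its category so rare items still contribute statistical power, can be sketched in a few lines. This is a hypothetical illustration; the event log, item names, and support threshold are all invented, and a production recommender would do this inside its feature pipeline rather than in plain Python.

```python
from collections import Counter

# Hypothetical interaction log: (user, item, category). All values invented.
events = [
    ("u1", "bestseller_novel", "books"),
    ("u2", "bestseller_novel", "books"),
    ("u3", "obscure_poetry_vol", "books"),
    ("u1", "popular_headphones", "audio"),
    ("u4", "niche_tube_amp", "audio"),
]

MIN_SUPPORT = 2  # items seen fewer times than this are backed off to category

item_counts = Counter(item for _, item, _ in events)

def backoff_key(item: str, category: str) -> str:
    """Keep the item id when it has enough data; otherwise fall back to
    the category, so rare long-tail items still pool statistical signal."""
    return item if item_counts[item] >= MIN_SUPPORT else f"cat:{category}"

keys = {(user, backoff_key(item, cat)) for user, item, cat in events}
print(sorted(keys))
```

Here "obscure_poetry_vol" has only one observation, so its buyer is modeled at the "books" category level, where there is enough data to generalize.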
So I guess my view is that anybody who wants to be a winner in this new digital era, with processes that take advantage of artificial intelligence, is going to have to use data as a competitive advantage and build on their unique data, right? Because we see a lot of enterprises struggle with this, where there's a tendency to say, hey, can we just buy a packaged, off-the-shelf SaaS solution? And for context, for things that are the same for everybody in the industry, that's a great choice. But if you're doing that for the core differentiation of your business, you're in deep trouble in this digital era.

And that's a great point. Sorry, George, really quickly: in this day and age, every company is a technology company. You mentioned a use case in banking, fraud detection, which is huge. There's tremendous value that can be gleaned from artificial intelligence, and there's also tremendous risk. I'm curious, maybe just as a generalization, where are your customers on this journey? Are you going out to customers that have already embraced Hadoop and have a significant amount of data, who say, all right, we've got a lot of data here, we need to understand the context? Where are customers in that maturity evolution?

Sure. So I'd say we're fast approaching the slope of enlightenment for Hadoop, which is to say the enthusiasm of three years ago, when people thought Hadoop was going to do everything, has kind of waned, and there's now more of an appreciation that, hey, there's a lot of value in having a data warehouse of high-value curated data for large-scale use, and there's a lot of value in having a data lake of fairly raw data that can be used for exploration in the data science arena. And there's an emerging question of what the best architecture is for streaming and how you drive real-time decisions; that's still very much up in the air.
So I'd say that most of our customers are somewhere in that journey. I think a lot of them have backed off from their initial ambitions; they bought a little too much of the hype around all that Hadoop might do, and they're realizing what it is good for and how they really need to build a complementary ecosystem. The other thing I think is exciting is that I see the conversation moving from the technology to the use cases, right? People are a lot more excited about how we can drive value with analytics; let's work backwards from the analytics value to the data that's going to support it.

Absolutely. So building on that, we talked about what's core, and that you can't have something completely repeatable be core to your sustainable advantage. But if everyone is learning from data, how does a customer achieve a competitive advantage, or even sustain one? Is it orchestrating learning that informs processes all across the business, or is it just a perpetual red queen effect?

Well, it's a great question. I mean, I think there are a few things, right? There's operational excellence in every discipline: having good data scientists, having the right data, collecting data, thinking about how you get network effects. Those are all elements, right? So I would say there's a table-stakes aspect; if you're not doing this, you're in trouble, but if you are, it's a question of how you optimize, lift your game, and get better at it, right? So that's an important factor. You see companies asking, well, how do we acquire data? One of the things you see digital disruptors like Tesla doing is changing the game by saying, we're changing the way we work with our customers to get access to the data.
Think of the difference: every time you buy a Tesla, you sign over the rights to collect and use all your data, while the traditional auto OEMs are struggling to get access to a lot of the data, because they have intermediaries that control the relationship and aren't willing to share, right? And it's a similar thing in other industries. In consumer packaged goods, a lot of manufacturers are saying, well, how do we get partnerships? How do we get more accurate data? The old model of going out to the Nielsens of the world and saying, give us aggregates and we'll pay you a lot for a summary report, that's not working. How do we learn directly, in a digital world, about our consumers so we can be more relevant, right? So one of the things is definitely that control of and access to data, as well as, you know, we see a lot of companies asking, what are the acquisitions we can make? What are the startups and capabilities that we can plug in to complement us? To get data, to get analytic capability that we can then tailor for our needs.

It's funny that you mention Tesla, you know, having more cars on the road collecting more data than pretty much anyone else at this point. But then there's Stanford's luminary for AI, Fei-Fei Li. She signed on, I think, with Toyota, because she said, you know, they sell 10 million cars a year; I'm going to be swimming in data compared to anyone else, with the possible exception of GM or maybe some Chinese manufacturer. So how can you get around scale when you're using data at scale to inform your models? How would someone like Tesla do an end run around that?

Yeah, so, I mean, that's always the battle when a disruptor comes in. They're not at scale, but they maybe change the game in some way, right? Like having different terms that give them access to different kinds of data, more complete data, right?
So that's part of the answer, right? To disrupt an industry, you need a strategy for what's different. In Tesla's case, of course, it was electric vehicles, and now they've been investing in autonomous vehicles with AI. Of course, everybody in the industry has seen that and is racing, right? I mean, Google really started that whole wave a long time ago, as another potential disruptor coming in with their own unique data asset. So, you know, I think it's all about the combination of capabilities, but disruptors often bring a commitment to a different business process, right? And that's a big challenge, because a lot of times the hardest things are the business processes that are entrenched in existing organizations, and disruptors can say, we're rethinking the way this gets done. The example of that in ride-sharing, right, the Ubers and Lyfts of the world, is that they are reconceiving what it means to consume automobile services. Maybe you don't want to own a car at all if you're a millennial. Maybe you just want access to a car when you need to go somewhere, right? That's a good example of a disruptive business model change, right?

What are some things on the intermediate-term horizon that might affect how you go about trying to create a sustainable advantage? Here I mean things like deep learning helping data scientists with feature engineering, so data scientists are less of a scarce resource, or new types of training for models where you need less data. Those sorts of things might disrupt the practice of achieving advantage with current AI technology.

That's a great question, right? So near-term, the ability to be more efficient in data science is a big deal, right?
There's no surprise that there's a big talent gap, a big shortage of qualified data scientists in the enterprise, and one of the things that's exciting is that deep learning lets you get more information out of the data; it learns more, so you have to do less feature engineering. It's not a magic box where you just pour raw data into deep learning and out come the answers. You still need qualified data scientists, but it's a force multiplier: there's less work to do in feature engineering, and therefore you get better results, right? So that's a factor. You're starting to see things like hyperparameter search, where people will create neural networks that search for the best machine learning model, right? And again, get another level of leverage. Now, today, doing that is very expensive. I mean, given the amount of hardware it takes, very few organizations are going to spend millions of dollars to automate the discovery of models. But things are moving so fast, right? Even just in the last six weeks we've had NVIDIA and Google both announce significant breakthroughs in hardware, and a colleague forwarded me a recent research paper that says, hey, this technique could produce a hundred times faster convergence in deep learning, right? So you've got rapid advances with investment in both the hardware and the software. And historically, software improvements have outstripped hardware improvements throughout the history of computing, right? So it's quite reasonable to expect that you'll have 10,000 times the price performance for deep learning in five years. Things that today might cost a hundred million dollars, that no one would do, could cost $10,000 in five years. And suddenly it's a no-brainer to apply a technique like that to automate something, instead of hiring more scarce data scientists that are hard to find, and to make the data scientists you have more productive, right?
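The simplest form of the hyperparameter search Ron describes can be sketched as random search: sample configurations, score each, and keep the best. This is a hedged toy illustration only; the loss surface, parameter ranges, and trial budget below are all invented, and a real search would train a network per trial (which is exactly the compute cost he mentions), while neural architecture search goes further by learning a model to propose the configurations.

```python
import math
import random

random.seed(42)

def validation_loss(lr: float, width: int) -> float:
    """Stand-in for training a model and measuring validation loss.
    A real search would train a network per trial; this toy surface
    is minimized near lr = 0.01 and width = 64 (both invented)."""
    return (math.log10(lr) + 2.0) ** 2 + ((width - 64) / 64.0) ** 2

best = None
for _ in range(50):                        # 50 random trials
    lr = 10 ** random.uniform(-4, -1)      # sample the learning rate log-uniformly
    width = random.choice([16, 32, 64, 128, 256])
    loss = validation_loss(lr, width)
    if best is None or loss < best[0]:
        best = (loss, lr, width)

print("best (loss, lr, width):", best)
```

Even this naive loop shows the leverage: the data scientist specifies the search space once, and the machine does the tedious trial-and-error.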
So they're spending more time thinking about what's going on and less time trying out different variations of how to configure this thing. Does this work, does this work, right?

Gosh, Ron, we could keep chatting away. Thank you so much for stopping by theCUBE again. We wish you the best of luck in your keynote tomorrow. I think people are going to be very inspired by your passion, your energy, and also the tremendous opportunity that is really sitting right in front of us.

Well, thank you, Lisa. It's a very exciting time to be in the data industry with the emergence of AI in the enterprise. I couldn't be more excited by it.

Excellent, well, your excitement is palpable. We want to thank you for watching. We are live on theCUBE at the DataWorks Summit day two, hashtag DWS17. For my co-host George Gilbert, I'm Lisa Martin. Stick around, we'll be right back.