Live from Boston, Massachusetts. It's theCUBE, covering Spark Summit East 2017, brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. Welcome back to Boston, everybody. It's snowing like crazy outside, a cold midwinter day here in Boston, but we're here with theCUBE, the worldwide leader in live tech coverage. We are live covering Spark Summit. This is wall-to-wall coverage, our second day here. John Landry is with us. He's the Distinguished Technologist for the Personal Systems Data Science Group within Hewlett-Packard. John, welcome to theCUBE. Thank you very much for having me here. So I was saying, I was joking: we do a lot of shows with HPE. It's nice to have HP back on theCUBE. It's been a while. But I want to start there. The company split up just over a year ago, and it's seemingly been successful for both sides, but you were describing to us that you've gone through an IT transformation of sorts within HP. Describe that. Yeah, well, in the past we basically had a data warehouse type approach. We were doing reporting and what have you, coming out of data warehouses using Vertica. But recently we've made an investment into more of a programming platform for analytics. And our transformation to the cloud is about that: instead of investing in our own data centers, because with the split our data centers went with Hewlett Packard Enterprise, we're building our software platform in the cloud. And that software platform includes analytics, and in this case we're building big data on top of Spark. That transformation is huge for us, but it's also enabled us to move at a much faster velocity and match up to the pace of our business better. And like I said, it's mainly around software development, really, more than anything else. Describe your role in a little bit more detail inside of HP. My role is I'm the leader in our big data investments.
And so I've been leading teams internally and also collaborating across HP with our print group. What we've done is put together a strategy around our cloud-based solution. One of the things that was important was that we had a common platform, because when you put a programming platform in place, if it's not common, then we can't collaborate, our investment gets fractured, and we end up with a lot of siloed efforts. So my role is to provide the leadership and the direction for that. And one reason I'm here today is to get involved with the Spark community, because our investment is in Spark. So another part of my role is to get involved with the industry and connect with the experts in the industry so that we can leverage off of that, because we don't have that expertise internally. What are the strategic and tactical objectives of your analytics initiatives? Is it to get better predictive maintenance on your devices? Is it to create new services for customers? Can you describe that? It's two-fold, internal and external. Internally, we've got millions of dollars of opportunity to better our products, avoid cost, and also optimize our business models. And the way we can do that is by using the data that comes back from our products, our services, and our customers, combining that together and creating models around that, which are then automated and can be turned into apps that are used internally by our organizations, right? The second part is to take that same approach, same data, but apply it back toward our customers. With the split, our enterprise services group also went with Hewlett Packard Enterprise. So now we have a dedicated effort toward creating managed services for the commercial environment, both on the print side and on the personal systems side. And to basically fuel that, analytics is a big part of the story.
And so we've had different things that you'll see out there; Touchpoint Manager is one of the services we're delivering in personal systems. What is that? Touchpoint Manager is aimed at providing management services for SMB and commercial environments. For instance, in Touchpoint Manager we can provide a predictive type of capability for support, plus a number of different services that companies are looking for when they buy our products. Another thing we're going after is devices as a service, which is another thing we've announced recently and invested in. And obviously, if you're delivering devices as a service, you want to do that as optimally as possible. Being able to understand the devices, what's happening with them, being able to do predictive support on them, being able to optimize the usage of those devices, that's all important, right? A lot of data. Well, the data really helps us out, right? The data we can collect back from our devices, and being able to turn that around into applications that deliver information inside or outside, is a huge opportunity for us. It's interesting, you talk about internal initiatives and managed services, which sound like they're mostly external. But on the internal ones, you were talking about taking customer data and internal data and turning those into live models. Can you elaborate on that? Sure, a good example is our mobile products; they all have batteries, right? All of our batteries are instrumented as smart batteries, and that's an industry standard, but HP actually goes a step further with the information we put into our batteries. So by monitoring those batteries and their usage in the field, we can tell how optimally they're performing, but also how they're being used and how we can better design batteries going forward.
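The battery-health modeling John describes can be sketched in a few lines. This is a hypothetical illustration, not HP's actual smart-battery telemetry: the field names and the 80% threshold are invented for the example.

```python
# Hypothetical battery-health sketch; field names and the 0.8 threshold are
# invented for illustration, not HP's actual telemetry schema.

def battery_health(design_capacity_mwh, full_charge_capacity_mwh):
    """Remaining capacity as a fraction of the design capacity."""
    return full_charge_capacity_mwh / design_capacity_mwh

def flag_degraded(fleet, threshold=0.8):
    """Return unit IDs whose batteries have fallen below the health threshold."""
    return [u["unit_id"] for u in fleet
            if battery_health(u["design_capacity_mwh"],
                              u["full_charge_capacity_mwh"]) < threshold]

fleet = [
    {"unit_id": "A1", "design_capacity_mwh": 56000, "full_charge_capacity_mwh": 52000},
    {"unit_id": "B2", "design_capacity_mwh": 56000, "full_charge_capacity_mwh": 41000},
]
print(flag_degraded(fleet))  # → ['B2']
```

Run over a whole sample population, a fraction like this is exactly the kind of signal that feeds the supply-chain feedback loop described next.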
So in addition, we can actually provide information back into our supply chain. For instance, there's a cell supplier for the battery, there's a pack supplier, there's a unit manufacturer for the product, and a lot of what we've been able to uncover lets us go improve process. And improving process alone helps improve the quality of what we deliver and the quality of the experience for our customers. So that's one example of just using the data and turning it around into a model. Is there an advantage to having such high volume, such market share, in getting not just more data but sort of more of the bell curve, so you get the edge conditions? Absolutely. It's really interesting, because when we started out on this, everybody was used to doing reporting, which is absolute numbers: how much did you ship and all that kind of stuff. We're doing big data, right? In big data, you just need a good sample population. Turn the data scientists loose on that, and they've got the statistical algorithms; those give you a confidence factor based upon the data that you have. So it's absolutely a good factor for us, because we don't have to see all the platforms out there. And the other thing is, when you look at populations, we see variances across different customers, right? One of the populations that's very valuable to us is our own. We take the 60,000 units that we have internally at HP, and that's one of our sample populations. Right. And what better way to get information on your own products. But you take that to one of our other customers, and their population is going to look slightly different. Why? Because they use the products differently. So usage of the products, the environment they're used in, how they use them: all of that is why our sample populations are great in that respect.
And of course, the other thing that's very important to point out is that we only collect data under the rules and regulations that are out there, right? We absolutely follow that, and we absolutely keep our data secure. I think people today sometimes get a little bit spooked around that, but the case is that our services are provided based upon customers signing up for them. So. I'm guessing you don't collect more data than Google. No, we're nowhere near Google. So if you're not spooked at Google. No, no, that's why I tell people: if you've got a smartphone, you're giving up a lot more data than we're collecting. Buy some of it from Amazon. Spark, where does Spark fit into all this? Spark is great because we needed a programming platform that could scale, and in our data centers, with our previous approaches, we didn't have a programming platform. We started with Hadoop. Hadoop was very complex, though. It really gets down to the hardware: you're programming and trying to distribute that load and get the clusters right. Then you pick up Spark and you immediately get abstraction. The other thing is it allows me to hire people who can actually program on top of it. I don't have to get someone who knows MapReduce, right? I can ask, what do you know? R, Scala, Python, it doesn't matter; I can run all of that on top of it. So that's huge for us. The other thing is just flat out the speed, because as you get going into this, we get this pull all of a sudden. It's like, well, I only need the data once a month. Then it's, I need it once a week. I need it once a day. I need the output of this by the hour now.
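The point about Spark hiding the MapReduce plumbing can be shown in miniature. This plain-Python sketch, with invented records and no cluster, just traces the map/shuffle/reduce shape that Spark's APIs let you express in a line or two of R, Scala, or Python:

```python
from collections import defaultdict

# Miniature single-machine sketch of the map/shuffle/reduce pattern that
# Spark abstracts away; the device records are invented for illustration.

records = [
    ("laptop-001", "thermal_event"),
    ("laptop-002", "battery_event"),
    ("laptop-001", "thermal_event"),
]

# "Map": emit a (key, 1) pair per record.
mapped = [(device, 1) for device, _event in records]

# "Shuffle": group the pairs by key.
grouped = defaultdict(list)
for device, one in mapped:
    grouped[device].append(one)

# "Reduce": sum each group to get per-device event counts.
counts = {device: sum(ones) for device, ones in grouped.items()}
print(counts)  # → {'laptop-001': 2, 'laptop-002': 1}
```

In Spark the whole pipeline above collapses to roughly a `map` followed by a `reduceByKey`, with the distribution across the cluster handled for you, which is the abstraction being praised here.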
Yeah, and so the scale and the speed of that is huge. And then when you put that on a cloud platform like Amazon, now I've got access to all the compute instances. I can scale that, I can optimize it, because I don't always need all the power. The flexibility of Spark and being able to deliver that is huge for our success. So I've got to ask some Columbo questions, and George, maybe you can help me frame it. You mentioned you were using Hadoop, and like a lot of early Hadoop practitioners, you found it very complex. Now, Hewlett-Packard has resources; many companies don't. But you mentioned people doing Python and R and Scala and MapReduce. Are you basically saying, okay, we're going to unify portions of our Hadoop complexity with Spark, and that's going to simplify our efforts? No, what we actually did was start on the Hadoop side of it. The first thing we did was try to move from a data warehouse to more of a data lake approach, or a repository, and that was internal, right? And that was a cost reduction exercise. That was a cost reduction, but also data accessibility. Yeah, okay, right. The other thing was ingesting the data: when you're starting to bring data in from millions of devices, we had a problem coming through a firewall type approach, and you've got to have something in front of that, like a Kafka, something that can handle it, right? So when we moved to the cloud, we didn't even try to put up our own. We just used Kinesis, and we didn't have to spend any resources to go solve that problem. Well, the next thing was that once we got the data, we needed to ingest it: our data's coming in, we want to split it out, we needed to clean it, and what have you. We actually started out running Java, and then we ran Java on top of Hadoop.
But then we came across Spark and we said, that's it. For us to go to the next step of really getting into Hadoop, we were going to have to get more skills, and finding the skills to actually program in Hadoop was going to be complex. Training people organically was going to be complex too. We've got a lot of smart people, but... You've got a lot of stuff to do too. Right. That's the thing: we want to spend more time on getting the information out of the data, as opposed to the framework and getting it to run and everything, right? So. Okay, so there are a lot of questions coming out of that, but you mentioned Kinesis. Is that still, have you replaced that with a... Yeah, when we went to the cloud, we immediately started using as many Amazon services as we can, as opposed to growing something ourselves. So when we got onto Amazon, getting data into an S3 bucket through Kinesis was a no-brainer. When we transferred over to the cloud, it took us less than 30 days to point our devices at Kinesis, and we had all of our data flowing into S3. It was like, wow. Now let's go do something else. So I've got to ask you something else. Again, I love when practitioners come on. One of the complaints I sometimes hear from AWS users, and I wonder if you see this, is that the data pipeline is getting more and more complex. I've got an API for Kinesis, one for S3, one for DynamoDB, one for ElasticBlas; there must be 15 proprietary APIs that are primitive, and it gets complicated. Sometimes it's hard to even figure out what's the right cost model to use. Is that increasingly becoming more complex, or is it just so much simpler than what you had before that you're in Nirvana right now? I don't know. When you mention cost, moving to the cloud was a major cost reduction for us. Reduction, yeah. Oh, wow, okay.
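The "split it out and clean it" step mentioned above might look roughly like this in miniature. This is a plain-Python sketch with invented record fields; the pipeline described in the interview ran this kind of logic in Spark, against data landing in S3 via Kinesis.

```python
import json
from collections import defaultdict

# Plain-Python sketch of a telemetry "split and clean" step; the record
# fields are invented for illustration, not HP's actual schema.

raw_lines = [
    '{"device_id": "n1", "type": "battery", "value": 52000}',
    'not-json-at-all',                   # malformed payload: dropped
    '{"device_id": "n2", "type": "thermal", "value": 71}',
    '{"type": "thermal", "value": 65}',  # missing device_id: dropped
]

def clean_and_split(lines):
    """Parse JSON lines, drop bad records, and split the rest by type."""
    by_type = defaultdict(list)
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable record
        if "device_id" not in rec or "type" not in rec:
            continue  # incomplete record
        by_type[rec["type"]].append(rec)
    return by_type

split = clean_and_split(raw_lines)
print(sorted(split))  # → ['battery', 'thermal']
```

Each per-type bucket can then feed its own downstream model, which is the "split it out" part of the description.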
So now it's like, okay, now we're real. Yeah, you had that HP corporate tax on you before. Well, yeah, we were going from a data center and software licensing. So that was a big win for you. Oh, huge, very huge. And that freed us up to go spend dollars on resources focused on the data science aspect. Don't get me wrong, we continually optimize. But the point is, if we can bring it up quickly, that's going to save us a lot of money, and you don't have to maintain it, right? We want to focus on creating the code inside Spark that's actually doing the real work, as opposed to the infrastructure. So that cost savings was huge. Now, when you look at it over time, we could have over-analyzed that, but what we did was use a rapid prototyping approach, and from there we continue to optimize. What's really good about the cloud is you can predict the cost. With internal data centers and software licensing and everything else, you can't predict the cost, because everybody's trying to figure out who's paying for what. But in the case of the cloud, you pretty much get your bill and you understand what you're paying. So anyway... And then you can adjust accordingly. Yeah, so we continue to optimize. We use the services, but if for some reason building something ourselves is going to deliver us an advantage, we'll go develop it. Right now, though, our advantage is that we've got umpteen opportunities to create AI type code and applications to basically automate these services. We don't have enough resources to do it all right now, but the common programming platform is going to help us. Can you drill into some of those umpteen examples? Well, I mentioned the battery one, for instance. So take that across the whole system, right?
So now you've got your storage devices, you've got the software that's running on there, and built into our systems we have security monitoring at the firmware level. Just connecting to that and adding AI around it is huge, right? Because now we can see attacks that may be happening on your fleet, and we can create services out of that. Anything we can automate around that is money in our pocket or money in our customer's pocket, right? If we can save them money with these new services, they're going to be more willing to come to HP for product. But it's actually more than just automation, because it's the stuff you couldn't do with, you know, a thousand monkeys trying to write Shakespeare. That's true. You have data that you could not get before. That's true. Yeah, you're right. The automation is helping us uncover things that we would have never seen. Right. The whole gorilla walking through the room. Yeah. I could sit there and show you tons of examples of where we were missing the boat. Even when we brought up our first data sets and started looking at them, some of the stuff we looked at, we thought, oh, this is just bad data. And actually it wasn't. It was bad product. Different. Yeah. And people talk about dark data. That's a great example, because we had no data models for it. Yeah, we had no data model to say whether it was good or bad, right? And now we have data models, and we're continuing to create them: you create the data model, and then you can continue to teach it, and that's where we create the apps around it. So our primitives, basically, are those data models that we're creating from the device data that we have.
Are there some of these apps where some of the intelligence lives on the device? And, like in a security attack on a big surface area, you want to lock it down right away. Yep, we do. A good example on the security side is something we build into our products called SureStart. Essentially, we have the ability to monitor the firmware layer. There's a local process running, independent of everything else, that's monitoring what's happening at that firmware level. If there's an attack, it's going to immediately prevent the attack or recover from it, right? That's built into the product. So, but it has to have a model of what this anomalous behavior is. Well, in our case, we're monitoring what the firmware should look like: you take checksums from the firmware, and the firmware just does not change. So we can take the characteristics of the firmware and monitor it, right? If we see it changing, then we know that something's wrong. Now, it could get corrupted through hardware failure; glitches can happen, solar flares can cause problems sometimes, right? The point is, we have found that customers sometimes had problems where their firmware would get corrupted and they couldn't start their system. So we're like, are we getting attacked? Is this a hardware issue? Could it be bad flash devices? There are all kinds of things that could cause that. Well, now we monitor it and we know what's going on. Now, the other cool thing is that we create logs from that. When those events occur, we can collect those logs, and we're monitoring those events. So now we can have something monitor the logs that are monitoring all the units. So if you've got millions of units out there, right?
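The checksum idea behind this kind of firmware monitoring can be sketched simply. This is only an illustration of the comparison logic, with invented byte strings; SureStart itself operates down at the hardware and firmware level.

```python
import hashlib

# Minimal sketch of checksum-style firmware monitoring; the byte strings and
# function names are invented. The real mechanism works in hardware/firmware.

def fingerprint(firmware_image: bytes) -> str:
    """SHA-256 digest of a firmware image."""
    return hashlib.sha256(firmware_image).hexdigest()

# Known-good fingerprint, captured when the firmware is provisioned.
GOLDEN = fingerprint(b"known-good firmware build 1.2.3")

def firmware_intact(firmware_image: bytes) -> bool:
    """True if the image still matches the known-good fingerprint."""
    return fingerprint(firmware_image) == GOLDEN

print(firmware_intact(b"known-good firmware build 1.2.3"))   # → True
print(firmware_intact(b"known-good firmware build 1.2.3!"))  # → False
```

Because the firmware should never change, any fingerprint mismatch (attack, flash corruption, failed update) is an event worth logging, which is exactly what feeds the fleet-wide monitoring described next.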
How are you going to do that manually? You can't; that's where the automation comes in, right? So the logs give you the ability up in the cloud, or at HP, to look at the ecosystem of devices, but there is intelligence down on the... There's intelligence to protect the device and auto-recover, which is really cool. In the past, you had to get it repaired, right? Imagine if someone attacked your fleet of notebooks: you've got 10,000 of them, and it basically brought every single one of them down in one day. What would you do? Freak. Yeah. And you'd have to replace everything. In other words, that's just an attack, and it could happen, right? So we basically protect against that with our products. And at the same time, we can see that it may be occurring, right? And then from the footprints of it, we can do analysis on it and determine: okay, was that malicious? Is this happening because of a hardware issue? Is this happening because maybe we tried to update the firmware and something happened there? What caused that to happen, right? That's where collecting the data from the population helps us, and then we mix that with other things, like service events. Are we seeing service events being driven by this? Thermal: we can look at the thermal data; maybe there's some kind of heat issue causing this to happen. So you start mixing that. Did Samsung come calling to, you know, buy this? Well, what's funny is that Samsung is actually a supplier of ours too, right? Of course, yeah. As a battery supplier of ours. So by monitoring the batteries, what's really interesting is we're helping them out, because we go back to them. And one of the things I'm working on is creating apps that can go back to them so that they can see the performance of the product they're delivering to us.
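Fleet-scale log monitoring of the kind described, flagging units whose event counts sit far outside the rest of the population, can be sketched like this. The counts and the 2.5-sigma threshold are invented for illustration.

```python
import statistics

# Sketch of fleet-wide log monitoring: flag units whose event counts are far
# above the population. Counts and the 2.5-sigma threshold are invented.

event_counts = {
    "u01": 3, "u02": 2, "u03": 4, "u04": 3, "u05": 2,
    "u06": 3, "u07": 4, "u08": 2, "u09": 3, "u10": 90,  # outlier
}

def outliers(counts, k=2.5):
    """Units more than k population standard deviations above the mean."""
    mean = statistics.mean(counts.values())
    sd = statistics.pstdev(counts.values())
    return [unit for unit, c in counts.items() if c > mean + k * sd]

print(outliers(event_counts))  # → ['u10']
```

A flagged unit is a starting point, not an answer; as the interview notes, you then join against service events, thermal data, and firmware-update history to decide whether it's an attack, a hardware issue, or a bad update.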
So instead of us having to call a meeting and say, hey guys, let's talk about this, we've got some problems here (imagine how much time that takes, right?), if they can self-monitor, then they're going to want to keep supplying to us, right? So they're going to better their product. That's huge. I mean, wow, what a productivity boost, right? Because it used to be: hey, we've got a problem. Okay, let's meet and talk about it. Then you take an action to go figure out what it is. Now, if you need a meeting, it's: let's look at the data. You don't have enough people. But there's also potentially a shift in pricing power. I would imagine it shifts a little more in your favor if you have all the data that indicates the quality of their product. That's an interesting thing. I don't know if we've reached that point. I think in the future it could be something that's included in the contracts. The world is the way it is today, and data is a big part of that; as you go forward, absolutely, the fact that you have that data helps you have a better relationship with your suppliers. Well, and your customers. I mean, it used to be, there's no debate, that the brand had all the information. The internet obviously changed all that. But this whole digital transformation and IoT and all this log data sort of levels the playing field back to the brand. It definitely changes it. I mean, you can now add value for the consumer. Right. You couldn't before. And that's what we keep trying to do; we're invested to do exactly that, to really increase the value of our brand. We have a strong brand today. And what do you guys do with, we've got to wrap, but what do you do with Databricks, and what's the relationship there?
Databricks, again, we decided that we didn't want to be the experts on managing the whole Spark thing. The other part was that, yeah, we're going to be involved with Spark and help drive the direction as far as our use cases and what have you. Databricks and Spark go hand in hand, right? They've got the experts there, and it's been huge being able to work with these guys. But I recognize that, going back to our software development and everything else, we don't want to put resources on that. We've got too many other things to do. And the less I have to worry about my Spark code running and scaling, the cost of it, and being able to put code in production, the better. So having that layer there is saving us a ton of money, resources, and time. Just imagine time to market. It's huge. Right, John. Yeah. Awesome having you on. Thanks very much. Thank you very much. Great talking to you guys. All right, keep right there. We'll be back with our next guest. This is theCUBE, live from Spark Summit East. Be right back.