 Live from San Francisco, it's theCUBE, covering Spark Summit 2017, brought to you by Databricks. Welcome back to theCUBE at Spark Summit 2017. I don't know about you, George, I'm having a great time learning from all of our attendees. We've been absorbing now for almost two days. Yeah, well, we're about to absorb a little bit more here too, because the next guest, I was looking forward to, I saw his name on the schedule. All right, that's the guy who talks about herding cats. It's John Kavanaugh, master architect from HP. John, welcome to the show. Great, thanks to be here. Well, I did see, I don't know what it's about, cats and the internet, but either cats or self-driving cars, one of the two, in analogies. But talk to us a little bit about your session. Why did you call it herding cats? And is that related to maybe the organization at HP? Yeah, there's a lot of organizational dynamics as part of our migration to Spark. HP is a very distributed organization, and it has had a lot of distributed autonomy. So, you know, trying to get a centralized activity is often a little challenging. You know, you guys have often heard, you know, hi, I'm from the government, I'm here to help. That's often the kind of, you know, shields up response you will get from folks. So, we had a lot of dynamics in terms of trying to bring these distributed organizations on board to a new common platform, and allaying many of the fears that they had with making any kind of a change. So, are you centered in a specific division? So, yes, I'm in the print platforms and future technology group. You know, there's two large, you know, kind of business segments with HP. There's our personal systems group that produces everything from phones to business PCs to, you know, high-end gaming. 
But I'm in the printing group, and that, you know, while many people are very familiar with your standard desktop printer, you know, the printers we sell really vary from, you know, a very small product, we call Sprocket, it fits in your hand, battery operated, to literally a web press that's bigger than your house, prints at hundreds of feet per minute. So, it's a very wide product line, and we have a lot of data collection. Is it 3D printing as well? We do have 3D printing as well. That's an emergent area for us. I'm not super familiar with that. I'm, you know, mostly on the 2D side. Sure. But that's a very exciting space as well. So, tell me about what kind of projects that you're working on that do require that kind of cross-team or cross-departmental cooperation? So, you know, in my talk, I kind of talked about the Wild West era of Big Data, and this was, you know, prior to 2015, and we had a lot of groups that were standing up all kinds of different Big Data infrastructures. And part of this stems from the fact that, you know, we were part of HP at the time, and we could buy servers and racks of servers at cost, right? Storage was cheap, all these things, so they sprouted up everywhere. And around 2015, everybody started realizing, oh my God, this is completely fragmented. How do we pull things back together? And that's when a lot of groups started trying to develop platform-ish types of activities. And that's where we knew we needed to go, but there was even some disagreement from different groups, how do we move forward? So there's been a lot of good work within HP in terms of creating a virtual community, and Spark really kind of caught on pretty quickly. Many people were really tired of kind of Hadoop. There were a lot of kind of very opinionated models in Hadoop where Spark opens up a lot more into the data science community. So that kind of went really well, and we made a big push into AWS for much of our cloud activities. 
And, you know, we really ended up then pretty quickly with Databricks as an enterprise partner for us. And so George, you've done a lot of research. I'm sure you've talked to enterprise companies along the way. Is this a common issue with big enterprises? Well, you know, for most Big Data projects, they started, the ones we hear a lot about is there's a mandate from like the CIO that we need a big data, you know, strategy. And so someone will, in the past, stand up, you know, a five or 10 node Hadoop cluster and run some sort of pilot and say, you know, this is our strategy. But it sounds like you had it. A lot of cats were around. We had dozens of those small Hadoop clusters all around the company. So how did you go about converting that energy, that sort of like that excess energy towards something more harmonized around Databricks? Well, a lot of people started kind of recognizing we had a problem, right? This really wasn't going to scale. We really needed to come up with a broader way to share things across the organization. So the timing was really right. And, you know, a lot of people were beginning to understand that. And, you know, we set forth probably about five different kind of key decisions we ended up making. And part of the whole strategy was to empower the businesses, right? Like I mentioned, we have a very distributed organization. So you can't really dictate to the businesses. The businesses really need to own their success. And one of the decisions that was made, it might be kind of controversial for many CIOs, is that we made a big push on cloud-hosted and business-owned, not IT-owned. And one of the real big reasons for that is we were no longer viewing data and big data as kind of a business intelligence activity or a standardized reporting activity. We really knew that to be successful moving forward this needed to be built into our products and services. And those products and services are managed by the businesses. 
So it can't be something that would be, you know, tossed off to an IT organization. So did the IT organization then evolve into being more of an innovative entity versus a reactive or supportive entity for all those different distributed groups? Well, in our regard, you know, we've ended up with AWS as part of our activity. And really much of our big data activities are driven by the businesses. The connections we have with IT are more related to CRM and product data master sheets and those selling and channels and all of that information. But if you take a bunch of business-led projects and then try and centralize some aspect of them, wouldn't IT typically become the sort of shared infrastructure architecture advisor for that and then the businesses then now have a harmonized platform on which, you know, they can build shared data sets? Actually, in our case, that was what we did. We had a lot of our businesses that already had significant services hosted in AWS. So this became, and those were very much part of the high data generators. So it became a very natural evolution to just continue with some of our AWS relationships and continue on to Databricks. So kind of as an organization today, we have kind of three, kind of main buckets for our Databricks. But, you know, any business, you know, they can get their accounts. We try to encourage everything to get into a data lake in S3 in Parquet format, one of the decisions that was adopted. And then from there, people can just begin to move. You know, you can get notebooks, you can share notebooks, you know, the beauty of Databricks and AWS is instant on, you know? If I want to play around with something with a half a dozen nodes, it's great. If I need a thousand for a workload, boom, I've got it. I don't, kind of, other than, you know, what this costs and the value of return, there's really no need for permissions or coordination with other entities. 
And that's kind of what we wanted, the businesses to have that autonomy to drive their business success. But does there not need to be some central value added in the way of, say, data curation through a catalog or something like that? Yeah, so this is not necessarily a model where all the businesses are kind of doing all kinds of crazy things. One of the things that was shepherded by one of our CTOs and fielded folks is we ended up creating a virtual community within HP. This kind of started off with a lot of, quote, kind of tribal elders or tribal leaders. But this virtual community today is, you know, we get together every two weeks and we have presentations and discussions on all things from, you know, data science to machine learning. And that's where a lot of this, you know, activity happens around how do we get better at sharing. And this has fostered, you know, kind of splinters off for additional activities. We have one on data telemetry within our printing organization. We're trying to standardize on more data formats and schemas for those so we can have broader sharing. So these things have been occurring more organically as part of developer enablement kind of moving up rather than more of kind of dictates moving down. This is interesting, potentially really important. When you say you're trying to standardize some of the telemetry, what are you instrumenting? Is it just all the infrastructure or is it some of the products that HP makes? It's definitely the products and the software. You know, like I said, we manage this, you know, huge spectrum of print products and my apologies if I'm focusing on it, but that is what I know the best. You know, we've actually been doing telemetry and analysis since, you know, the late 90s. You know, we wanted to understand, you know, supplies and usage so we could do our own forecasting. And that's really, really grown over the years. 
You know, now, you know, we have parts of our services organization, Managed Print Services, where they're offering, you know, big data analytics as part of the package. We provide information about predictive failure of parts. And that's been really valuable for some of our business partners, you know, that allows them, we have all kinds of fancy algorithms that we work on, you know, the customers have specific routes that they go for servicing and you may be able to tell them, hey, in a certain time period, we think these devices in your field may fail, so you can coordinate your route to hit those on an efficient route rather than having to make a single truck roll for one repair. And you do that before customer experience is a problem. So it's been kind of a great example of different ways that big data can impact the business. Yeah, I think Ali mentioned in the keynote this morning about the example of a customer getting just even notification that their ink's going to run out. And the chance that you get to, you know, touch that customer and get them to respond and buy could make millions of dollars difference, right? Let's talk about some of the business outcomes and the impact that some of your work has had and what it means really to the business. There's, you know, right now we're trying to migrate a lot of legacy stuff and, you know, that's kind of boring. It's just a lot of work, but there are things that need to happen, but really the power of the big data platform has been really great with Databricks. I know John Landry, one of our CTOs, he's in the personal systems group. He had a great example on some problems they had with batteries and laptops. And, you know, they have a whole bunch of analytics. They've been monitoring batteries and they found a collection of batteries that were experiencing very early failure rates, right? 
Happened to be able to narrow down to specific lots from a specific supplier and they were able to reach out to customers to get those batteries replaced before they died. So mini recall instead of a massive PR failure. And, you know, it was really focused on, you know, customers didn't even know they were going to have a problem with these batteries that they were going to, you know, die early. You know, you got to them ahead of time, you told them we knew this was going to be a problem and try to help them. I mean, what a great experience for a customer. That's just great. So once you had this telemetry and it sounds like a bunch of shared repositories, not one intergalactic one, what were some of the other use cases like the, you know, like the battery predictive failure type scenarios? So, you know, we have some very large gaps or not gaps, but different categories. We have clearly consumer products. You know, you sell millions and millions of those and we have a little bit of telemetry with those. You know, things we want to understand failures and ink levels and some of these other things. But on our, you know, commercial web presses, these very large devices, these are very sensitive. You know, customers, you know, these things are down, they have a big problem. So these things are generating all kinds of data, right? We have systems on premise with customers that are alerting them to potential failures and there's more and more activity going on there to understand predictive failure and predictive kind of tolerance slippages. I'm not super familiar with that business, but I know some guys that, you know, they've started introducing more sensors into products specifically so they can get more data to understand things. You know, very slight variations in tensioning and paper. You know, these things that are running hundreds of feet per minute can have a large impact. 
So, you know, I think that's really where we see more and more of the value coming from is being able to return that value back to the customer, not just help us make better decisions, but to get that back to the customer, right? You know, we're talking about expanding more customer-facing analytics in these cases or we'll expose to customers some of the raw data and they can build their own dashboards. Some of these industries have traditionally been very analog, so this move to digital web presses and this mountain of data is a little new for them, but, you know, HP can bring a lot to the table in terms of our experience in computing and big data to help them with their businesses. All right, great stuff, and we just got about a minute to go before we're done, so I have two questions for you. The first is an easy yes-no question. Okay. Is Purdue going to repeat as Big Ten champion in basketball? You know, I don't know. We talked about it. I hope so. I'm more focused on the Warriors winning. All right, go Warriors. And the real question is, what surprised you most? This was your first Spark Summit. What surprised you most about the event? So, you know, you see a lot of, you know, kind of internet-born companies and it's amazing how many people have just gone fully native with Spark all over the place. And it's a beautiful thing to see, you know, in larger enterprises, that transition doesn't happen like that. I'm kind of jealous. We have a lot more things to slog through, but it's the excitement here and all the things that people are working on. You know, you can only see so many tracks. I'm going to have to spend two days when I get back, just watching the videos on all the tracks I couldn't attend. All right, internet-born companies versus a big enterprise. Good luck herding those cats and we appreciate you sharing the story with us today and talking a little bit about the culture there at HP. Thank you very much. 
And thank you all for watching this segment of theCUBE. Stay with us. We're covering Spark Summit 2017. This is day two and we're not done yet. We'll see you in a few minutes.