Live from San Francisco, it's theCUBE, covering Spark Summit 2017, brought to you by Databricks. Welcome to theCUBE. My name is David Goad, I'm your host, and we are here at Spark Summit, day two. I'm flanked by a couple of consultants here, sorry, analysts, from Wikibon. Wait, I've got to get this straight. To my left we have Jim Kobielus, who is our lead analyst for data science. Jim, welcome to the show.
Thanks, David.
And we also have George Gilbert, who is the lead analyst for, sorry, big data and analytics. I'll get this right eventually. So why don't we start with Jim? Jim, just kicking off the show here today, I wanted to get some preliminary thoughts before we really jump into the rest of the day. What are the big themes that we're going to hear about?
Yeah, today is Enterprise Day at Spark Summit, so Spark for the enterprise. Yesterday was focused on Spark itself: the evolution and extension of Spark to support more native development of deep learning, as well as speeding up Spark to support sub-millisecond latencies. But today is all about Spark in the enterprise, really what I call wrapping DevOps around Spark, making it more productionizable and supportable. The press release for the Databricks Serverless announcement went up yesterday, but they're going into some depth right now in the keynote about serverless. Serverless is really all about providing an in-cloud Spark, essentially a sandbox for teams of developers to scale up and scale out enough resources to do the modeling, the training, the deployment, the iteration, and the evaluation of Spark jobs in essentially a 24-by-7, multi-tenant, fully supported environment.
So it's really about driving this continuous Spark development and iteration process into a 24-by-7 model in the enterprise. What's really happening is that data scientists and Spark developers are becoming an operational function. Businesses are building strategic infrastructure around things like recommendation engines, and e-commerce environments absolutely demand resilient, 24-by-7, team-based Spark collaboration environments, which is really what the serverless announcements are all about.
There's increasing demand on mission-critical problems, so that optimization is a big deal.
Yeah, data science is not just an R&D function; it's an operational IT function as well. So that's what it's all about.
Awesome, let's go to George. I saw you watching the keynote, and I think you're still watching it this morning, taking notes feverishly. What were some of the things that stuck out to you from the keynote speakers this morning?
There are some things that are going to carry over from yesterday that we can explore some more. We're going to have on the show the chief architect, Reynold Xin, and the CEO, Ali Ghodsi. One of the things we'll want to understand is how the scope of applications that are appropriate for Spark is expanding. We got sort of unofficial guidance yesterday that just because Spark doesn't integrate all that tightly with key-value stores or databases right now, that doesn't mean it won't in the future: on the Apache Spark side, through better APIs, and on the Databricks side, perhaps through custom integration. The significance of that is that you can open up a whole class of operational apps, apps that run your business and that now incorporate rich analytics as well.
Another thing we'll want to ask about, keying off what Jim was saying: this now becomes not just a managed service, where the provider takes over the labor the end customer was applying to get the thing running, but something automated where you don't even see the infrastructure. We'll want to know what that means for the edge, where we're doing analytics close to the Internet of Things and to people, and whether there has to be a new configuration of Spark to work there. And then of course, what do we do about the whole data science process, and DevOps for data science, when you have machine learning distributed across the cloud, the edge, and on-prem?
In fact, I know we have Pepperdata coming on right after this, Ash Munshi, who might be able to talk about exactly that DevOps in terms of performance optimization in a distributed Spark environment.
Yeah.
And George, I wanted to follow up on that. We had Matt Fryer from Hotels.com on the keynote stage this morning, and he's going to be on our show later. He talked about going all cloud, all Spark, and how data science is even a competitive advantage for Hotels.com. What do you want to dig into when we get him on the show?
That's a really good question, because if you look at business strategy, you don't really build a sustainable advantage just by doing one thing better than everyone else; that's easy to pick off. Sustainable strategic advantages come from doing not just one thing better than everyone else but many things, and then orchestrating their improvement over time. I'd like to dig into how they're going to do that. Because remember, Hotels.com is the Internet-era descendant of the original travel reservation systems, which did confer competitive advantage on the early architects and deployers of that technology.
Great. And Pepperdata, I wanted to come back to them; we're going to have them on the show in just a moment. What would you like to learn from them?
What do you think will benefit the community the most?
Actually, keying off something George said, I'd like to get a sense of how you optimize Spark deployments in a radically distributed IoT edge environment: whether they've got any plans, and what their thoughts are on the challenges there. As more of the intelligence gets pushed to the edge, much of it will be in machine learning and deep learning models built in Spark. What are the challenges there? If you've got thousands to millions of endpoints that are all autonomous and intelligent, and they're all running Spark, what are the orchestration requirements? What are the resource management requirements? How do you monitor an environment like that end to end, and optimize the passing of data and the transfer of control flow, or orchestration, across all those disparate points?
Okay, 30 seconds now. Why should the audience tune in to our show today? What are they going to get?
I think they're going to get a really good sense of the emerging best practices for optimizing Spark in a distributed fog environment out to the edge, where not just the edge devices but all nodes will incorporate machine learning and deep learning. They'll get a sense of what's being done today and what tooling enables DevOps in that kind of environment, as well as the emerging best practices for compressing these algorithms and the data itself, and for doing training in a radically federated environment. I'm hoping to hear some of that from the vendors who are on the show today.
Fantastic. And George, closing thoughts in the opening segment. 30 seconds.
Closing thoughts of the opening segment.
Like Jim, I want to think about Spark holistically. As Matei acknowledged yesterday, it's traditionally been best positioned as this offline branch of analytics that you applied to a data lake repository you'd accumulated. Now we want to see it put into production, but to do that, you need more than just what Spark is today. You basically need a database or key-value option, so that you're storing your work as it goes along and can go back and analyze it, whether with simple or complex analysis. So I want to hear about that. I also want to hear about their plans for IoT. Spark is kind of a heavyweight environment, so you're probably not going to put it in the boot of your car, at least not anytime soon.
Intelligent edge. I mean, Microsoft Build a few weeks ago was really deep on the intelligent edge. HPE is doing its show, Discover, right now; I think it's in Vegas. They're also big on the intelligent edge, and we had somebody from HPE on the show yesterday going into some depth on that. I want to hear what Databricks has to say on that theme.
Yeah, and which part of the edge? Is it the edge gateway, which is really a slimmed-down server, or the edge device, which could be a network card with 32 megs of RAM?
Yeah. All right, gentlemen, I appreciate the little insight here before we get started today. We're just getting started. Thank you both for being on the show, and thank you for watching theCUBE. We'll be back in a little while with the CEO of Databricks. Thanks for watching.