Jeff Frick here with theCUBE. We are on the ground in San Francisco, California at the Data Science Summit, kind of a small show at the Marriott Marquis in downtown San Francisco. We wanted to come up, get a feel for what's going on, and we're joined in this segment by George Gilbert, who's been doing all the interviews. So, George, what do you think of the show?

I thought it was actually surprisingly mature and sophisticated, considering how early we are in the machine learning life cycle. Of course, we had a lot of vendors and advanced practitioners here. For the last several years, the term of art was the data lake, and that's really just the jumping-off point for machine learning; there's a natural place for it to fit next to a classic data warehouse. In the data lake, you put all your raw data. You don't refine it. You put it there, the data scientists work with it in its raw form, and they say, this is what this means; this is how this data set relates to that data set. The data warehouse, of course, is the curated collection for historical performance reporting. So very different purposes.

Yeah, we were pretty lucky. We had Comcast on. We had Bosch on. So we had some practitioners on, and then of course we had a couple of vendors on. But George, we've been busy. We've been running. We were at Hadoop Summit. We were at Spark Summit. How does this all fit within the whole big data world, as well as visualization and all those bits and pieces and components?

I think this was the tip of the spear in terms of the sophistication of the problems these customers are solving, and the vendors who are here are the ones helping those customers solve those problems. So what we're seeing here today is probably what we'll see at Hadoop Summit several years from now. Spark Summit is also a little further ahead of Hadoop Summit, because they're attacking pretty sophisticated problems.
The interesting thing was, when you ask about the customer journey, it's pretty universal. Everyone says, we start with an inventory of the data; you can't solve the problem unless you have the data first. Then, once you have that inventory, you have to go to the business guys and get some sort of ranked order of the problems they want to solve. And the reason you go to the business guys is that it's not always the hardest problems that are the highest-value problems. They might say something simple has a lot of value, and you have to ask, can I solve that problem with the data I have on hand?

It's an interesting point, because I sat in on the Comcast keynote, and one of the big use cases they were talking about is just serving up the right suggested movies for you as you sit and go through your on-demand menu. Really, it was all about optimizing the caching and the speed of delivery: which graphics are going to show, and did they get it right? At first blush, that doesn't seem like a very important problem. Clearly it is; they're investing a lot behind it.

The core is that it helps the consumer consume more Comcast content, which is what their business is all about. And so it might not be a very difficult problem, and I hesitate to rush to judgment on that, but it's a very important problem.

Not difficult in the sense of a really hairy, gnarly math problem, but difficult in terms of optimizing for all the things. And he said as the number of titles grows and they add more data to the algorithm, it is a big hairy problem, and then you get into this caching optimization: which titles do you choose to cache or not cache, among your millions and millions of choices?

Right, and we see a similar problem in the music world now. In the past, when we downloaded music, we had maybe a few thousand songs of our own.
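The cache-or-not-cache decision described above can be sketched in a few lines. This is purely an illustrative sketch, not Comcast's actual system: a greedy heuristic that ranks titles by expected requests served per megabyte of cache consumed, then fills a fixed-size cache in that order. The catalog names and numbers are hypothetical.

```python
def choose_cache(titles, capacity_mb):
    """titles: list of (name, size_mb, expected_requests).
    Greedily pick titles to cache, preferring the highest
    expected requests per megabyte, until capacity runs out."""
    ranked = sorted(titles, key=lambda t: t[2] / t[1], reverse=True)
    cached, used = [], 0
    for name, size, reqs in ranked:
        if used + size <= capacity_mb:
            cached.append(name)
            used += size
    return cached

# Hypothetical catalog: (title, size in MB, expected requests per day).
catalog = [
    ("blockbuster", 4000, 90000),
    ("niche_doc",   1500,  2000),
    ("new_release", 3000, 60000),
    ("classic",     2500, 30000),
]
print(choose_cache(catalog, capacity_mb=8000))  # → ['blockbuster', 'new_release']
```

At real scale (millions of titles, many edge locations, demand shifting by the hour), this becomes the "big hairy problem" George describes, but the underlying trade-off is the same one this toy version makes explicit.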
Now, when you stream music, you have 30 million songs in the catalog. The user interface and the technology behind the UI have to change, because you can't navigate 30 million songs; it has to help you figure out what's right for you, and that's much more of a machine learning problem than a storage and navigation problem.

So really, if the machine learning is happening well, it's almost unseen, right? It's almost invisible.

Yes.

Of course, I think somebody said at one of our shows, if everything's working well, it's magic, right? And if it doesn't work well, it's creepy.

Right. One thing that's also worth mentioning is that a lot of customers and vendors are gravitating toward Spark. Probably its distinguishing characteristic is that all the elements of Spark (the machine learning, the streaming, the graph processing) are tightly integrated, so it's much easier to build very powerful solutions. But then you ask, does that mean Hadoop's less relevant? The answer is not necessarily, because Hadoop takes care of resource management with YARN, so you don't have different applications stepping on each other; it takes care of the storage, at least today, in HDFS; and it manages the ingestion of data with tools like Kafka and Sqoop. So it's not an either-or problem. Many of us like to pitch it as a war between the two, but for now it's coexistence.

Okay, so it's a small show and it's pretty new. What are some of the mile markers you're looking for? What are some of the next things people should be looking for in terms of adoption?
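The streaming-music scenario above, where the system has to figure out what's right for you rather than make you navigate 30 million songs, is the classic collaborative-filtering setup. Here is a minimal sketch (not any vendor's product; all names and counts are made up) of user-based collaborative filtering: score unheard songs by how similar other listeners are to you, using cosine similarity over listen counts.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(target_user, plays, top_n=2):
    """plays: {user: {song: listen_count}}. Suggest songs the target
    user hasn't heard, weighted by similarity to other listeners."""
    me = plays[target_user]
    scores = {}
    for other, theirs in plays.items():
        if other == target_user:
            continue
        sim = cosine(me, theirs)
        for song, count in theirs.items():
            if song not in me:
                scores[song] = scores.get(song, 0.0) + sim * count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical listen counts for three users.
plays = {
    "ann": {"jazz_1": 5, "jazz_2": 3, "rock_1": 1},
    "bob": {"jazz_1": 4, "jazz_2": 2, "jazz_3": 6},
    "cam": {"rock_1": 7, "rock_2": 5},
}
print(recommend("ann", plays))  # → ['jazz_3', 'rock_2']
```

A production system would run this kind of computation over a cluster (Spark's MLlib ships a matrix-factorization recommender, for instance), which is exactly the single-machine-versus-cluster gap George turns to next.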
I guess I would like to see the skill sets that are now concentrated among the very sophisticated customers, and the vendors who can transfer those skills to other customers, start to diffuse more widely. And I think what we'll see is more self-service tools from vendors like Databricks or others in the Spark ecosystem, where what used to be usable as a single-machine tool can now work across a cluster. Making it work at scale was a very difficult problem; making it work on a single machine was easier. Now, if we can bring the two together, we can have ease of use and scale.

Excellent. Well, I think that's a great last word, George, so thanks for coming out, spending some time, and talking to the data scientists. He's George Gilbert with Wikibon. I'm Jeff Frick with theCUBE. Thanks for watching from San Francisco.