Live from Union Square in the heart of San Francisco, it's theCUBE, covering Spark Summit 2016, brought to you by Databricks and IBM. Now here are your hosts, John Walls and George Gilbert.
And welcome back to San Francisco. Along with George Gilbert, I'm John Walls as we continue our coverage here on theCUBE of Spark Summit 2016. We're joined by a couple of guests right now: Martin Hall, who does business development for the big data solutions team at Intel. Martin, thank you for being with us. And Marek Nijvitsch, who is the COO of deepsense.io. Marek, thank you for being here as well.
Thank you for the invitation.
Yeah, first off, before we jump in, I'd like to hear your impressions, a little more than halfway through our first day of coverage here. What do you guys make of what's happening here, what you're seeing, what you're feeling, kind of the vibe that's going on here? Martin, I'll let you lead off.
Well, it's interesting because it reminds me of the first Hadoop event. So I was at the first Hadoop Summit event six, seven years ago, and I turned up at that show thinking, oh, it's just a bunch of technologists working on some open source project. Clearly Spark, more recently than Hadoop, has caught fire. And there is innovation on both the technology and the business front that's started to get reflected. So I think we're seeing something pretty significant here in the overall big data and analytics ecosystem.
And Marek, what's your sense of what you're seeing and what you're hearing here so far?
It's a great opportunity to see how much is happening in the open source community right now. It's a great way to meet different companies working on Spark and in this infrastructure. It's a really great opportunity to meet people here.
I'm sure a lot of people who are watching right now are very familiar with Intel, obviously, maybe not so much with DeepSense.
So if you would share a little bit about what you consider to be your core competencies and what your primary focuses are.
I see. So here in DeepSense, what we are doing, we are doing primarily three things. The first one is Seahorse. It's a product that is already integrated with the Trusted Analytics Platform from Intel. And what Seahorse does for you, it allows you to work in a visual way with Spark. So basically with simple drag and drop, you can create workflows and make your life easier. You don't have to be code savvy to work on Spark anymore. You can create models that are scalable and will run on Spark with Seahorse. The next thing we do, we provide enterprise end-to-end solutions in data science, deep learning, machine learning. And the third thing we are working on, we are trying to introduce this domain to people from various industries. So we provide workshops for people from different domains around machine learning, deep learning, data analytics.
So you mentioned, so visual recognition being one of your primary functions, if you will. Saw some interesting things today, I thought, in the keynote sessions about visual recognition and applying it, you know, through the use of Spark technology. Can you give us an example of something that may be proof of concept here, of what's worked, and maybe something that's a little more relatable to people who are watching right now?
Understand. So I have one funny example. So as you know, the whole industry around image recognition is very big. What we recently did, we participated in a very interesting competition on Kaggle. And the goal of this competition was to save whales. So I know it sounds great.
It's a very noble cause, Marek, by the way.
Yes, that's what we do at DeepSense. Basically what happens: right now, people from governmental institutions are flying over the oceans, taking pictures and trying to establish where given whales are.
This is an endangered species, so they need to keep track of them, calculate the current population, make sure that it continues to grow. It's a very heavy task on people, really hard to do. There are more than 500 types of whales to recognize. So what we did, we participated in the competition where the goal was to recognize them through deep learning capabilities. We were able to win that competition. Right now our algorithms are able to find, with 87% accuracy, which type of whale it is. Not only to find a whale, but as well to determine which type of whale it is. And those whales are sometimes submerged. They are not posing for the picture per se. So you don't have any passport photo.
You can't have them stand still, is that what you're saying?
You cannot ask them, show me your ID, right? You cannot ask for that. And that makes this problem really difficult. And this is just one use case. And you can imagine how many different areas, only from animal-related subjects, how many different endangered species are there that people are trying to protect. The application of those kinds of algorithms is really big and we are looking forward to finding applications in industry as well.
Well, so simple question here, but it sounds so. How does it work? I mean, how does Spark make that work then, that would allow you, not only through whales, but some demonstrations we saw giving context to pictures, giving relevance to static images that otherwise, through artificial intelligence, it never would happen, right?
I understand.
Those pictures aren't living, breathing. They're not, a machine is trying to interpret data and it's really not getting a sense of what a human, how a human would see that.
Would you categorize what John's talking about as pattern recognition where the patterns are not static? Is that the modeling you're trying to do?
You can consider it like that.
So you can imagine that, you know, you can take a movie of the ocean and see how everything moves. But after all, you can sample it and then you have a set of pictures. But there's a lot of those pictures, right? So that's where we need the help of platforms like the Trusted Analytics Platform to be able to scale up to such an amount and then be able to perform our analytics there, training and then execution of the model.
Talking about this is one of your partners, one of your many partners. Intel's made a serious commitment, you know, to the open source community and then creating this kind of collaboration. I'd like you to expand on that if you would a little bit as to, you know, what's the motivation there with Intel? What's driving that and how are these kinds of partnerships realizing that value?
Well, one of the things you mentioned at the start is that, you know, people sort of understand who Intel is. I've been with Intel just under a year and I thought I understood Intel, but I clearly didn't. You know, we think Intel, we think silicon, CPUs, powering the world's data centers, and that's certainly true. But Intel has a long history of being involved in open source and investing in software. And that's as true today of big data analytics as it was of Linux, for example. So what we're doing in this space is we're investing in software innovation up and down the stack. So we think of the big data solutions stack as really a four-layer stack. You've got infrastructure, which is a combination of hardware, you know, network, storage, processing capabilities, and virtualization on top of that. But then you've got kind of two additional key layers that are enabling all the kinds of technologies and solutions that everybody's wanting to get their hands on as enterprise organizations today. You've got the data layer, and at the data layer you've got things like Hadoop and NoSQL technologies, which are core in terms of sort of scale-out solutions.
They can distribute things over commodity servers. But we've also got an analytics layer, and what Marek was talking about earlier really is how do we enable the analytics to be performed on top of those modern data platforms? Spark is a key technology there. So what Intel has done is invested in Hadoop with our partnership with Cloudera. We have invested significantly in a new open source project called Trusted Analytics Platform that's designed to integrate all kinds of open source projects into one environment that you can provision in private and public clouds so that data scientists and developers can more quickly, more easily, more cheaply develop applications that are powered by big data analytics. But Intel's never worked alone. We've always worked with the ecosystem, so we have a particular position in the marketplace, in the world at large, that allows us to develop ecosystems and work with innovative vendors like DeepSense to partner and combine their kind of value add and capabilities with what we're doing on these open source platforms.
I just want to take a quick tangent away from DeepSense but to come back. On Intel, Oracle has told us how they're taking some database-specific accelerators, whether it's encryption or things that would be IO intensive, and putting them in the chip. In their case, SPARC, because they have more control. There's some things that are going into the Intel chip. Are there analytic things from the analytic stack that you're putting in the chip that accelerate things?
We're looking at optimization top to bottom, so we focus a lot on performance and security. Those are the two things in particular that we care about. What can we do in silicon to enable better performance of analytic workloads and better security of analytic workloads to protect enterprise organizations? But it's not just about silicon, it's about software as well.
It's about enabling those capabilities in key open source projects like Hadoop and Spark, so we invest significantly there. And we've got that capability because we have engineers looking at, obviously, the silicon and silicon designs and understanding that at that level, and we have software engineers looking at how do we optimize that for silicon, and then we have this network of vendors and the ecosystem that we work with.
So, Marek, you've talked about this wonderful and very concrete case or use case with whales. Can you tie that back to something many of our viewers have heard about in terms of deep learning, but is an amorphous concept for many of us? Help us understand how that capability enables people to do what your contest, what you won with that Kaggle contest.
I see, certainly. So you can imagine that at the airport the security has to see through the many people that are traversing the airport and find suspicious behavior, being able to identify people, and very much as in the case of the whales, it is pretty much face recognition. So our goal can be used as well in face recognition, which is a very well understood problem right now. There are a couple of solutions out there in the market and certainly our solutions can help to address this issue.
Okay. DARPA, the Defense Advanced Research Projects Agency, one of their grand challenges, besides the autonomous vehicle years back, was how do we make surveillance more productive? And that has much more broad applicability, but watching, identifying and tracking whales is a perfect example. Have you seen that objective spread more broadly into industry and have you seen applications that you're going to be catering to?
So I've heard about one interesting application that we as a company are particularly interested in as well. So when it comes to taking some footage from the surveillance and identifying suspicious behavior, it's not anymore a picture, it's a sequence of pictures.
So being able to basically characterize them as a graph series of pictures and out of it filter out suspicious behavior is something that definitely the military will be interested in, private security firms will be interested in. So this is as well a part of this industry that we are looking at right now and hopefully you will see more from us about that in the future.
And beyond just people and sort of face recognition, if we talk about image analytics, think about drones. Obviously we were talking about the whale use case before, but drones flying over crops, there are agricultural use cases to identify what's going on with particular crops as they develop. And you think of the health field, medical diagnoses and basically sort of automated image recognition from, for example, MRIs. So there are a lot of applications we're starting to see emerge, and one of the things that I got excited about with Intel was how many markets, how many organizations Intel works with and how many different use cases we get exposed to and can help with in terms of both the hardware and software solutions that we bring to market.
Is there a sense that, besides the internet of things, but more along the lines of this deep sort of pattern, deep pattern recognition, where it's not a bank transaction but it's finding meaning in a very, very rich contextual kind of, whether it's image or something else. Is there a sense that we're at the cusp of a huge step function in demand for cloud-based horsepower to make?
Oh for sure, that's what we're certainly seeing. You've got the general area of analytics, but within that you've got things like machine learning and deep learning where you require power. Meaning you require power to simulate the capabilities of some neurons firing in our brains. And we've now got access to that power. The kind of silicon that Intel's bringing to market makes things that weren't possible at a price-performance point possibilities today. So yes.
Okay.
There's always, I wouldn't say tension, but occasionally there's some gaps, right, between the key players. Somebody, they have their primary responsibilities and they kind of stay to themselves. So how do you bridge those gaps maybe between scientists and developers and analysts or whatever the case may be in terms of these new opportunities, these new initiatives? I mean, what's the key there to making sure that this collaboration is encouraged and that we break down whatever barriers might be existing right now within various enterprises?
So I think there are two different aspects of it. One is more technical, the second is rather communication. So on the technical aspect, what is important right now is that everything from text to speech, image recognition, natural language processing is done and available through an API. So we can start building complex solutions integrating all those three. But there is still this other aspect, that people have to be able to understand what is actually needed. So quite often it is that a C-level executive will not fully understand what he can do with machine learning, deep learning solutions. On the other hand, you will have a data scientist who understands what to do with it. Moreover, he can do it, but he doesn't know what is actually necessary or needed to generate value for enterprises. So we call it an awareness chasm, and we are trying to bridge it through various workshops, working with the companies from the very early stage of defining the problem, finding the solution, creating the model, till the deployment on the production servers of the given company. So trying to basically handle that end-to-end.
Well that's very noble. You're trying to save the CEO and save the whales at the same time.
And help data scientists, right?
Sorry, gentlemen. I'm sorry, go ahead Martin.
Hey Kavanaik, I think you hit on the key thing which is collaboration. Teams win over individuals, right?
And I think one of the themes that we're seeing within the analytics space is we've got to enable teams within businesses that historically have been siloed to work together to more rapidly capitalize on the data that's at their fingertips with collaborative analytics solutions. And open source is all about collaboration as well, and then developing ecosystems of vendors that collaborate to solve problems together.
How much of that collaboration is better tools and how much is a changing culture, or the two working together?
Oh, I think it's both for sure. And I've spent six, seven years now in the big data space and organizationally, what I've observed is companies starting to look at how do we break down boundaries? How do we basically connect what were historically silos? Data scientists have often been siloed individuals. Marek was talking about that earlier, right? We've got to connect the data scientist to the needs of the business. And that's not just words, we have to facilitate that in software and sort of platforms that make that real.
Okay, so let me drill on that. Even though we're drifting a little way from deep learning, but that's because it's just a fundamental problem where in the data warehouse era, it was just IT. And you had to negotiate with IT if you wanted one more field. Now with the data lake, there's much more self-service. And I guess the question is, how do you foster, how do you empower the sort of citizen data scientist or citizen business analyst to get more out of that data and then essentially to effect change? Not just so that they can find new things, but so that they can get things done.
Well, I think, I mean, you hit on a couple of things. We have to enable self-service. We have to enable collaboration, but we have to do it and still recognize that we've got to protect the data. We've got to have governance in place.
We've got to have security in place so that we don't give everybody access to everything, either within an organization or if you think of multi-tenant platforms as well, which is why security is so key. It's easy to go develop the capabilities. It's hard to really look at how do we make sure there's an integrated security model? How do we make sure that security model goes all the way down to silicon and takes advantage of sort of in-silicon capabilities? So there's a number of aspects to this that have to be carefully considered end to end and top to bottom.
Okay, so thank you for sharing the time and best of luck in the continued great work that you're doing and collaboration. I think we're seeing it personified right here. Very good example. Thank you both, we appreciate it.
Thank you.
And George and I will be back with more from San Francisco here on theCUBE in just a bit.