Live from Union Square in the heart of San Francisco, it's theCUBE, covering Spark Summit 2016, brought to you by Databricks and IBM. Now here are your hosts, John Walls and George Gilbert.

And welcome back to Spark Summit 2016 here on theCUBE. Our coverage continues here at the Hilton, just outside the Expo Pavilion, on what is still a very, very busy show floor. Really, again, just a tremendous expression of the support and the enthusiasm behind Spark and what's happening right now in the enterprise here in 2016. I'm John Walls, along with George Gilbert, and we're joined now by Christopher Nguyen, who is the co-founder and CEO of Arimo, a company that's really focused on intelligence augmentation in the enterprise. And I'm really curious, Christopher, to hear about intelligence augmentation. But first, welcome. Nice to have you here.

Well, thank you, John.

We appreciate that. Tell us first off, I'm curious, just your thoughts. You were here for the keynotes this morning, a lot of excitement around that jam-packed auditorium. Just your thoughts about seeing this kind of, I don't know, enthusiasm, if you will, around Spark and what's happening in this community right now?

Well, the perspective I have is as a very early adopter of Spark. I remember when we were sitting in an AMPLab room, basically at the time when we said we were going to bet the company on Spark, and it wasn't even an Apache project yet. And basically, there was Yahoo, Conviva, which is Ion Stoica's company, and, at that time, us. And the first Spark conference that we did was in December 2013, and the whole program fit on not even an 8.5 by 11 sheet. There were maybe 10, 20 presenters, and we were one of them. And to see it grow so rapidly to the scale that it has today, I believe Microsoft has now gotten on board with supporting Spark in their cloud solution. It's just amazing.

Yeah, and you were saying that, I mean, you kind of bet the ranch early, right? Which was really a pretty bold step. And of course, your company focusing on big data, machine learning, backed by Andreessen Horowitz, but making that commitment back when, as you said, it was in the project phase even, what was it that you saw, or that you had a sense about what was about to happen, that you put all your chips on the table?

Well, if you are familiar with technology evolution, and you understand architecture, and you have a sense of timing, then it's actually not a very risky bet. You know, what's risky for one person may not be as risky for another if you have information asymmetry. And the information asymmetry that we had looking at Spark is that we had looked at a whole bunch of different big compute architectures, specifically in-memory. You know, I used to spend time at Google, looking from the outside at the solutions people were coming up with, like Hive, which was trying to put a SQL layer on top of MapReduce, and we knew that was the incorrect approach. And so we were going to build a layer of in-memory processing. And we knew that what had to happen is that you need to have what's called a distributed dataset that exists outside of the compute cycle. If you think about the MapReduce paradigm, you only have data existing when there is some compute going on. Spark is unique in that it has the concept of an RDD, and RDDs exist whether there's a compute cycle running or not. And what we wanted to do was build an interactive application on top of such a compute system. And you can't do that without having this distributed dataset. So that's one of the key things about the Spark architecture that we knew was necessary.
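To make that RDD point concrete, here is a minimal PySpark sketch, with a made-up file path and field layout: once the distributed dataset is cached, it stays resident between jobs, so an interactive application can keep asking new questions of it without recomputing from storage each time. This is an illustration of the general Spark behavior described above, not anything from Arimo's product.

```python
# Minimal PySpark sketch: a cached RDD persists across compute cycles,
# which is what makes interactive, repeated queries practical.
from pyspark import SparkContext

sc = SparkContext(appName="interactive-rdd-sketch")

# Hypothetical event log: "user_id,action,amount" per line.
events = (sc.textFile("hdfs:///data/events.csv")
            .map(lambda line: line.split(","))
            .cache())            # keep the distributed dataset in memory

# The first action materializes the RDD and caches it...
print("total events:", events.count())

# ...later jobs, issued interactively, reuse the in-memory dataset
# instead of re-reading and re-parsing the raw files.
purchases = events.filter(lambda fields: fields[1] == "purchase")
print("total purchase amount:",
      purchases.map(lambda fields: float(fields[2])).sum())

sc.stop()
```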
The other part is really about timing. If you draw a chart of Moore's Law for the cost of memory over the last 50 years, it's actually halving every 18 months. And so if you think about places like Google, and even earlier Wall Street, when people were using a lot of in-memory in HPC, high-performance computing, it was worth it because every stock tick was millions of dollars on the line. It does take time for memory to drop to a level that's cheap enough for enterprises to say, okay, now we can deploy it at scale. And so Spark came on the scene just at the right time for that to happen.

And a lot of news being made here by various companies, and you had some announcements as well. Shed a little light on that for us and for our audience. What kind of news are you making today?

Well, I'm here for a couple of reasons. Number one is that we have a paper we're presenting tomorrow. And then I'm also on the Spark Technology Council, so I'm here for the meetings to help set some of the direction for that. The paper we're presenting is fairly esoteric. It's in the data science track. The title is Bayesian Optimization for Automatic Hyper-Parameter Optimization.

And so what? What does that mean for us? Is that about helping to find, if I saw some of your notes, helping to automatically find features for a model?

Exactly. So if you think about the life of a data scientist, essentially the task is to build predictive models from data. And when you're a data scientist and you do that, a lot of times you run the data through various algorithms and you have to tweak the dials on these algorithms, which are called hyper-parameters. For example, the learning rate, you've got to pick that to get a good model. Now, for the most part, when a data scientist does this, it's pretty much a manual process. It's a lot of guesswork based on experience. And it turns out that takes a lot of time. The training itself is run by a large-scale computing system like Spark, but for the choice of hyper-parameters, rather than waiting one cycle and trying the next one, if there's a way to automate it, then that could save a lot of time and effort.

This is, I mean, all of this is rocket science, but this is like rocket science taking off from the moon, because you're turning the machine learning process onto the machine learning process itself.

That's right, that's right. And the idea is that we're going to augment and make more productive that scarce asset called the data scientist.

But help us relate that back to the product you're working on, if that is, in fact, related.

It absolutely is related. Our overarching goal at Arimo is something we call intelligence augmentation. And what that means is that we augment human intelligence with machine intelligence in the enterprise. So what we just talked about is one example of intelligence augmentation. That is, the data scientist is the human intelligence, and the data scientist will never go away, just like accountants didn't go away because of Excel. But what we're enabling is for the data scientist's skills and time to be leveraged, so that he or she can focus on the higher-end tasks, while the very tedious tasks, things like hyper-parameter searching, are done by other machine learning algorithms.
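As a deliberately simplified sketch of automating that search, the snippet below runs plain random search over a couple of illustrative hyper-parameters of a scikit-learn gradient-boosted model on synthetic data. A Bayesian optimization approach like the one in the paper would replace the random sampler with a surrogate model that picks each next trial based on the scores observed so far; the dataset, parameter ranges, and trial count here are invented for illustration and are not Arimo's implementation.

```python
# Minimal sketch: automated hyper-parameter search. Random search is shown;
# Bayesian optimization would choose each next trial from a surrogate model
# fit to the (params, score) pairs observed so far, instead of sampling randomly.
import random

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

def sample_params():
    # Illustrative search space; real ranges depend on the problem.
    return {
        "learning_rate": 10 ** random.uniform(-3, 0),   # the "dial" mentioned above
        "n_estimators": random.randint(50, 400),
        "max_depth": random.randint(2, 6),
    }

best_score, best_params = -1.0, None
for trial in range(20):                                  # 20 automated trials
    params = sample_params()
    model = GradientBoostingClassifier(**params, random_state=0)
    score = cross_val_score(model, X, y, cv=3).mean()    # one training "cycle"
    if score > best_score:
        best_score, best_params = score, params

print("best cross-validated accuracy:", round(best_score, 3))
print("best hyper-parameters:", best_params)
```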
So now help tie that back to the product you're starting to roll out, or to unveil.

Right. So our product suite has two major human targets. One is the business user, the other is the data scientist. And the value propositions that we bring to these two groups, there are three. For the business user, we allow the business user to go to a web interface, a document we call a narrative, and type in a natural language question like "forecast the next 30 days of revenues," and have the answer come back within 10 seconds. And in order to answer that question, you need a lot of predictive models, and the computing power and the data processing required to do that. For the data scientist, as we've just talked about, we provide ways to make the data scientist a lot more productive. And we also work on deep learning algorithms, so there are new techniques the data scientist can use; for example, something I'm very excited about is time series processing, and I'll come back to that. The third value proposition, for both the business user and the data scientist, is the collaboration between the two. There should not be silos of software where the business user uses BI and the data scientist uses esoteric Python or R. The value to the enterprise is most leveraged when these two have a common environment in which to express their unique skills. So the data scientist can work in the same environment and type R or SQL or Python, while the business user is working with natural language, and they're working on the same dataset. That's the key value that we provide.

So you're talking about intelligence augmentation, IA, and then we have AI, artificial intelligence. What's the distinction between the two? It's not like there's an overlap there, basically, right? We're using one to kind of create the other.

There's sort of an interesting philosophical debate, right? Is AI going to become superhumanly intelligent and take over the planet? And you've got people like Elon Musk saying that this is raising the devil.

The gray goo theory.

That's right, now we're all going to turn into goo. I am actually of the school of thought where I think the concern is warranted, right? Seeing the trends of where we're going, I think it's not wise to ignore them, so it is very important to take it seriously. On the other hand, I think that as a species, we are smart enough to apply intelligence augmentation, that we will augment our own human intelligence with machine intelligence. And there are going to be three phases of that. The first phase is everything that we just talked about, right? We're using machine learning and deep learning algorithms to extend human abilities. In the second phase, we're going to do something we call pervasive data science, meaning every API, every input box, everything in the enterprise will be machine-learning-enabled, far more than what we're seeing today in terms of specific applications.

Would it be fair to say the whole design-time tool chain would be automated by the runtime tool chain?

Yes, I think certainly automation of that using machine learning is key.
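For a sense of what has to sit behind a question like "forecast the next 30 days of revenues," here is a deliberately simple, self-contained sketch: a linear trend plus a day-of-week profile fit to synthetic daily revenue and extrapolated 30 days ahead. It is only an illustration of the kind of predictive model such a narrative would invoke, not a description of Arimo's pipeline.

```python
# Minimal sketch of "forecast the next 30 days of revenues":
# fit a linear trend plus a day-of-week profile to daily revenue history,
# then extrapolate 30 days ahead. Purely illustrative synthetic data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2016-01-01", periods=150, freq="D")
trend = np.linspace(10_000, 13_000, len(days))                   # slow growth
weekly = 1_500 * np.sin(2 * np.pi * days.dayofweek / 7)          # weekday/weekend swing
revenue = pd.Series(trend + weekly + rng.normal(0, 300, len(days)), index=days)

# Fit a trend on the day index, and a mean residual profile per day of week.
t = np.arange(len(revenue))
slope, intercept = np.polyfit(t, revenue.values, 1)
dow_profile = (revenue - (slope * t + intercept)).groupby(revenue.index.dayofweek).mean()

# Forecast the next 30 days.
future_days = pd.date_range(days[-1] + pd.Timedelta(days=1), periods=30, freq="D")
future_t = np.arange(len(revenue), len(revenue) + 30)
forecast = pd.Series(slope * future_t + intercept, index=future_days)
forecast += dow_profile.reindex(future_days.dayofweek).values

print(forecast.round(0).head(7))
```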
Even more significantly, take the example of an input box. Let's say you have a mobile application and you type in, and it's asking for your first name or your last name. Today, the software behind processing that input is actually very dumb, right? It may do some validation, and the validation makes sure that you've typed in a name and not some random sequence of characters that might do what's called a SQL injection attack. You might type in some escape sequence and then a command to delete the database in the back end. So some programmer somewhere has to write a rule that says, watch out for a particular pattern, and if the input doesn't fit that pattern, then reject it. If you think about it in terms of common sense, five years from now, 10 years from now, we'll ask the question, why are machines so dumb? Why can't they just see that it's clearly not a first name? And it's because they don't have common sense. And the reason they don't have common sense is that there's no learning behind it. So imagine the day when, behind every API, every microservice, there's a machine learning algorithm running it. There's a brain running it.

So who's using IA now? How's it being applied? And what do you see as the short-term and the long-term benefits? Which you've kind of touched on a little bit already.

So our customers range from retailers to manufacturers to financial institutions. In the case of retail, for example, there are a number of use cases that we've deployed. One is internal and the other is external. The internal is essentially the enterprise user. The external is that they are putting our software into essentially new products and new product features. The internal case is a marketing manager sitting down in front of Arimo and saying, forecast the right level of inventory I should put into this particular store at that particular time, using all of the data available. For the external use case, we're helping them look at a sequence of customer interactions as time series data, and cluster or segment customers based on observations of what they do, as opposed to who they are. If you think about customer segmentation today, it's very much looking at what we call cross-sectional data, or CRM data, based on age, religion, income, and so on. But it turns out that watching what people do, as a time series of their actions, is far more predictive of what they're about to do next.

So it's behavioral instead of demographic?

Exactly, behavioral as opposed to cross-sectional characteristics.

Can you apply that, maybe not just in e-commerce, but in healthcare, somehow mapping out those behaviors and trying to see where there might be issues or better outcomes, things like that?

Yes, certainly in healthcare. I guess what you say reminds me of the insurance industry, Medicare payments and so on, and fraud detection. And fraud detection can also benefit significantly from this time series analysis using deep learning. That is, rather than writing a whole bunch of rules that say, when payments cross certain thresholds, watch out for fraud, you look at a sequence of submissions and say, this actually is out of the norm, and have the deep learning algorithms that we bring automatically learn these patterns, learn what's normal and what's not normal.

That's a lot like the new cybersecurity software that does entity resolution, or user behavior or entity behavior analytics, where they look not for one item of activity on the network, but at a sequence of activities.

That's exactly right. In psychology, we know that longitudinal studies are much harder to do, but much more valuable, than cross-sectional studies, right?
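As a rough, hypothetical sketch of that sequence-based approach, the snippet below trains a small LSTM to predict the next payment amount in each customer's history, then scores new sequences by how badly the model predicts them; a sudden change in behavior produces a high error, with no hand-written threshold rules. The data, network size, and cutoff are synthetic and purely illustrative, not Arimo's deep learning models.

```python
# Minimal sketch: learn what "normal" payment sequences look like and flag
# the ones the model finds hard to predict, instead of hand-written rules.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data: 500 customers, 20 payments each (normalized amounts).
normal = torch.cumsum(torch.randn(500, 20, 1) * 0.05, dim=1) + 1.0

class NextPayment(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, seq):
        out, _ = self.lstm(seq)          # encode the payment history so far
        return self.head(out)            # predict the next amount at every step

model = NextPayment()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Train to predict payment t+1 from payments up to t, on normal behavior only.
for _ in range(200):
    pred = model(normal[:, :-1, :])
    loss = loss_fn(pred, normal[:, 1:, :])
    opt.zero_grad()
    loss.backward()
    opt.step()

def anomaly_score(seq):
    # Mean prediction error over the sequence: high error means "out of the norm".
    with torch.no_grad():
        pred = model(seq[:, :-1, :])
        return ((pred - seq[:, 1:, :]) ** 2).mean(dim=(1, 2))

suspicious = normal[:1].clone()
suspicious[0, 10:] += 3.0                # a sudden jump in submitted amounts
print("normal score:    ", anomaly_score(normal[:1]).item())
print("suspicious score:", anomaly_score(suspicious).item())
```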
Well, we're talking about IA, and I hope we don't reach a day where you can take two hosts and run them out of business. So hopefully we can stop a little bit short of that. But it is a fascinating world, and we really appreciate your taking the time to share it with us. The paper, best of luck with that, and certainly best of luck down the road with all your business endeavors.

Thank you for having me here.

Christopher, thank you very much. Good to see you. George and I will be back with more here from the Hilton in San Francisco at Spark Summit 2016, right after this, here on theCUBE.