Live from Union Square in the heart of San Francisco, it's theCUBE, covering Spark Summit 2016, brought to you by Databricks and IBM. Now, here are your hosts, John Walls and George Gilbert.

And welcome back to Spark Summit 2016 here on theCUBE, along with George Gilbert. I'm John Walls, and we're now joined by Ashley Stirrup, the CMO of Talend, an open source data integration company. Ashley, we certainly appreciate the time. Thanks for taking time out of what's a pretty busy day for you here at the summit. First off, what's your impression of what you've seen and heard here with regard to Spark and what's going on in that community?

Well, I have to say, it's just such an exciting time in the data industry. What you're able to do with Spark and Hadoop and machine learning, to take information and turn it into new insight, better experiences for customers, recommendations on new products, what have you. It's a really exciting time.

You were at the keynote today, where they talked about 2.0, the new release coming up. What's your take on some of the changes: structured streaming, structured APIs, and a few other improvements as well?

That's right, yeah. Well, first off, let me just say that Talend has been a huge supporter of Spark for a long time now. We've been working very closely with Databricks, and we were one of the first integration vendors to announce native support for Spark, so that when you build your data integration flows, they generate native Spark that runs on top of Hadoop. That immediately got customers 5x better performance, allowed them to do things in real time, with one design environment for batch and real time. So we've really gone all in on Spark, and it's provided a lot of value to our customers.

And what is it about that? I hear a lot of "easier, faster, simpler"; that was kind of the overarching mantra being driven into people's heads. And you talk about all these various sources of data, which continue to grow at exponential volume. So what do you see as the glue, or whatever it might be, that's allowing you to broaden your services to your clients?

Right. Well, first off, if you look at the world traditionally, being able to do machine learning, data science, and big data has really been the domain of the largest e-commerce and web companies: Facebook, Amazon, Netflix, someone like that. With the advent of Spark, leveraging things like machine learning, and now with Talend, we're opening it up. We're making it possible for any organization, without a large team of data scientists, to capture all that information, apply machine learning to it, and apply it in real business use cases. And so what we're doing is making Hadoop and Spark much more accessible to any Java developer. They don't need to be an expert in Spark or in MapReduce or what have you, and they can immediately start building intelligent data integration flows.
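To make that concrete, here is a minimal sketch of the kind of native Spark job a visually designed integration flow might generate. It is illustrative only, not Talend's actual generated code, and the paths and column names are hypothetical; the point is that the transformation is plain Spark, defined once, and reusable for batch or real-time execution because Spark 2.0's structured APIs are shared between the two.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class OrdersFlow {

    // One transformation, defined once. Because Spark 2.0's structured APIs
    // are shared between batch and streaming, the same function could be
    // applied to a streaming Dataset in a real-time version of this job.
    static Dataset<Row> cleanOrders(Dataset<Row> orders) {
        return orders
                .filter(col("amount").gt(0))                  // drop invalid rows
                .withColumnRenamed("cust_id", "customerId");  // normalize schema
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("orders-integration-flow")
                .getOrCreate();

        // Hypothetical source: a CSV landing zone on HDFS.
        Dataset<Row> orders = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .csv("hdfs:///landing/orders");

        // Hypothetical sink: a curated Parquet table.
        cleanOrders(orders).write().parquet("hdfs:///warehouse/orders_clean");

        spark.stop();
    }
}
```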
We always hear, and will hear from the beginning of time to the end of time, about the trade-off between choice and simplicity. Where do you come out on that spectrum, and where would you position some of the other vendors, the newer entrants or some of the old standbys?

Right, well, I don't want to talk ill of the competition, things like that. So what I'll really focus on is that, from our perspective, we're allowing customers to build these data integration flows and do all their design work in a way that's future-proof, or platform-agnostic. The market is moving so fast that just two years ago MapReduce was the standard, and now Spark is the standard, and people talk about Flink and a lot of other technologies as other potential options. We want to give customers the option of leveraging all that technology in a native way, but one that's portable, so they can go from one environment to another.

Okay, so regardless of whether it's Talend or the old guard or the new guard, what are the elements of the workflow? What did we need in the data warehouse world to do our business intelligence, and what do we need now, in the world of the data lake, to democratize access to data science?

Right. Well, obviously connectivity is important. You've got data in so many different formats that needs to be brought together, and then you need to put it into a format that's usable for people. What we're really focused on at Talend is enabling the entire organization to leverage that data. So we want to make sure we're serving the data integration developer, allowing them to handle all the plumbing: where do I get the data from, how do I bring it all together, how do I access different APIs. And we want to give tools to the data scientists, so that once they've built their machine learning model, they can put it into production and let IT manage it to keep it up 99.9% of the time.

So the product line would include something to help the data scientists do the modeling?

We're not a machine learning company, but once those machine learning algorithms are built, we're helping our customers put them into production.

So you would take the model, through something like the PMML standard, from a model design tool and put it into production?

That's essentially right, yeah. So say the data scientists build a recommendation engine: somebody puts something into their shopping cart, and it decides the next things you offer them. They would build that model, and then the question is how you make sure it's deployed in real time, so you're capturing real-time streaming data of web clicks, making the right recommendation, and putting the right products in front of the customer. We're automating that intelligent data flow.
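One way that hand-off from data scientist to production job can look, sketched here with Spark's own ML pipeline persistence rather than PMML for brevity (the paths and schema are hypothetical): the model is fitted and saved offline, then loaded and applied inside the production data flow.

```java
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ScoreCartEvents {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("recommendation-scoring")
                .getOrCreate();

        // Load a pipeline the data-science team fitted and saved offline.
        // The path is hypothetical; PipelineModel.save/load is Spark's
        // built-in ML persistence.
        PipelineModel model = PipelineModel.load("hdfs:///models/reco-v7");

        // New shopping-cart events to score (schema is illustrative).
        Dataset<Row> cartEvents = spark.read()
                .parquet("hdfs:///landing/cart_events");

        // transform() appends the model's prediction columns.
        model.transform(cartEvents)
                .select("customerId", "prediction")
                .write()
                .parquet("hdfs:///serving/recommendations");

        spark.stop();
    }
}
```

A PMML-based deployment follows the same shape: an evaluator library loads the exported model file, and the integration flow feeds it records and routes the scores downstream.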
Okay, and we hear now, though it's still a little leading edge, that people want models to keep learning continuously, like in fraud, where the patterns are always changing. So you've got almost two flows: one keeps the model from drifting by keeping up with the new fraud patterns, and the other puts that new model back into production at some frequent interval. Is that a workflow you can accommodate now, or is that something that won't be mainstream for a while?

Yeah, that's actually a real differentiator for Talend. We call it continuous integration for data integration. We're allowing each developer to check in their pieces of the data integration flow, or a new machine learning algorithm, put in automated testing, and basically be continuously checking in new code. That allows large teams to get new versions of their data integration models out and deployed much more rapidly.

But when you say data integration models, are you distinguishing essentially the transformation logic as opposed to the predictive modeling?

I'm actually including the whole thing, so both the data modeling, access, and movement, as well as the machine learning algorithms.

And data in all forms, structured, unstructured, streaming, and from all sources, including the internet of things. All of a sudden your clients have this whole new world opening up to them, and a voluminous one in some respects. How is that impacting your business, and what capabilities do you now have via Spark to make all that a little more usable for your clients?

Yeah, well, that's a great question. When I actually talk to our customers, what they often mean when they talk about the internet of things is new data sources. It could be sensor data or something similar.

It might be a wearable, it might be in the soil, it might be wherever.

Right, and there's just so much data, whether it's social or web traffic or server logs or what have you. Each of those data sources provides new opportunities to create insight, create a better customer experience, or deliver a new product, and we're providing a lot of the capabilities to pull all that together. We see a lot of those types of projects going on in the cloud: working with Amazon Kinesis or Kafka for your streaming ingestion, then using Spark Streaming for the data transformation, and then machine learning on top of that. We're allowing our customers to wire all these things together and not only handle these new real-time data sources, but combine traditional data, historical data I should say, with that new real-time streaming. So you've got both the historical information and the new real-time data, and one environment where you're managing all of it.

And all of a sudden you give it a whole new relevance, a whole new context. It allows them to evaluate that real-time data against 10 years, 20 years, whatever they might have.

That's right. And the more you can do in real time, the more value you're getting out of your data; the value of data decays so quickly. If you can do something in the moment with a customer, solve problems, avoid problems, all those types of things, it's 10 times more powerful than if you're following up afterwards.
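A minimal sketch of that pattern, combining a historical table with a real-time Kafka feed using Spark 2.0's Structured Streaming. The broker address, topic, paths, and schema are all hypothetical; this shows the shape of the pipeline, not any particular product's output.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class ClickstreamEnrichment {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("clickstream-enrichment")
                .getOrCreate();

        // Historical customer profiles: a static, batch-loaded table.
        Dataset<Row> history = spark.read()
                .parquet("hdfs:///warehouse/customer_profiles"); // hypothetical

        // Real-time clicks arriving from Kafka.
        Dataset<Row> clicks = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092") // hypothetical
                .option("subscribe", "web-clicks")                // hypothetical topic
                .load()
                .selectExpr("CAST(key AS STRING) AS customerId",
                            "CAST(value AS STRING) AS click");

        // Stream-static join: each live click is enriched with the
        // customer's historical profile before downstream scoring.
        Dataset<Row> enriched = clicks.join(history, "customerId");

        StreamingQuery query = enriched.writeStream()
                .format("parquet")
                .option("path", "hdfs:///serving/enriched_clicks")
                .option("checkpointLocation", "hdfs:///chk/clicks")
                .start();
        query.awaitTermination();
    }
}
```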
Give me an example, if you will. I think you mentioned soil. Who are you working with where they're doing something they haven't been able to do before, because they're leveraging this?

Yeah, well, we're working with a customer of ours called Spring. They're out of the Netherlands, and they're helping small farmers around the world, with Africa a big focus for them, by giving them better recommendations on what to plant and what fertilizer to use. It can produce as much as a 5x improvement in the amount of crop the farmers are producing, and that makes the difference between going hungry or not, between being able to send their children to school or not. What they're doing is replacing the old chemistry-based model, where you had to mail samples off, wait weeks to get recommendations, and pay a lot for them. Now somebody in a small truck carries a small device they can plug into the soil, take sensor readings, pass them in real time back to the data center, get a recommendation back, and give that farmer the information they need, at a far lower cost and far faster.

And I'm curious, over what period of time? Do you have any idea how quickly that exchange of knowledge occurs?

Minutes, yeah.

So what might have taken a month before, from the hinterlands, if you will...

That's right, yeah.

Thirty days of blight or disease or whatever environmental condition wreaking havoc, and now you're able to avoid that problem in a matter of minutes and adjust your practices.

And it has real impact on real people. That's what's so exciting.

Yeah, that's great. So you mentioned something I want to key off of: that real time changes so much. What did the pipeline look like? We have a broad sense, but I want to hear it in your words: the pipeline in the traditional world, operational database, data warehouse, versus the real-time world.

Yeah, well, in most cases you're doing something fundamentally different. When you're not doing things in real time, you're doing your best to profile customers based on historical information, with a much smaller set of data, in a much less tailored way.

More of a sample, and a stale sample.

Yeah. You're not nearly as able to do fine-grained recommendations. So think of it as: I'm going to offer all males between 40 and 50 the sporting goods option, versus knowing in real time that they're looking at golf, so I'm going to talk to them about this new golf ball that just came out. You're able to be much more targeted and provide a much better experience.

And just to satisfy my curiosity, what does that topology look like? What are the products a customer has to buy to do that?

Right, well, there's a huge range of potential options. What I can tell you is that if you look at our customer base over time, they were doing a lot of bulk and batch analytics. They were using statistical tools like SAS to do segmentations and target offers, but they just weren't able to take advantage of that detailed-level information. Now there's a whole new crop of real-time recommendation engines that customers are able to leverage, and I'm not an expert on the particular ones out there. We're helping to work with all of those, connecting the data, both the historical data and the real-time data, so they can make those recommendations and then deliver them back to the customer.
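As one illustration of the kind of recommendation engine being described, here is a minimal sketch using Spark ML's built-in ALS collaborative filtering: train on historical purchases in batch, then score candidate offers. Table paths and column names are hypothetical, and a real deployment would also wire in the live click stream shown earlier.

```java
import org.apache.spark.ml.recommendation.ALS;
import org.apache.spark.ml.recommendation.ALSModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.desc;

public class TrainRecommender {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("train-recommender")
                .getOrCreate();

        // Historical purchases: userId, itemId, rating (implicit counts work too).
        Dataset<Row> purchases = spark.read()
                .parquet("hdfs:///warehouse/purchases"); // hypothetical path

        // Fit a collaborative-filtering model on the historical data.
        ALSModel model = new ALS()
                .setUserCol("userId")
                .setItemCol("itemId")
                .setRatingCol("rating")
                .fit(purchases);

        // Score candidate user-item pairs, e.g. golf products for shoppers
        // currently browsing golf gear; candidates come from the live stream.
        Dataset<Row> candidates = spark.read()
                .parquet("hdfs:///landing/candidate_pairs");
        model.transform(candidates)
                .orderBy(desc("prediction"))
                .write()
                .parquet("hdfs:///serving/offers");

        spark.stop();
    }
}
```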
Okay. Historically, they know George and I lose a lot of golf balls, and in real time, I need to replace them.

That's right. That's the history, right?

How valuable would that be?

Yeah, I would love that, actually, because that's a perpetual problem.

You'd have the FedEx guy pull up right next to you at the golf course: you've played three rounds, so by now you've exhausted your dozen.

So, Ashley, thanks for being with us.

All right, thank you.

We appreciate the time, and best of luck with Talend going forward.

Thank you, my pleasure.

You bet. We continue with our coverage here from Spark Summit 2016 on theCUBE, right after this.