from New York, extracting the signal from the noise. It's theCUBE, covering Spark Summit East, brought to you by Spark Summit. Now your hosts, Dave Vellante and George Gilbert.

Welcome back to Midtown Manhattan, everybody. This is theCUBE. We go out to the events, we extract the signal from the noise. A lot going on here at Spark Summit East. John Furrier and Bert Lattimore are running a crowd chat; check out crowdchat.net/sparksummit. George is going to be presenting his update on big data and Spark in context later this afternoon. But right now, a good friend of theCUBE, Tendu Yogurtcu, is here; she's the general manager of Big Data at Syncsort. Tendu, it's great to see you again.

Same here. Hi, George. Hi, Dave.

So you guys are on the ground, on the front lines. We saw you in October at our Big Data NYC event, which we run in conjunction with Strata, and we got the update there, but things keep evolving. We're here at Spark Summit, and there's a lot of talk about real time, breaking CPU bottlenecks, and bringing transactions and analytics together. We were talking off camera: your heritage is in the mainframe business, and that's where a lot of the transaction data sits. So give us the update on Syncsort, what's transpired since October, and then we'll get into it.

Sure. In terms of Syncsort, we were acquired by Clearlake Capital. We are the first acquisition in their 10-year fund, and we will also be involved in more acquisitions for that fund, so that's great in terms of organic as well as inorganic growth. That's the company-level update. In terms of the big data trends and adoption we are seeing, and how we are staying ahead of that adoption: we see more and more convergence between batch analytics and operational analytics.
And this is driving streaming data sources and batch data sources to be on the same platform, because as enterprises and Fortune 500 companies try to build their enterprise data hub, or data lake, whichever way they prefer to refer to it, they have to access all of their enterprise data. If you have Internet of Things use cases with connected devices and mobile phones, churn analysis for a telco, or fraud analysis in financial services, you have this streaming data, and you also have to make sense of it and get real-time insights by referring to historical reference data. Often that historical reference data is in the transactional data store, and 70% of that data globally is still on mainframes, still on mainframes because of their high throughput, availability, and security. So we see that convergence, especially in the enterprise and the Fortune 500: a single data hub and data platform for accessing all of the enterprise data. That's the challenge, because it's not an easy task. It requires skill sets, understanding both the rapidly evolving technology stack, with Spark, Hadoop, and all of the applications building on top of them, and the legacy platforms like mainframes. We see that challenge in our customer base especially, and it's an opportunity for us, because we fit very well. We have a unique value proposition: an understanding of the big data technologies, with native integration with the Hadoop stack and Apache Spark, as well as the best understanding of mainframe data and processing. So streaming data sources plus batch analytics and operational analytics on the same platform is a big trend we are seeing.

So Tendu, you touched on the overriding customer objective, the problem they're trying to solve. They want to reduce churn.
They want fraud detection before it happens, or maybe in real time, and they probably want to improve on the false positives they're getting. They want real-time insights into customer demand and preferences. Those are the big problems they're trying to solve. But when you talk about bringing batch and operational together, connect those two for us. This is obviously an evolution; we've been trying to solve these problems for decades, and then of course the big data meme attacked them in new ways: reducing sampling, allowing us to process more data, et cetera. So talk about the evolution and the connection to those business problems, if you would.

Sure. In terms of the business problems, take the examples I gave, churn analysis in telecom or fraud detection in finance. On real time: I liked your definition of real time in your interview with Ali Ghodsi from Databricks, because real time is such a misused term. To one end user it means every hour; to an end user on a stock exchange it means sub-second, so it goes from one extreme to the other. That's why streaming analytics with real-time insights is sometimes better defined this way: you define the time before you lose the customer. What we are seeing is that as you bring in this device data, whether you are trying to understand user behavior for a telco, where your user is located at a particular time, which store they are going to, how they help themselves when they hit a problem with the communications system, or whether you are doing fraud analytics, the customer data, the transactional data, is on the legacy systems, and it's in batch. So as you consume this streaming data source, you have to react very fast, and that part is the real-time insight. And as you react very fast, you are also making sense of that reference data, the historical data, very fast.
So it's not just how fast you ingest the data. How fast you apply advanced analytics to that streamed data, and how fast you can access the historical reference data, make the difference, and you have to have that data available. That's where the enterprise data hub comes into the picture: you want the availability of your enterprise data, whether it comes from mobile devices, from sensors, from clickstreams and Twitter feeds, or from your transactional sources. You want all of that data available at all times.

Are customers evolving their batch into this real-time operational world, or are they taking their real-time operational systems and subsuming batch, or is it a combination? How are they actually doing it?

I think they are evolving to real time, and it's a process within most organizations, because there are many different owners, groups, and business units involved. Most big data applications start from the business side, right? We saw that initially with marketing, ads, et cetera, as the evangelists. However, there is a lot of enterprise data held by more legacy groups, and it requires collaboration at the business-unit level; sometimes IT drives that as well. So we see that transformation across organizations toward real time, and one of the most common use cases we see is operational intelligence: operational data from those legacy platforms being processed in big data analytics. Spark actually has an advantage there, because it became very popular.
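The pattern Tendu describes, reacting to a streamed event fast while also consulting historical reference data from a batch store, can be sketched in a few lines. This is a minimal plain-Python illustration with entirely hypothetical customer profiles and thresholds, not any vendor's product; the point is only that the real-time score depends on a join against history.

```python
# Historical reference data, e.g. loaded in batch from a transactional
# system of record. Customers, fields, and values are hypothetical.
REFERENCE = {
    "cust-1": {"home_country": "US", "avg_txn": 40.0},
    "cust-2": {"home_country": "DE", "avg_txn": 900.0},
}

def score_event(event, reference):
    """Flag a streamed transaction using historical context."""
    profile = reference.get(event["customer"])
    if profile is None:
        return "review"          # no history at all: route to manual review
    if event["country"] != profile["home_country"]:
        return "suspicious"      # geography disagrees with history
    if event["amount"] > 10 * profile["avg_txn"]:
        return "suspicious"      # amount far above historical average
    return "ok"

# A simulated stream of incoming transactions.
stream = [
    {"customer": "cust-1", "amount": 35.0, "country": "US"},
    {"customer": "cust-1", "amount": 5000.0, "country": "US"},
    {"customer": "cust-2", "amount": 850.0, "country": "BR"},
]

results = [score_event(e, REFERENCE) for e in stream]
print(results)  # → ['ok', 'suspicious', 'suspicious']
```

The speed of the `reference.get` lookup is exactly the "how fast you access the historical reference data" question from the discussion: in production that dictionary would be a shared store fed from the batch side.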
It has the promise of being the single compute platform for streaming analytics, for advanced analytics with machine learning, and for batch analytics. Last year we made an open source contribution to Spark packages, making mainframe data available for Spark interactive queries, because you may have operational intelligence data, telemetry data, and security data on mainframes, and it's very expensive to process that data on the mainframe. Making that data available for Spark analytics, Spark SQL interactive queries, and machine learning is a big advantage, and what we contributed makes that data available. With our engine on Spark, you can also process that data in its original format, so compliance and data governance concerns are addressed: you don't need data conversions or format conversions to operate on that data using Spark.

Our CTO and co-founder, David Floyer, who we call the eminent Mr. Floyer because he's always a source of such insight, talked about how, as we get more mature with these types of systems that combine insight and transactions, we're integrating the analytics and the transactions ever more tightly, and with lower latency. Now, you talked about bringing mainframe data into, say, security applications. Can you talk about where you might be capturing near-real-time information on transactions on the mainframe, and how you would drive a decision in near real time, wherever it may be, on the mainframe or out in the data hub, where you need to dig into that core application and make a decision really, really fast?

Absolutely; I will actually give you more than one use case. One is that we see Kafka, for example, evolving as a data bus. As data becomes a service layer in organizations, you have operational logs, batch reference data, transactional data, and streaming data from new data sources.
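To make "processing mainframe data in its original format" concrete: mainframe records are typically fixed-width EBCDIC bytes with COBOL-packed numeric fields, so a connector has to decode them before Spark SQL can query them. The sketch below is plain Python and purely illustrative (it is not the Syncsort connector; the record layout is invented), showing an EBCDIC text field and a COMP-3 packed-decimal field being decoded.

```python
def unpack_comp3(raw: bytes, scale: int = 0) -> float:
    """Decode a COBOL COMP-3 packed-decimal field: two digits per byte,
    with the sign carried in the final low nibble (0xD means negative)."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()                  # last nibble is the sign
    value = 0
    for digit in nibbles:
        value = value * 10 + digit
    if sign == 0x0D:
        value = -value
    return value / (10 ** scale)

# Hypothetical 14-byte record: 10-byte EBCDIC name + 4-byte COMP-3 amount.
record = "JONES     ".encode("cp500") + bytes([0x00, 0x12, 0x34, 0x5C])

name = record[:10].decode("cp500").strip()   # cp500 is an EBCDIC code page
amount = unpack_comp3(record[10:], scale=2)  # packed decimal with 2 decimals

print(name, amount)  # → JONES 123.45
```

Keeping the bytes in this original form, and decoding only at query time, is what lets governance and lineage requirements be met without materializing converted copies of the data.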
Really, the concept of a data bus, of data as a service, is very well received, and it simplifies the complexity around all these diverse data sources. So we see Kafka evolving, and we see it evolving in financial services, in gaming companies, and in telcos. If you have this message broker, this messaging framework that you use as a data bus, it's really a matter of pushing the relevant data onto the bus and making it available to all of the consumers. So it doesn't matter physically where the data is.

You don't have to make the decision on the mainframe; as long as you've got this high-performance feed, you can make the decision anywhere.

Exactly. If you have operational intelligence, telemetry, and security data coming from mainframes, or from any other data store in your organization, it becomes available on the data bus, and you can make it available to the applications. Our product, DMX-h, integrates with Kafka, and it can also accommodate other protocols available between the platforms. But we really see Kafka and Kafka streaming becoming more widely available.

So I'm a customer. I say, okay, Tendu, I like what you're saying. You understand my problem, whether it's churn or whatever it is, in real time, and you sound like you understand my whole challenge of trying to simplify and bring these pieces together. How do I engage with Syncsort? What can you do for me? What can I buy from you?

What you can buy from us is basically a single software environment that makes all of this data accessible; it helps you access enterprise data and transform it. And when we access and transform, our unique value comes from running natively on the compute platforms while insulating the organization from the underlying complexity. So any application you create can run on a standalone Linux server, on Hadoop MapReduce, or with Apache Spark, without any changes and without any recompilation.
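The data-bus point made above, that the decision logic need not live where the data originates, comes down to publish/subscribe decoupling. Here is a minimal in-memory sketch of that pattern in plain Python; a real deployment would use a Kafka cluster, and the topic name and payloads below are hypothetical. Two independent consumers receive the same mainframe feed without the producer knowing either exists.

```python
from collections import defaultdict

class DataBus:
    """A toy stand-in for a message broker used as a data bus."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)

bus = DataBus()
alerts, audit_log = [], []

# Two independent consumers of one topic: a fraud scorer and an auditor.
bus.subscribe("mainframe.txns",
              lambda m: alerts.append(m["id"]) if m["amount"] > 1000 else None)
bus.subscribe("mainframe.txns", lambda m: audit_log.append(m["id"]))

# A producer (e.g. a mainframe change feed) publishes onto the bus.
bus.publish("mainframe.txns", {"id": "t1", "amount": 50})
bus.publish("mainframe.txns", {"id": "t2", "amount": 5000})

print(alerts, audit_log)  # → ['t2'] ['t1', 't2']
```

Adding a third consumer is one more `subscribe` call; nothing upstream changes, which is exactly why the bus simplifies the diversity of sources and consumers described in the conversation.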
So skill sets, the skill-set gap, and simplicity are the areas we are very focused on. That's what we provide. We understand all of your enterprise data. You can create a data pipeline using our graphical user interface, and that data pipeline can run on multiple compute frameworks, while we do all the optimizations for those job flows, taking advantage of Apache Spark optimizations, taking advantage of anything that happens on MapReduce, along with data governance, security, and encryption. So it's really software that helps you access and transform your data.

And what's the big focus for you guys in the next 12 to 18 months? What should we be watching?

Our focus will be this single software environment for streaming data as well as batch, and we will broaden our value offering in two ways: one, broadening to streaming, and two, broadening the data types, bringing more telemetry and security data from mainframes and making it available for big data analytics and advanced analytics on Spark, Hadoop, and whatever comes next.

I love talking to you guys. It's a many-decade history; you're like the oldest startup in the IT business. So I really appreciate you coming back and sharing your excellent insights on what customers are doing and how you guys fit. And I'll give you the last word. Spark Summit East is small; it reminds me of the early days of Hadoop World, right? What do you expect for Spark?

First, thank you for having me on theCUBE. It's always a pleasure to talk to you. Funny you said startup: actually, since October we are in a New York startup program, and we'll be moving to New York this year. So yes, we are a forty-some-year-old startup. And Spark Summit, this is really exciting for me. It reminds me of the early days of Hadoop World.
It's small, but it's very focused, and we see a lot of technology optimization sessions. That's usually how it happens: the platform gets optimized before you start seeing people talk about the business applications and optimize for the end user. So vendors like Syncsort have a big opportunity here, because we have seen how it happened with Hadoop, and with Spark, whether it's running on Hadoop YARN, on Mesos, or on zLinux going forward, we have the opportunity to make it available for business users and simplify it. So I'm excited to see the rest of the sessions, and I'm looking forward to what's coming with real-time Spark.

All right, well, congratulations on the next chapter, and we'll be watching. Thank you.

All right, keep it right there, everybody. We'll be back with our next guest right after this. We're live from Spark Summit East in Manhattan. Right back.