from San Jose in the heart of Silicon Valley, it's theCUBE, covering Big Data SV 2016. Now your hosts, John Furrier and George Gilbert.

Okay, welcome back. We are here live in Silicon Valley for theCUBE, SiliconANGLE's flagship program, where we go out to the events and extract the signal from the noise. I'm John Furrier, with my co-host George Gilbert, Big Data Analyst at Wikibon.com. Our next guest is Tendü Yoğurtçu. Yoğurtçu, how do you say your last name?

Yoğurtçu.

Okay, I got close. GM of Big Data at Syncsort. Welcome back to theCUBE. Syncsort has been a long-time guest, one of those companies we love to cover because your value proposition is right in the center of all the action around mainframes, which Dave and I always love to talk about. We're not mainframe guys ourselves, but we remember those days, and mainframes are still powering a lot of the big enterprises. So I've got to ask you: what's your take on the show here? One of the themes that came up last night on CrowdChat is, why is enterprise data warehousing failing? So there's some conversation there, but you're seeing a transformation. What do you guys see?

Thank you for having me. It's great to be here. Yes, we are seeing the transformation to the next-generation data warehouse and the evolution of the data warehouse architecture. Mainframes are a big part of that architecture, because 70% of the world's data is still on mainframes. That is a large amount of data. So when we talk about a big data architecture, about making big data and enterprise data useful for the business and enabling advanced analytics, not just gaining operational efficiencies with the new architecture but also offering new products and services to those organizations' customers, this data is untapped. Making it part of the next-generation data warehouse architecture is a big part of these initiatives, and we play a very strong role in bridging the gap between mainframes and the big data platforms, because we have product offerings spanning platforms and we are very focused on accessing and integrating data, in a secure way, from mainframes to the big data platforms.

One of the things the mainframe highlights is a dynamic in the marketplace among all customers, whether they have mainframes or not. But your customers who have mainframes already have a ton of data; they're data-full, as we say on theCUBE. They have a ton of data to work with, but they spend a lot of time, as you mentioned, cleaning the data. How do you guys specifically solve that? Because that's a big hurdle they want to put behind them. They want to clean fast and get on to other things.

Yes, we see a few different trends and challenges. First of all, with their big data initiatives, everybody is trying either to gain operational efficiency and business agility, to make use of data they weren't able to use before and enrich it with new data sources they are adding to the data pipeline, or to provide new products and services to their customers. So when we talk about mainframe data, it's really about how you access that data in a secure way, and how you make data preparation very easy for the data scientists, who are still spending close to 80% of their time on data preparation. And if you come to think of it, the compute frameworks, Spark, MapReduce, Flink, and the underlying technology stack should not even be relevant to the data scientists. They should only be worried about how to create their data pipeline and what new insights they are trying to get from the data.
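To make that last point concrete, insulating pipeline authors from the compute framework is exactly what portable pipeline APIs aim at. Below is a minimal sketch using Apache Beam's Python SDK as one possible framework-agnostic layer; this illustrates the concept only, it is not Syncsort's product, and the file names and record layout are hypothetical.

```python
import apache_beam as beam

# The pipeline describes what happens to the data; the execution engine
# (local runner, Spark, Flink, ...) is chosen via pipeline options at
# launch time, not in the pipeline code itself.
with beam.Pipeline() as pipeline:
    (pipeline
     | "Read" >> beam.io.ReadFromText("transactions.csv")      # hypothetical input
     | "Parse" >> beam.Map(lambda line: line.split(","))
     | "OnlyATM" >> beam.Filter(lambda rec: rec[2] == "ATM")   # hypothetical schema
     | "Format" >> beam.Map(lambda rec: ",".join(rec))
     | "Write" >> beam.io.WriteToText("atm-transactions"))
```

The same pipeline definition runs unchanged when the underlying engine is swapped, which is the future-proofing theme that comes up again later in the interview.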
The simplification we bring to that data cleansing and data preparation is, one, a simple way to access and integrate all of the enterprise data: not just the legacy mainframe and relational data sources but also the emerging sources, streaming data, messaging frameworks, new data sources. We also do this in a cross-platform, secure way. And some of the new features we announced, for example: we were already the best in terms of accessing all of the mainframe data and making it available on Hadoop and Spark, and we now also make Spark and Hadoop understand this data in its original format. You do not have to change the original record format. That is very important for highly regulated industries like financial services, banking, insurance, and healthcare, because you want to be able to do the data sanitization and data cleansing and yet keep that mainframe data in its original format for audit and compliance reasons.
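To picture what "original format" means in practice, here is a minimal plain-Python sketch, not Syncsort's actual product, that decodes a fixed-length EBCDIC record for analytics while carrying the untouched raw bytes alongside for audit; the record layout, field names, and file name are hypothetical.

```python
RECORD_LENGTH = 80  # hypothetical fixed-length record from a COBOL copybook

def parse_record(raw: bytes) -> dict:
    """Decode fields for analytics; the raw EBCDIC bytes travel with the record."""
    assert len(raw) == RECORD_LENGTH
    return {
        "raw": raw,  # original record, byte-for-byte, for audit and compliance
        "account_id": raw[0:10].decode("cp037").strip(),   # EBCDIC text field
        "branch": raw[10:14].decode("cp037").strip(),
        "balance_cents": int(raw[14:26].decode("cp037")),  # unsigned zoned digits
    }

with open("accounts.dat", "rb") as f:  # hypothetical mainframe extract
    while chunk := f.read(RECORD_LENGTH):
        record = parse_record(chunk)
        # Analytics works on the decoded fields, while record["raw"] is what
        # gets archived or sent back to the mainframe, unchanged.
```

The point is that cleansing and analysis happen on decoded copies, while the record that matters for compliance never changes shape.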
Okay, so this is the product you were telling us about earlier: you can move the data from the mainframe, do processing at a scale and cost that isn't possible, or at least isn't easy, on the mainframe itself, do it on a distributed platform like Hadoop while it preserves its original encoding, and send it back. But then there's also this new way of creating a data fabric that we were talking about earlier. It used to be point to point, from the transactional systems to the data warehouse; now we've basically got this richer fabric, with your tools sitting on technologies like Spark and Kafka. Tell us what that world looks like and how it's different.

We see a greater interest in the concept of a data bus. Some organizations call it data as a service, some call it Hadoop as a service, but ultimately it is an easy way of publishing data and making it available to both the internal and the external clients of the organization. Kafka is at the center of this, and a lot of our partners, including Hadoop vendors like Cloudera, MapR, and Hortonworks, as well as Databricks and Confluent, are really focused on creating and servicing that data bus. We play very strong there, because the phase-one project for these organizations is: how do I create this enterprise data lake, or enterprise data hub? That is usually the phase-one project, because for advanced analytics or predictive analytics, when you make a change in your mortgage application, you want to be able to see that change on your mobile phone in under five minutes. Likewise, when you make a change in your healthcare coverage or your telecom services, you want to see it on your phone in under five minutes. These things really require easy access to that enterprise data hub.

For that we have a tool called Data Funnel. It basically simplifies this to one click and significantly reduces the time to create the enterprise data hub. Our customers are using it to, I would not say migrate, but access data from database tables, DB2 for example: thousands of tables get populated, with the metadata mapped automatically, whether that metadata ends up as Hive tables or Parquet files or whatever the format is going to be on the distributed platform. So this really shortens the time to create the enterprise data hub.
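For a rough picture of the kind of work such a one-click funnel automates, here is a minimal PySpark sketch that pulls a single DB2 table over JDBC, letting Spark carry the column metadata along, and lands it as a Hive-registered Parquet table. This is a generic illustration, not Data Funnel itself; the connection string, credentials, and table names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("db2-to-data-hub")   # hypothetical job name
         .enableHiveSupport()
         .getOrCreate())

# Read one DB2 table over JDBC; Spark derives the schema from the
# database metadata, so column names and types come along automatically.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:db2://db2host:50000/SAMPLE")  # hypothetical
      .option("dbtable", "BANK.ACCOUNTS")                # hypothetical
      .option("user", "dbuser")
      .option("password", "dbpass")
      .load())

# Land the data in the hub as Parquet and register it in the Hive
# metastore so downstream Hive and Spark users can query it immediately.
df.write.mode("overwrite").format("parquet").saveAsTable("hub.accounts")

# A real funnel would loop this over thousands of tables discovered in
# the database catalog, which is exactly the step being automated.
```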
It sounds really interesting, what I'm hearing you say. The first step was to create this data lake: let's put data in there, start getting our feet wet, and learn new analysis patterns. But if I'm hearing you correctly, radiating out of that is a new data backbone with much lower latency, one that gets data out of the analytic systems and back into the operational systems, or into new systems, at a speed we didn't have before, so that we can do analysis and make decisions very quickly.

Yes, that's true. Basically, operational intelligence and batch analytics are converging.

Okay.

And in that convergence, what we are seeing is: I'm analyzing security data, I'm analyzing streamed telemetry data, and I want to be able to react as fast as possible. Some of the interest in the emerging compute platforms is really driven by this use case. Many of our customers are saying that, today, reacting in under five minutes is enough for them; however, they want to be prepared, they want to future-proof their applications, because in a year they may have to respond in under a minute, even in sub-seconds.

When they talk about being future-proofed, and you mentioned time brackets on either end, are customers looking at speeds that current technologies don't support? In other words, are they evaluating things that are essentially research projects right now, very experimental? Or do they see a set of technologies they can pick and choose from to serve those different latency needs?

We published our Hadoop survey in January. According to its results, 70% of the respondents were evaluating Spark, and that is very consistent with our customer base. The promise of Spark is driven by multiple use cases and workloads, predictive analytics, streaming analytics, and batch analytics all being able to run on the same platform, and all of the Hadoop vendors are supporting this. Our customers are heavy enterprise customers who are already in production on Hadoop, so running Spark on top of their Hadoop cluster is one way they are future-proofing their applications. This is where we also bring value, because we abstract that layer and isolate the user. While we are liberating all of the data from the enterprise, whether it sits in the relational legacy data warehouse, on the mainframe, or comes from new web clients, we are also insulating their applications: they don't need to worry about which compute framework will be the fastest, most reliable, lowest-latency next. They can focus on the application layer, on creating that data pipeline.

Tendü, I want to ask you about the state of Syncsort. You guys have had great success with the mainframe, this concept of data funneling where you can bring things in very fast, new management, new ownership. What's the update on the market dynamics? Because now there's ingestion from multiple data sources everywhere. What's the plan for Syncsort going forward? Share with the folks out there.

Sure. Our new investors, Clearlake Capital, are very supportive of both organic and inorganic growth, so acquisitions are one of the areas for us; we plan to make one or two acquisitions this year, and companies with products in near-adjacent markets are of real interest to us. In terms of organic growth, we have been very successful helping organizations in insurance, financial services, banking, healthcare, many verticals, create the enterprise data hub and access and integrate all of their data, and now we are carrying them to the next-generation frameworks. The next step for us is really having streaming data sources as well as batch data sources flow through a single data pipeline, and this includes bringing telemetry data and security data into the advanced analytics as well.

Okay, so it sounds like you're providing a platform that can handle today's needs, which are mostly batch, as well as the emerging ones, which are streaming, and that's the future-proofing customers are looking for. Once they've got those types of data coming together, including mainframe data they might want to enrich from public sources, what new things do you see them doing?

Predictive analytics and machine learning are a big part of this, because ultimately there are different phases, right? The operational-efficiency phase was the low-hanging fruit for many organizations: what can I do faster, how do I serve my clients faster, how do I create that operational efficiency in a cost-effective, scalable way? Second were the new go-to-market opportunities with transformative applications: what can I do by recognizing how my telco customers are interacting with self-service help, and how do I react to their responses within a couple of minutes? And the next phase is: how do I use this historical data, in addition to the streaming data I'm rapidly collecting, to actually predict and prevent things? This is already happening in banking, for example: with fraud detection, a lot of predictive analysis happens. Advanced analytics using AI and machine learning will be a very critical component of this going forward.

This is really interesting, because now you're honing in on a specific industry use case, and fraud detection and prevention is something every vendor is trying to solve. How repeatable is it across your customers? Is it something they have to build from scratch, or are there templates that get them 50% of the way there, 70% of the way there?

There's actually an opportunity here, because if you look at the healthcare, telco, financial services, or insurance verticals, there are repeating patterns, and that's true for fraud as well as for some of the newer use cases like customer churn analytics or customer statistics. These patterns, and the compliance requirements in these verticals, create an opportunity to come up with packaged applications, for new companies, for new startups.
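To ground the fraud example, here is a minimal sketch of the kind of model such a packaged application might start from: a logistic regression trained on historical transactions with scikit-learn, with arriving events scored against it on the streaming side. The features, threshold, and data are hypothetical placeholders, not any vendor's template.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical historical features per transaction: amount, merchant risk
# score, minutes since previous transaction; label 1 = confirmed fraud.
rng = np.random.default_rng(42)
X = rng.random((10_000, 3))
y = (X[:, 0] * X[:, 1] > 0.6).astype(int)  # toy stand-in for real labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# The batch (historical) side of the pipeline trains the model.
model = LogisticRegression().fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# The streaming side scores each arriving event the same way.
new_event = np.array([[0.92, 0.88, 0.05]])  # hypothetical transaction
fraud_probability = model.predict_proba(new_event)[0, 1]
if fraud_probability > 0.9:                 # hypothetical review threshold
    print("flag for review:", round(fraud_probability, 3))
```

The repeatable part across verticals is this shape, train on history and score the stream, rather than the specific features, which is why packaged applications are plausible.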
Okay, Tendü, final question. Share with the folks out there your view of the show right now. This is 10 years of Hadoop and seven years of this event. We had a great event with Big Data NYC in New York City, and now Silicon Valley. What's the vibe here in Silicon Valley?

This is one of the best events. I really enjoy Strata San Jose, and I'm looking forward to the two days of keynotes and to hearing from and networking with colleagues. This is really where the heartbeat happens, because with Hadoop World and Strata combined we have started seeing more business use cases and more discussion around how to enable the business users, which means the technology stack is maturing and the focus is really on the business and on creating more insights and value for the businesses.

Tendü Yoğurtçu, thanks for coming by theCUBE, we really appreciate it. Go check out our Dublin event on the 14th of April; Hadoop Summit will be in Europe for that event. And of course, go to SiliconANGLE.tv and check out our Women in Tech coverage; we feature Women in Tech every Wednesday. Thanks for joining us, and thanks for sharing the insight from Syncsort, really appreciate it. This is theCUBE. We'll be right back with more live coverage from Silicon Valley after this short break.