at Big Data SV 2014 is brought to you by headline sponsors WANdisco, we make Hadoop invincible, and Actian, accelerating Big Data 2.0.

Okay, we're back here live in Big Data Silicon Valley. That's hashtag BigDataSV. Check it out on Twitter, go to crowdchat.net/BigDataSV, join the conversation. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE, joined by my co-host, Dave Vellante, the co-founder of Wikibon.org, and we are here in San Francisco, I mean, in Silicon Valley, covering Big Data. The Strata Conference is going on right behind us. We're covering all the top news coming out of that event. Everyone's here: entrepreneurs, CEOs, executives, VCs. We're gonna talk to them all and get the signal from the noise for you. This is theCUBE. Our next guest is Josh Rogers, President of Syncsort. Welcome to theCUBE.

Thanks so much for having me.

So, I first gotta ask you, we were talking before you came on about Big Data. I mean, it's a new market, so there's new stuff happening, but the market's maturing. I wanna get your take on what you see happening here on the ground. Share with the folks the vibe this year at Big Data SV and the Strata Conference. What's happening on the ground here?

Yeah, so, I mean, I think one of the first things we've seen that's pretty basic, but pretty important and pretty critical, is just that there's actual budget allocated to Big Data projects this year, which is pretty fundamental, right? I think 2013 was a year of experimentation: a lot of POCs, a lot of understanding what, functionally, the technology can do. In 2014, we're walking into an environment where customers have allocated budget to build production systems. And I think that's a fundamental step, obviously, but I think it's also gonna come with a set of requirements that they're gonna expect these technologies to start to deliver in production.
And so, I think we're in for a very interesting year.

Dave, I wanna get your take on it. What do you think is happening with Big Data right now? I mean, we heard the suits comment earlier by HP, more suits this year than T-shirts or hoodies. I don't know exactly what his quote was, but it was something along those lines. The budget's being allocated, as Josh mentioned. What's your take?

Well, I think we're seeing a situation where a lot of people are realizing that they can do the same or much more for way less.

Right.

So you're talking about at least a 10x factor in some cases, so that's getting the attention of the suits and the wallets. They're saying, wow, we can cut our budget by maybe two thirds, maybe even more, and we can drive business outcomes that are gonna drive revenue. I mean, I think it's so hard to move the needle. Pick a segment; take marketing, for example. And Josh, I'd love to get your take on this. You know, email marketing. You talk to email marketers or marketers and they say, you can't move the needle anymore. And now we're talking about analytics driving whole new realms of business outcomes. And a lot of that is getting the data in, cleaning the data up, getting data out of legacy systems. So that's kind of your specialty.

Yeah, absolutely. So I think there's lots of folks that have ambitions around creating new insights that'll drive big value. But what we're finding is that customers need a place to start that can give them an immediate win. And we refer to that as offloads. So there's lots of legacy repositories that are doing important but potentially not efficient workloads. And they've been doing them for decades in many cases. And what they're finding is they can take those workloads and move them into things like Hadoop and do it for a fraction of the cost. So it may cost me $100,000 a terabyte to manage data in my enterprise data warehouse. I might be able to do that in Hadoop for $1,000.
So if I start with that as my first project, there's a bunch of positive benefits that accrue. The first is I save a ton of money. So I can actually free up budget to either give back to the corporation or build out this new Hadoop cluster that's going to be foundational to my big data efforts. The second is I actually learn how to build and manage this infrastructure. And third is I actually end up with data in my cluster. So now I've got an environment that I built, that I understand how to manage, and that I can actually start to use for some of these more ambitious goals. But I'm starting with something that provides me an immediate ROI. It doesn't carry the risk of failure associated with some of the insights that people hope to get out of data; they can actually drive real savings out of the gate.

I wonder if you could talk about Syncsort's strategy a little bit. I mean, Syncsort was one of the first companies I remember when I got into the business in the early 1980s, right? You guys were the standard on sorting mainframe data. What led you to big data and what's the strategy behind that?

Yeah, so it's a great question. I mean, we actually have been around for, I think it's 46 years, right? We were founded in 1968. Our first product was a mainframe sort utility, and the architectural insight we had at the time was that sorting represented about half of the workload on mainframes and it wasn't particularly performant. So if you could make that sort operation go faster and at the same time use less CPU, then you had not only a better business advantage and better performance in the applications, but you actually saved money. If you look at Hadoop, when we started to think about Hadoop and what the architecture of Hadoop was and how it was being used, we saw a couple of things. The first was that the MapReduce framework is very sort-intensive.
So if you have the best sort technology in the world, that could potentially be useful to accelerate the performance of the core framework. The second was we actually had a high-performance ETL capability built on top of that sort. So if you could actually replace that sort, you could plug in not just our sort but an entire ETL capability. When we looked at how people were using Hadoop, the number one thing that they were doing was building large-scale ETL routines. And so we pursued the integration of our technology with the Hadoop framework via a contribution we made last year to the community, and then we GA'd a Hadoop ETL tool in the middle of last year. What we've seen is great uptake on the solution, and people are applying it to this offload scenario. They've got a data warehouse; 50% of that capacity is dedicated to data integration workload, ELT workload, that has been there for perhaps a decade because that was where they could get scalability. Now they've got a place they can get scalability that's a fraction of the cost, and we're an enabling technology to allow them to move that processing and that data into a Hadoop environment and make it a productive workload. The second thing we're seeing is that people want to be able to do that not just on-premise; they want to do it in the cloud. And what's interesting about that is now you're kind of a little bit back to the future for us. Now you're in an environment where not only do you care about performance and scale, but you also care about CPU time. The lower the CPU time I have in my cloud environment, the less I pay, the lower my monthly bill. So we launched Ironcluster, our Hadoop ETL offering on EMR, in November, and we're seeing tremendous uptake on that, and people are looking at how do I kind of suck these legacy workloads not just into Hadoop on-premise but potentially up into the cloud.

So what are you seeing for EMR generally? Is there Elastic MapReduce uptake from the Amazon community?
Yeah, so I mean it's been amazing for us. We've gotten a wide variety of firms that are using the technology. I'd say there it's a little bit more learning what it can do and how it can fit in their environment. Largely our customer base is enterprise, and so I think enterprises are still trying to figure out where they can best apply it. But there's no question that the economics of cloud are just undeniable, and the flexibility and the ability to not only deploy a Hadoop cluster in a couple of clicks but add a full-featured, integrated ETL capability on top of that is pretty powerful.

Josh, we always joke, Dave and I joke, that Amazon is a software mainframe. If you think about it, when you look at it as old computer science operating systems guys, the mainframe back in the day was great. I mean, it was the glass house, everything was happening. Data processing, MIS. Remember those days, Dave? MIS department. If you look at what's going on. The API wasn't as open. We're seeing the unbundling, the software-eating-the-world vision from Marc Andreessen, happening now in front of us. And that is mainframe-like. So it's just decentralized, right? So you guys have a unique angle. I just commented on the CrowdChat about that. So I want to get your perspective. If you believe that the unbundling of the mainframe concept is moving into a distributed computing world, a la the network is the computer, hat tip to Scott McNealy on that one, you guys are in a good position. Can you talk about that? If you believe that, then what's next? If you believe that the mainframe is now the cloud, mobile, big data, with all the subsystems, middleware, et cetera, all kind of looking different, what's next? I mean, is it the app proliferation? What doesn't get commoditized in this new software eating the world?

Yeah, I think it's going to move up the stack in terms of people searching for value.
So I think what you'll start to see is much more industry-specific applications on top of this compute platform, and people being able to drive applications that before were not possible because of scale or cost or the data that was required to be jammed into a relational model. So I think you'll start to see more and more industry-specific applications built on top of these platforms. And those will be offered as SaaS models or they might be offered on-premise, but that's where I think we'll go. You'll start to see use cases that happen over and over again, and people will start to refine and package those up and sell them.

And Dave just made a comment on Twitter. I saw it coming across the description. I invoked the software mainframe, which is always a grenade, depending on which group you throw it at. But no, more seriously, I've got to ask you, what do you see on the commoditization? Because one thing we see in the cloud business is certainly commoditization at the infrastructure level. Platform as a service is very hot and all the big data stuff's happening up the stack. Is there a part of the stack that doesn't get commoditized? Do you think the application piece will be commoditized? And if so, or if not, talk about that.

Yeah, I don't think we'll see all applications be commoditized, but I think you'll find that people need to find a very specific niche where they offer very unique functionality and they have a deeper understanding of user requirements and they're doing things that are hard and that will be consistently hard. I think an area that we believe we have special expertise in is this: when people move this Hadoop infrastructure into their enterprise and start to think about things like offload, one of the things they have to do is figure out how they plug that into the mainframe. How am I gonna move all these mainframe data sets into my Hadoop cluster?
How am I gonna move workloads that were written in COBOL and JCL into Hadoop and rewrite them in frameworks like MapReduce? That's hard. It's a lot to ask Hadoop developers to write JCL, understand COBOL copybooks, and deal with OCCURS DEPENDING ON. We believe that's an area that we can continue to strengthen. We would broadly refer to that as big iron to big data. So that would be an area where we believe we can sustain a competitive advantage because of our expertise. And I think there's lots of areas that won't be commoditized, but I think vendors are gonna be required to develop a very deep level of expertise in a specific area that they can monetize and where they can maintain differentiation.

How do you guys compare, and how do you guys talk about the IBM presence? Obviously they have that mainframe, big iron to big data focus. Obviously they have the mainframe; they're a big player there. Still a big market. People have all this stuff in their legacy. I mean, how do you guys play with IBM?

Yeah, so IBM's a partner in a couple of different ways. I think that IBM has a collection of impressive capabilities. I think they're starting to make some moves to integrate those, but they also have some very significant businesses that drive a lot of revenue for them that they may not wanna see commoditized. So there's a classic innovator's dilemma for pieces of that organization. I think what we are focused on is really how do we help customers come in and leverage the Hadoop infrastructure to drive offload scenarios, and be able to take workloads that they were performing in a non-performant manner in expensive legacy stores into Hadoop and get incredible value from a savings perspective, because we know those savings are gonna be deployed back into the infrastructure to build new, higher-value applications, and we can be a part of that. I think IBM's got the same opportunity.
I gotta ask you a question that we asked earlier, because this is relevant to you. Data Fusion is a topic that we talked about with the Actian CTO, and it's early on for his labs. But when you think about Data Fusion, it's collectively a mashup of data at a variety of different life cycles, either massive ingest or pipelining it into advanced analytics, those kinds of environments. So what's your take on Data Fusion? Where is it at? Is it still kind of a concept that people are getting their arms around? Are you seeing it in practical use cases?

So I think that that is in effect what people are doing in some of these big data environments. I think they're using a collection of tools today, a lot of which are custom written. We're seeing a wide variety of file formats that people are starting to use that they haven't used in the past, and it's helpful because it allows them to store this hierarchical or semi-structured data. You know, what I think is interesting is that at the end of the day, people wanna submit it to analytics and they wanna be able to have humans look at trends and be able to predict what's gonna happen. To do that, they actually have to take what is some broad collection of data that tends to be structured or semi-structured, perhaps completely unstructured, and derive some level of structure out of it. And so that's what becomes kind of complicated. I think there's a lot of opportunity to create tooling that allows people to do that more easily, and I think there's been a lot of innovation in that space. I also think there's a lot of compute that has to be applied to that, because at the end of the day, I'm talking about large volumes of data that have to be transformed and related together. I think that's where there's a great opportunity for new tools to come out to help people be more productive in these big data environments, and that's where we think we have an opportunity to try that out.
Share with the folks out there a little bit more about Syncsort. Where are you guys located, the East Coast?

We're based on the East Coast.

Jersey boys.

Just outside of New York City. Woodcliff Lake, New Jersey.

My stomping ground. All right, very good. I went to high school in Montvale, right next door, a couple of towns over.

Frankie Valli country.

Yup. The boroughs of New Jersey. Governor's awesome.

So what I wanted to do was come on and talk about New Jersey.

Those tolls, far from Fort Lee.

So, we're based in New Jersey, have been since the founding of the company. As I said, we started on the mainframe side, but we've made a significant number of investments in the Hadoop world and continue to make those investments. You know, we've brought this Hadoop ETL capability to market. This year, I'm sorry, late last year, we took that to the cloud in November. You will continue to see additional offerings come out of Syncsort this year. Some of the things that we're working on are really enablers around this offload scenario. So for example, if I've got a warehouse and half of it is dedicated to ELT processing that's expressed in SQL, it would be nice to be able to move that workload into Hadoop, but there may be an opportunity to actually reflect it as traditional ETL flows. So we actually built a SQL analyzer that will give you a sense of what that SQL is doing and visualize it for you, so that you can understand not only how to take it and rewrite it in the MapReduce paradigm, but also how to optimize it from a performance perspective. We're doing the same thing on the mainframe, so JCL and COBOL jobs that are doing batch processing.

Heavy lifting, a lot of heavy lifting. So I've got to ask you a question, because we talk to a lot of startups out here.
How do you talk to your customers when you say, hey, let's go to Silicon Valley, all these new startups that got a B round? We just had a great startup on earlier that had 15 million, Series B. So, you know, they're small, but they're in beta. These big customers don't want to put their bets on the enterprise. It used to be like you never had a startup in your inventory, but now you're starting to see that. How do you deal with the startup ecosystem and this evolution?

Well, it's funny, you know, being from New Jersey and being around for a long time, I think a lot of the partner ecosystem doesn't necessarily know what to make of us, but I think what they found is that we can create a lot of value for them. So, you know, if you look at some of the Hadoop distributions, all of which are good partners of ours, as they started to move into the enterprise, they found it very important and helpful to be friends with people like us that know the mainframe, that know how to take that infrastructure and plug it into these enterprises, and that actually know some of the folks that run the mainframes in those enterprises. And we're a trusted vendor that can help them get access to that data, which is pretty important data. It tends to be about 70% of the corporate data that they have access to, and it tends to be some of the most important reference data in the form of transactions. So, I think what we found is that, you know, folks like Cloudera, Horton, and MapR have been great partners. We have been able to bring value to them, help them access repositories that they wouldn't otherwise be able to access in a very easy manner. At the same time, they've been great for us. You know, they've been very open to allowing us to innovate with them, so you'll be seeing some more contributions to Apache Hadoop from Syncsort this year.
We're working on some things with both Horton and Cloudera right now, and you know, they've been terrific in embracing some of the expertise that we bring around performance and legacy processing paradigms to infuse into Apache Hadoop.

So, they're evangelizing Hadoop. They're spending a lot of market development effort doing that, and then you're essentially evangelizing, helping to clean up the mess, the data messes that...

I wouldn't say helping to clean up a mess. I don't want to state that they're creating any sort of mess, but I would say that...

No, it's out there, I'm saying.

Yeah, as people come into the enterprise, you know, they want to bring in things like mainframe data, and you know, people aren't necessarily wanting the Hadoop developers to open up.

That's a messy task, is what I meant by that. So, okay, talk a little bit more about that go-to-market. So, you're partners with these guys, and then how do you go to market? How do you sell? What kind of channels? Maybe talk about that a little bit.

Yeah, so we have a broad set of co-sell relationships with all the distributions. We have a number of conversations, and you'll see partnerships roll out later this year around reselling and OEM relationships, with some of the appliance vendors, with some of the hardware vendors, people that are selling things into Hadoop clusters where an ETL capability would be helpful. And then we have a broad set of SI relationships, both at the global level, so Cognizant is one of our premier global system integrator partners, and then we have a broad set of boutique big data partners.

So the hardware partners, for example, they're partnering with you in sort of a joint go-to-market? Are they reselling your products?
There's nothing I can announce today, but I can say that we're in pretty advanced conversations with a number of appliance and hardware vendors to resell our software on top of their hardware offerings to be able to deliver a pre-packaged Hadoop distribution with an ETL capability.

And how does it work with AWS? Are you part of the AWS marketplace?

With AWS, we're on the marketplace. We're the only ETL tool that's on the marketplace for EMR. So we are an AMI; you go on, you spin up your EMR image, and you can choose to add us to that, and it's a couple of clicks to do so.

Right, right. Well, my final question for you is more on the distros you mentioned. Obviously, you guys are certainly attractive for a lot of the startups because you guys have big presences, big accounts, heavy lifting, the front-end loader probably for them to get at these accounts and plow through and get some penetration. But I've got to ask you, as someone who's out on the front lines, with all that focus, do you see a consolidation in the distros happening? That's been a big conversation we've been hearing all day today: we hear that the consolidation needs to happen on the distros. Do you believe that or not?

You know, I think, well, first of all, it depends on how many distros we're talking about. You know, if your list is 12 or 14, yes, I think you're going to see some consolidation over the course of the next 12 months. But I think there is room for several distros. And right now, I think it's been helpful to the community because I think it's driving a ton of innovation, not just in terms of, you know, the actual capabilities of the platform, but also in terms of where people are pointing it and how they're using it. So I think that you will see a smaller set, but I think that if you look at what Cloudera, Horton, and MapR are doing, it's very beneficial to driving adoption and driving maturation of the market.
Yeah, and you know, we've got to give Cloudera props. Amr Awadallah, Cube alumni, big supporter of what we do, said to us straight up when Horton rose into the field, hey, you know, more people making software around Hadoop makes everyone better. I think that's a good point. Really appreciate it. Syncsort here on theCUBE. Josh, we really appreciate your time. Leader doing the heavy lifting. Big iron to big data. Love that tagline. I'll make that the bumper sticker for this segment. We'll be right back with more action here at the Hilton, right across the street from the Strata Conference, going on live here in Big Data SV, hashtag BigDataSV. Join the conversation. This is theCUBE. I'm John Furrier with Dave Vellante. We'll be right back.