...and this is theCUBE, live at re:Invent 2013. This is our first year at re:Invent. We did the AWS Summit in June, which was a little warm-up for the Cloud Super Bowl going on here: 8,000 people descending upon Vegas. Lonne Jaffe is here, the CEO of Syncsort. Lonne, thanks for coming on theCUBE.

Thanks for having me, it's great to be here.

So what do you think of the conference? Is this your first re:Invent? Were you here last year?

It's my first re:Invent.

Yeah, us too, as I just said. So what do you think of the vibe?

Fantastic vibe, the energy level is really quite amazing. Between the 9,000 people who are here and the Twittersphere, the whole ecosystem of people reacting to what's going on at the conference is really quite remarkable.

So Syncsort's a really interesting company, one of the first companies I met coming into the business back in the 1980s, and you've reinvented yourselves, pun intended, with some recent changes to the structure. I wonder if you could give us the update on Syncsort. For those who might remember Syncsort from the mainframe days, it's quite a different company now, so take us through where you're at and some of the recent changes people might not be familiar with.

Yeah, so the company is one of the first software companies ever, founded in 1968, built around really high-performance data processing software on the mainframe. Sorting programs are actually very challenging; sorting is one of the core computer science problems that anybody who's studied the field has had to deal with. For the first couple of decades that was the business, and then over time the company evolved to build a whole array of next-generation technologies that connect into legacy systems, process data, and load it into new systems.

I joined a couple of months ago. I was at IBM for 13 years, and then at CA Technologies for a couple of years. At Syncsort, the first thing we wanted to do was take this Hadoop business we had built, this next-generation big data technology that was selling really well, and scale it up: double down on the business, grow it, and do all sorts of interesting things like create the cloud version, which we just launched this week. The other thing the owners of the company wanted to do was double down on the business inorganically. The owners are some of the best private equity and venture capital funds on the East Coast of the United States: Insight Venture Partners, Bessemer, Georgian Partners, Goldman Sachs. They wanted to allocate their capital here; Insight alone just raised a $2.6 billion tech fund, and they were in Twitter and Tumblr and a lot of the other high-profile recent exits. They wanted to use Syncsort as an acquisition vehicle, so we can go out and buy other strategic, near-adjacent software companies that we can give a lot of lift to. We just did the first one of those a couple of weeks ago, a company called Circle, based in the UK.

So what's Circle all about?

Great question. Circle's technology takes mainframe data that's stored in IMS or VSAM and moves it to a more strategic location, DB2 on z/OS, which makes it a lot more accessible to our Hadoop product.
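[Editor's note: for readers unfamiliar with the hierarchical-versus-relational distinction behind an IMS-to-DB2 migration, here is a toy Python sketch of flattening a hierarchical parent/child record into flat relational rows. The record layout and field names are invented for illustration and have nothing to do with Circle's actual product.]

```python
# Toy illustration only: flattening an IMS-style hierarchical record
# (a parent segment with nested child segments) into flat relational
# rows, the shape a DB2 table or a Hadoop job can consume directly.
# All record layouts and field names here are invented.

hierarchical_record = {
    "customer_id": "C001",          # root segment
    "name": "ACME CORP",
    "accounts": [                   # child segments nested under the root
        {"account_id": "A-10", "balance": 1500.00},
        {"account_id": "A-11", "balance": 250.75},
    ],
}

def flatten(record):
    """Emit one flat row per child segment, repeating the parent keys."""
    for account in record["accounts"]:
        yield {
            "customer_id": record["customer_id"],
            "name": record["name"],
            "account_id": account["account_id"],
            "balance": account["balance"],
        }

for row in flatten(hierarchical_record):
    print(row)
```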
A lot of the early adopters of Hadoop, the consumer internet companies of Silicon Valley like Facebook, LinkedIn, Twitter, and Google, didn't have mainframes or any legacy systems, so this wasn't a challenge for them. But for the grown-up companies who are using Hadoop, what good is your next-generation big data cluster if it can't access your actual data? Some 70 to 80% of corporate data is still stored on the mainframe, so this gives them access to that mainframe data, and it also helps them save a lot of money.

So you said VSAM and IMS data goes to DB2 on z/OS, so then it can be accessed off-platform. Not the off-Z DB2, DB2 for Linux, Unix, and Windows or whatever IBM calls it, but the Z version of DB2.

Yeah, and it's a relational structure instead of hierarchical, so it makes it a lot easier to access using our Hadoop product. Then what we did this week is launch what's called Syncsort Ironcluster. It's the first data integration product that runs natively on Amazon Elastic MapReduce. What that allows you to do, without installing any servers or software, with just one click, is spin up a Hadoop cluster in the cloud and start siphoning off the expensive workloads you were previously running in a legacy data warehouse or legacy ETL tool. You can save huge amounts of money by redoing those processes in Hadoop in the cloud without building anything, and as a side effect of saving all that money, you also get a Hadoop cluster, which is getting better and better every day. You can use the Hadoop cluster to pre-process the data and load it back into whichever systems you were loading before, or you can load it into something like Amazon Redshift, which gives you the opportunity to do a whole array of next-generation cloud things.

What are you seeing for EMR, Elastic MapReduce, and what are the advantages relative to non-cloud MapReduce?

Well, one advantage is you don't have to buy any machines, install any software, or plug anything in.

That's something you'd expect, right?

It's very elastic, so you can scale up and scale down. Batch processing tends to be very interrupt-driven; there are times when you need to do more batch processing and times when you need less, so it allows you to be very flexible there. It's also getting cheaper all the time, and this is something Amazon is great at. Not only do they make incredibly high-performance systems, they're really good at making sure the price is the best for the customer. So for a vendor like us, we're able to create a product and have it be really inexpensive: free up to 10 nodes, and an incredibly cheap hourly price beyond that. And as Elastic MapReduce gets cheaper, we benefit from the commoditization going on in the cloud platform space, because our product becomes higher and higher value.
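[Editor's note: to make the one-click-cluster idea concrete, here is a minimal sketch of spinning up an EMR Hadoop cluster programmatically with boto3, the AWS SDK for Python. This is a generic illustration, not Syncsort Ironcluster; the instance types, counts, and release label are placeholder choices. The tooling in 2013 was different, but the idea is the same.]

```python
# Minimal sketch: programmatically launching an EMR Hadoop cluster
# with boto3. All sizing choices below are placeholders, not
# recommendations.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="hadoop-offload-demo",
    ReleaseLabel="emr-6.15.0",            # an EMR release bundling Hadoop
    Applications=[{"Name": "Hadoop"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,               # scale up or down: the elasticity point
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",    # default EMR instance profile
    ServiceRole="EMR_DefaultRole",
)
print("Cluster starting:", response["JobFlowId"])
```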
So, Lonne, let's dig into some of the use cases. You're moving data from mainframes into Hadoop and doing some processing. When you're talking to customers, what are some of the real pain points they're looking to address? Is it customer data they're trying to get in there? What are the workloads these data sources are going to support?

The most common scenario we're seeing, in terms of the really large purchases of our Hadoop product so far, is a customer with, let's say, six different source systems. One of them might be the mainframe, and accessing the mainframe is very challenging. There's all sorts of things you need to do, like EBCDIC to ASCII conversions and COBOL copybook imports: really hard, sustainably high-value stuff.

Fun though, really fun.

Really fun. Then you have maybe five other source systems, and what you're doing today is bringing them all into a legacy data warehouse, a more expensive data warehouse environment, and doing what's called ELT: you're processing the data in the data warehouse, and then maybe running a business intelligence tool against it, like Cognos or Business Objects.

Instead, the very simple version is this: the customer has an opportunity to upgrade their expensive data warehouse for, let's say, $5 million. Instead of upgrading the data warehouse, they spend $1 million building a Hadoop cluster, avoid the data warehouse upgrade entirely, and save $4 million. They put the Hadoop cluster between the six source systems and the downstream data warehouse, and all they use it for is to pre-process the data and load it into the data warehouse they were going to put it in anyway. Very simple use case: you just use it as a pre-processor to save money. And as a side effect, you get the Hadoop cluster, which becomes a long-term active archive of all of your data, incredibly inexpensive, 100 to 1,000 times cheaper, which is so many orders of magnitude it's really hard to conceptualize. And over time the Hadoop cluster gets better and better.

Phase two is: okay, now that I'm using Hadoop as a pre-processor before loading my downstream systems, maybe I can stop loading those downstream systems altogether, because the world of things you can do with the data once it's already in Hadoop is getting better every day. And it's not just what you can do against Hadoop itself; there are also these nearby systems, like the NoSQL repositories, so you can pre-process the data in Hadoop, load it into one of those, and then do all sorts of sophisticated analytics and next-generation systems. The cloud product lets you do all of those things: use Hadoop as a pre-processor, save huge amounts of money, build skills, make it easier to attract talent, and you don't even have to build the Hadoop cluster on premise.

Right, absolutely, and I think you touched on something really important. Doing the transformation, the processing, in Hadoop instead of in your data warehouse is going to save you a lot of money. You're using the compute power of Hadoop, not just using it as a storage engine, and the compute capability is really where the power is. You're not paying Teradata and Oracle significant amounts of money to upgrade as data volume continues to grow, and then you use those savings, as you said, now that you've got a Hadoop cluster, to hire data scientists or experiment with new analytics. So it kind of pays for itself; it sounds like a great value proposition for the enterprise.
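[Editor's note: a minimal Python sketch of the EBCDIC-to-ASCII step Jaffe mentions above, using the standard library's cp037 codec, a common US EBCDIC code page. The sample record is invented, and real mainframe extracts also involve COBOL copybook layouts and packed-decimal fields that this toy example ignores.]

```python
# Minimal sketch of an EBCDIC-to-ASCII text conversion using Python's
# built-in cp037 codec (EBCDIC, US/Canada code page).

ebcdic_bytes = "ACME CORP 001500".encode("cp037")   # simulate a mainframe record
print(ebcdic_bytes.hex())                           # raw EBCDIC bytes on the wire

ascii_text = ebcdic_bytes.decode("cp037")           # decode back to readable text
print(ascii_text)                                   # 'ACME CORP 001500'
```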
Yeah. The way our product is useful, and this is why the Hadoop distributions really are attracted to us, is that we, first of all, make Hadoop a lot easier to use. We contributed a patch to the Apache Hadoop trunk, MAPREDUCE-2454, which got committed earlier this year, and it allows us to install our algorithm, our sort program, on every node in the cluster. You use our GUI to design the MapReduce jobs instead of writing them in Pig or Hive, and it just makes the Hadoop cluster run so much faster. It handles all the connectivity into the backend systems, including the mainframe, which is really challenging, and it makes the cluster easier to manage and maintain over time. In the cloud in particular, that's very useful. Think of it as a big red button in the cloud that you can push to suck up the expensive workloads from your legacy systems.

Right, so you're abstracting away that complexity, and with the GUI interface you can intuitively create jobs without having to code and do all that kind of work. It's amazing, in the data integration world, how many people out there, and not just for big data workloads, are still hand coding, just this primitive way of moving data around. Have you seen that change over the years? I was covering this market 10 years ago, and it was a problem then, and it seems it's still the way a lot of companies do it.

The early adopters of Hadoop, the consumer internet companies, could hire an army of Stanford computer science graduates to cobble together data flow language code on Hadoop. That's really challenging for the grown-up companies, the large banks, the hospitals, the government organizations. We are seeing people use things like Hive, which is still very popular. Even if you're using Hive, you can take our technology, plug it in, and it just makes the cluster run so much faster, and it still handles connectivity into all sorts of legacy systems. Then over time you can start to use our visual tool so you can manage the flows you're building a lot more easily.

So it sounds like your mission is to make Hadoop better, easier, faster, and to increase adoption.

Exactly, and we also focus a lot on the downstream systems. We want to become the most differentiated pre-processor for the fastest-growing things you would do with Hadoop systems. Today we load into Vertica and Tableau and all these incredibly sophisticated NoSQL repositories. And over time, when people start doing all the big data things they've been envisioning for a while, dynamically altering the price of milk so you can sell more bread because you know the foot traffic patterns in your store, or predicting traffic patterns a couple of hours ahead of time and dynamically adjusting tolls to anticipate bottlenecks, or figuring out which healthcare treatments result in the best outcomes and delivering that insight to clinicians at the point of care, you already have a Hadoop cluster. It already has all of your data, and it already has a fantastic tool on it that makes it easy to design the analytics, so you can let your data scientists loose on it and create all sorts of next-generation systems.

So you're essentially talking about integrating an analytics system with what is essentially a transaction system, and making decisions in near real time.

That's right.
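[Editor's note: a taste of the hand-coding that a GUI-driven tool replaces. This bare-bones Hadoop Streaming mapper and reducer in Python sum amounts per key; Hadoop Streaming pipes records through stdin/stdout and delivers the reducer its input sorted by key. It is a generic illustration of hand-written MapReduce, not Syncsort's generated code.]

```python
# Bare-bones Hadoop Streaming job: sum amounts per key.
# Run the mapper as `script.py map` and the reducer with no argument.
import sys

def mapper():
    # Emit key<TAB>amount for each tab-delimited input record.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            print(f"{fields[0]}\t{fields[1]}")

def reducer():
    # Input arrives sorted by key, so totals can be accumulated per run.
    current_key, total = None, 0.0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0.0
        total += float(value)
    if current_key is not None:
        print(f"{current_key}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```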
I mean, there's also an emotional component to it. People could always do data offload or workload siphoning, but you'd do a whole bunch of work and then all you'd have is a new system that does the same thing as your old system. Not that exciting. With Hadoop, however, not only do you save five to 10 times as much money as building out the Hadoop cluster costs, you get this cluster that is getting better faster than almost anything else in the technology industry, and that emotionally just feels more valuable to people, so it gets prioritized above the line. It's so much easier to sell something when it does something useful and saves money, as opposed to something that just saves money or just does something useful.

So if I understand the strategy, you're taking the data transformation prowess of Syncsort and building around it using M&A. You obviously have a strong background in M&A. What did you learn at IBM that you're going to take with you? IBM has a very prescriptive M&A model, right? What will you take with you, and what will you do differently?

My big focus from an acquisition perspective, and this is particularly important in the world of open source, is identifying things that are sustainably high value. It's very easy to have a piece of technology in the world of Hadoop where, tomorrow, a company like Facebook open-sources a project that does the same thing as the product you're trying to sell for money, and that's problematic. So you need to find things that are incredibly challenging, sustainably high value, and that we can give a lot of lift to. We have thousands of customers and a global enterprise sales force, so we have the ability to buy a powerful piece of technology, pump it through our global sales force, and give it a lot of lift. The other thing we want is things that won't restrict our freedom of movement on partnerships. We have amazing partnerships with the Hadoop distributions and with companies like Amazon and Tableau and Vertica, so we want to make sure the things we buy continue to make us more and more useful to them.

Awesome. Lonne, we're getting the hook, but great segment. Congratulations on the move, the opportunity, and the vision; we wish you the best.

Thank you, it's been great.

Okay, keep it right there. We'll be right back with our next guest. We're live. This is theCUBE, from re:Invent 2013.