Winston Edmondson: Winston Edmondson here at Hadoop Summit. I'm here with the big data services company Qubole, and Ashish Thusoo is going to tell me a little bit about the difference between a services company and some of the competitors that really just offer a single solution. Thanks for being with us today.

Ashish Thusoo: Hey, thank you very much. Thanks for having me here.

Winston Edmondson: Tell me a little bit about the advantage of going with a services company like yours.

Ashish Thusoo: The big advantage is that big data is a very complex field. There are a lot of moving parts and a lot of components that have to be put together. As a services company, by providing all of this as a service on the cloud, we take care of that complexity, so end users can really focus on their data sets and transformations, which is where most of their attention should go; that's what brings the maximum value to the business. Meanwhile, we take care of questions like what type of cluster to run, what kind of big data services to run, what kind of workflow engines to run. All of that we provide as a turnkey solution on the cloud, abstracting that level of infrastructure complexity away, so our customers can just focus on their data sets and their transformations.

Winston Edmondson: When I talk to businesses that would use a service like this, one of the hurdles that prevents adoption is the fact that it requires so much manpower and so much time. Are you saying that a service like yours simplifies things to the point where they can just turn it on and go?

Ashish Thusoo: Exactly; that is exactly what we are trying to do, and that is exactly what our service provides. It's a turnkey solution. You log in to the service and you have the best technologies in big data right there, in the form of an integrated platform. These are not component pieces.
These are well integrated into a full-blown platform where you can come in, create models on your data sets, discover patterns in them, and from those patterns create periodic jobs or batch-processing pipelines, all very easily in a single integrated platform, at the level of a data architect or a data analyst. You don't have to say, "Hey, I need to hire five Hadoop experts or five big data infrastructure experts to put this stack together." You can just use us. It's a turnkey platform: you create a login and start using it.

Winston Edmondson: Talk to me a little bit about the folks that have multiple data sets and feel as if there's really no way to combine all of that and get any kind of useful data. What can you do for those types of businesses?

Ashish Thusoo: Our platform provides a lot of connectors to various sources, which they can use to bring those data sets into a single consolidated place, and that is the harder part of the problem. Once you do that, everything after it becomes much easier. The platform is, of course, leveraging Hadoop and Hive. Hive is a technology that my co-founder and I created back at Facebook in 2007, and we are leveraging those technologies together with the connector strategy, which helps bring data in from various different sources and combine it. We can get data out of any RDBMS, and out of NoSQL databases such as Mongo. We are working on connectors to REST-based services, which can pull data out of services such as Google Analytics or Omniture, or, for the ad tech world, AppNexus data and things like that. With those connectors, people can easily get data out of those sources, do it periodically, do it in an incremental fashion, and then, with the processing power of the engine itself, start transforming that data.
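The periodic, incremental pulls described above can be sketched in a few lines. This is not Qubole's actual connector code; it is a minimal illustration of the idea, using Python's built-in sqlite3 as a stand-in for a real source database, and a checkpoint column (here the `id`) to fetch only rows that arrived since the last pull.

```python
import sqlite3

def incremental_pull(conn, table, checkpoint_col, last_seen):
    """Fetch only rows newer than the last checkpoint value,
    mimicking a periodic, incremental connector pull."""
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {checkpoint_col} > ? ORDER BY {checkpoint_col}",
        (last_seen,),
    )
    rows = cur.fetchall()
    # Advance the checkpoint to the newest row we saw (id is the first column).
    new_checkpoint = rows[-1][0] if rows else last_seen
    return rows, new_checkpoint

# Stand-in source database; a real connector would target MySQL, Mongo, a REST API, etc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

batch1, ckpt = incremental_pull(conn, "events", "id", 0)     # first pull: all rows
conn.execute("INSERT INTO events VALUES (4, 'd')")           # new data arrives
batch2, ckpt = incremental_pull(conn, "events", "id", ckpt)  # second pull: only the new row
```

Each pull only moves the delta, which is what makes running connectors on a schedule cheap.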
Winston Edmondson: So there's hope; no matter how complex, no matter how many different sources, you can sort that out and get it into a simplified, single-place solution.

Ashish Thusoo: Absolutely, a consolidated dataset solution. There are going to be multiple datasets, but with all of them together in one place you can explore them and correlate them very easily, as opposed to figuring out, "Hey, where should I land my datasets? Is the data arriving on time or not? How do I pull data from this particular system? How do I write a connector for that other system?" All of that is consolidated on the platform. Add to that the power of Hadoop and Hive, and you get a killer application.

Winston Edmondson: Let's talk to customers that feel like they have a need for speed. They want more speed, because a common complaint is that some of these processes are very time-consuming. I understand you've made some really impressive advances in terms of speed.

Ashish Thusoo: Yes, we have invested in making these stacks run really well on the cloud, and in making them run fast on the cloud. We have put in things like caches to increase speed, to the degree that when we compare this with some other Hadoop-on-the-cloud services, on TPC-H benchmarks we are something like four to eight times faster on many queries. Does that get you into real time? Probably not. Is it much better than what raw Hadoop supports? Absolutely, yes. And in the community itself we are seeing a lot of development in terms of speed. There are a lot of projects coming out, and there is work being done on Hive itself to make it faster, which we are also going to bring into our service to improve it further. But yes, if you compare just running a raw Hadoop version on the cloud (we run on AWS EC2) with our service, ours is much faster.

Winston Edmondson: Very nice. Let's talk to some of the CIOs that are out there.
They don't really care that it's cutting-edge, that it's trendy, that everyone's talking about it. They want to know about the bottom line: is this going to make or save me money? Let's talk about those dollars. What have you seen in terms of savings from a solution like yours?

Ashish Thusoo: There are a bunch of ways solutions like ours can help, and I'll give you a very simple example of a tangible action that produces cost savings. On the cloud, you're running servers and you get billed by the server. With a solution like ours, we take what used to be a manual process, deciding how many servers to run and how big a cluster to provision, and the software does it adaptively to demand. That means you're running optimally sized clusters and paying the optimal amount to run them. You're not running a 100-node cluster when your demand only called for 50 nodes; we scale that for you. That helps keep costs bounded. At the same time, all the improvements being made in terms of speed also help keep costs bounded. But the bottom line is not just about cost. It's about this technology enabling you to do things that were not possible before. If you're talking about log data, unstructured or semi-structured data, JSON logs, or combining those with transactional data, you have to start looking at these technologies, because the price-performance ratio here is much better than what you would get with traditional stacks. So that is one part of it: the technology stack we have chosen makes that possible.
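The adaptive, demand-driven cluster sizing just described can be sketched as a simple policy: pick the smallest cluster that covers the pending work, clamped to configured bounds. The function name and parameters here are illustrative assumptions, not Qubole's actual API.

```python
def target_cluster_size(pending_tasks, tasks_per_node, min_nodes=2, max_nodes=100):
    """Pick the smallest cluster that covers current demand,
    clamped to the configured minimum and maximum sizes."""
    needed = -(-pending_tasks // tasks_per_node)  # ceiling division
    return max(min_nodes, min(needed, max_nodes))

# The example from the interview: demand for 50 nodes, so run 50, not 100.
size_normal = target_cluster_size(pending_tasks=500, tasks_per_node=10)
size_idle = target_cluster_size(pending_tasks=0, tasks_per_node=10)      # shrinks to the floor
size_burst = target_cluster_size(pending_tasks=5000, tasks_per_node=10)  # capped at the ceiling
```

Because billing is per server, keeping the cluster at the computed target rather than a fixed worst-case size is exactly where the savings come from.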
And the other things we have done around auto-scaling, optimal management of clusters, and moving that from manual control to more optimal machine control also increase utilization. I would put it that way: you get more done within the same amount of spend.

Winston Edmondson: Let's help some customers have that aha moment where they really get it. I want you to help them understand what some of these solutions could make possible. Give us a scenario you've seen, or one you could foresee, where your solution makes something happen that really wasn't possible before.

Ashish Thusoo: There are a lot of scenarios like this. I'll give you a very simple example, something that we demo as well. A lot of customers have Twitter data, and suppose you want to do sentiment analysis on it. There are two characteristics here. One is the sentiment analysis itself, which works on completely unstructured data: tweets from people and what they're talking about. The second is that there is a lot of data. A solution like ours can handle that very easily. In Hadoop and Hive we have operators that help you do sentiment analysis, and you combine that with the raw power of the platform. Another example: some of our clients in the ad tech world, who collect data from ad impressions shown across various properties on the web, also have very big data sets. Imagine trying to process those data sets, or develop models on them, in a very traditional stack: it doesn't scale and it doesn't work. With a technology like ours they can do it, and the old barrier of having to develop in-house expertise in these technologies is taken away too. You can just pick up this turnkey platform and start using it.
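To make the tweet example concrete, here is a minimal sketch of the per-record scoring that sentiment analysis applies to each tweet. The word lists and function are hypothetical stand-ins; real sentiment operators (in Hive or elsewhere) use far richer models, but the shape of the computation, one cheap classification per record repeated over a huge data set, is what makes it a natural fit for a scale-out platform.

```python
# Tiny illustrative sentiment lexicons (hypothetical, not a real model).
POSITIVE = {"love", "great", "awesome", "fast"}
NEGATIVE = {"hate", "slow", "broken", "awful"}

def tweet_sentiment(text):
    """Score one tweet: +1 per positive word, -1 per negative word."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tweets = ["I love this awesome product", "so slow and broken", "just a tweet"]
labels = [tweet_sentiment(t) for t in tweets]
```

On the platform, the same per-tweet function would run as an operator inside a distributed query over billions of rows instead of a Python list comprehension.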
Winston Edmondson: So they can just run with it, and the manpower that was previously required can move to a different project so they can focus on their core business.

Ashish Thusoo: Absolutely; that is our basic USP.

Winston Edmondson: Pretty exciting. Let's talk about your background. When you were working on some of these technologies back at Facebook, what was your inspiration? What were you thinking when you were working on them?

Ashish Thusoo: My co-founder and I started the Hive project at Facebook, so I'll tell you what the inspiration was. Let me backtrack a bit. Around that time Hadoop had just emerged, and we suddenly had this extremely powerful computing platform at hand. However, we also had a problem: most people didn't know how to write MapReduce. So from day one the question was, how do we bring this power to the masses? SQL was the first step, so we built Hive, and that expanded the use cases. Then we took it two steps further: apart from Hive, we developed all these tools around the infrastructure to make it even easier for people to consume. That got us going really well. By the time we left, those clusters were managing about 25 petabytes of compressed data, and almost 25% of the company was hitting them. It was really successful; it helped make the company genuinely data-driven. And this is exactly what we want to do for other businesses. By doing this in a turnkey manner at Qubole, building a very similar stack on the cloud, we enable most businesses to become more data-driven, because it is now easy for them to use the power of these platforms in a way it previously wasn't.
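The "SQL for the masses" idea can be made concrete with the classic word-count example: a declarative query that, under the hood, compiles into a map phase and a reduce phase. The Python below simulates those two phases; the query in the comment is roughly what the equivalent HiveQL looks like, shown only for flavor.

```python
from collections import defaultdict
from itertools import chain

# The Hive idea: analysts write SQL instead of MapReduce. Roughly,
#   SELECT word, count(1) FROM docs
#   LATERAL VIEW explode(split(line, ' ')) w AS word
#   GROUP BY word;
# compiles down to a map phase and a reduce phase like the ones below.

def map_phase(line):
    """Mapper: emit a (word, 1) pair for each word in a line."""
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    """Reducer: sum counts per key, as happens after the shuffle."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big deal", "data pipelines"]
counts = reduce_phase(chain.from_iterable(map_phase(line) for line in docs))
```

Writing the three-line SQL instead of hand-rolling the mapper and reducer is exactly the democratization step Hive took.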
Winston Edmondson: Now I'm confident, just because of your background and the way you helped develop Hive, that you in particular have your finger on the pulse of this industry and where it's moving. I'd imagine you might even be working on solutions beyond where we are right now. What excites you? What trends do you see in the future, and where is all this going?

Ashish Thusoo: There are phenomenal trends going on. I've followed the Hadoop ecosystem forever, and every year I see a lot more adoption and a lot more people talking about it. The biggest trend, to me, is how you democratize this technology. People are now familiar with it, but they are probably not yet comfortable using it. Democratization comes in two ways. One is easier interfaces: now you see a lot of people putting SQL on top of Hadoop; we did that with Hive, and a lot of other people are trying to do it. Democratization also comes from expanding the use cases: now people are talking about speed and how to make it faster.

Winston Edmondson: Fantastic, that's some great information. For folks who want to tune in and get more information from you, what's the best way to contact your company or get in touch?

Ashish Thusoo: You can register for Qubole at www.qubole.com. You can also call us; we just got this number: 1-855-HADOOP-HELP, that's us. If you want to use the stack, you can register for a free trial. We also have a free tier you can use if your usage is under a certain amount. Either way, you can get introduced to the company and start playing around with the software.

Winston Edmondson: Fantastic, 1-855-HADOOP-HELP.
Winston Edmondson: Winston Edmondson here, Studio B. Thanks for your time. Signing out.

Ashish Thusoo: Thank you.