Live from the Fairmont Hotel in San Jose, California, it's theCUBE at Big Data SV 2015. Hey, welcome back everyone, you are watching theCUBE live in Silicon Valley for Big Data SV, our exclusive coverage of Big Data Week here in Silicon Valley. I'm John Furrier, this is theCUBE, where we extract the signal from the noise. My co-host is Jeff Kelly, Big Data Analyst, and our next guest is Rishi Yadav, he's the CEO of InfoObjects. Welcome back to theCUBE, I appreciate having you back on, and you've got a book here. Welcome back, why don't you show us the book there? Yeah, so this is the book I've written on Spark, the Spark Cookbook. Apache Spark is a new, very exciting and very complicated technology, and most of the stuff which actual developers want to do on Spark, they find very tough. So we started by having this cookbook on our website and put some recipes there, and the kind of traction we were getting was amazing, like one recipe would get like 1,000 hits in a month, right? So you said, we'll just write a book because it's so popular. Exactly. Well, you know, that means now you only get appearance fees and get paid to come on theCUBE and give keynote speeches, that's what you do when you write a book. No, seriously, there's demand for Spark, obviously it's hot right now. Yeah, and it's a very interesting but complicated technology, and so that's what you've done. So this is the cookbook in which you can run these recipes, and these recipes will work fine on your standalone cluster on one machine and the same recipe will work fine on a thousand-node cluster. Before we get into this part, I want to ask you more about what's going on in the industry and the company. Obviously you guys are very strong, you're a modern company, heavily involved in social media, we see you on Twitter and the CrowdChat, having a good time sharing, and obviously now taking some good content and making a book out of it. Obviously you guys are doing some great work.
Talk about the company update. You guys were recently voted best place to work, small business, and that's a huge accomplishment. Talk about the culture, what's going on with the company, status, traction and so on and so forth. Yeah, so we have been in the business for the last nine years, always focused on open source, and for the last few years we have been focusing on big data. For the last one year our focus has been primarily on Spark, and that's been a blessing. I mean, the kind of traction we are getting is amazing. Yeah, we are kind of honored and humbled that last year we were named the number one best place to work by the Silicon Valley Business Journal and San Francisco Business Times. So we are building this. It's a compliment also because it's a very competitive marketplace. It is a very competitive marketplace, especially for a services company. Being a services company, we do what a regular services company does: we do end-to-end projects, we provide staff, we do everything in between, but we do it with technical expertise, right? So if at a client site where our consultants are working they have a technical problem, then the director will call me and say, Rishi, this is the issue, what should we do about it? So that is the difference we have as a consulting company, and that's the gap we are filling, because there are solutions companies like Cloudera and Hortonworks and then there are pure services companies. But we are working in this niche, in which we are doing really, really well in the big data space. Talk about Spark a little bit more. So obviously it's getting a lot of attention. Break it down for us, really distill it. What is the real value that Spark is bringing to the big data table? So what happened was that at the last two Strata conferences also, every time folks talked about the ROI issue, that everybody has created this big data lake, for lack of a better word.
And so everybody has data there, but how do they make use of the data? If you use Hive, it's going to take half an hour to get a simple query to run. Now that's very frustrating. So what Spark has done is that it has brought the latency down to the sub-second level. So a regular query, even a very complex query, can run within a few seconds' time, which is amazing. The other thing which Spark has done is what the JVM did 20 years back: it has provided one single platform, and everything else has become a library. So earlier you had Hive for, say, SQL, then you had Mahout for machine learning, then you had Twitter Storm for real-time analytics, and you had Giraph for graph processing. With Spark, it's just one platform and everything is a library. So that has made it very easy. So the kind of excitement we get, in fact, most of the people who are contacting us are the data scientists or the senior architects. It's not the directors or the managers or the VPs. They are contacting us because they know that Spark is there, right? It can really enable them to get all the value out of big data, because they are the ones who are the main sponsors of getting big data into the company, right? And then they say, okay, we need help from a company who actually can provide people, can provide expertise, and that's where it starts for us. Well, talk about that. For all its potential and the power that it provides, very fast queries, really high performance, it's still fairly raw, it's still fairly young. I wouldn't necessarily use the word immature, but it's still fairly raw. So obviously people are looking to experts like yourselves to come in and help them with it. What are some of the areas that you specifically can help with that maybe a data scientist isn't necessarily prepared to tackle themselves? So mostly the companies come to us for three things. Number one is they simply say, I want five Spark experts, right?
Say in San Francisco or Los Angeles or wherever, right? The second thing is they say, okay, we have this problem to solve, why don't you take it offshore and do it with your team? The third one, at which a lot of conversations actually start, is that they get stuck on an issue. And they say, okay, we have this cluster, and we want some help to optimize this cluster, okay? So let's get started with that. And once that is successful, then we'll see what else we can do with you guys, right? So that issue-based support is pretty interesting because Spark is very powerful, but it's kind of complicated, especially, as you said, since the technology is still maturing. It's still the 1.2 version, but it's getting better every day. So in a business like yours, how do you keep your skills up to date with the latest and greatest? So it's Spark today, but it's going to be something else tomorrow. How do you go about training your staff, finding the right people, and then helping your customers by training them? How do you stay abreast of what's the latest and greatest, take advantage of that and build your expertise? Yes, number one is my technical background. Obviously, I've been a software developer for the last 17 years. The second thing is that we have always kept training at the forefront of the company strategy, right? So we always had our training programs going on. So that's what helps me, because I have to obviously make sure that we are updated about the technology for the training programs, and that also helps with the clients, right? Because then there are people who are always trained in the latest technology. Most of our training programs change on a monthly basis, right? Because technology has moved on so much, right? So you cannot have the same content again and again. The bigger companies cannot do it in such a fast fashion. I mean, just because of their inertia.
So that's another differentiator which we get. And talk about, you know, both the opportunity and the challenges of running a professional services firm. I mean, we know it's manpower intensive, but it can be very profitable as well. But you're also out there trying to attract the same talent as everybody else. What's the environment from a competitive standpoint for a professional services firm like yours? So the biggest challenge, it always comes down to the supply, right? So in our case, we have this machine in which we produce these Spark experts, but look at the way demand is increasing. As we were talking offline, I was telling you that since the last CUBE, 12 companies have contacted us, and they're all inbound calls, right? They want to work with us for our Spark expertise, right? So though we are churning out Spark experts as fast as possible, it is still not enough. Look at the demand. Well, it's good to have that kind of demand, that's good from a business standpoint for sure. So talk about some of those customers. To the extent that you can, you know, what are some of the use cases, some of the applications they're building, and really the ones that are moving the needle in terms of either generating more revenue or finding new lines of business? What are some of the more transformational applications? So let me take a few. So we are working with one of the biggest automobile makers in the world. And what they are doing is that they are moving their old EDW operations to big data. So all the use cases they already had, right? They are moving those to big data. We are working with a gaming company which basically has event-based data, and they want to get insights out of that data. And there's one company we talked to a few days back, and what they have is time series data, which is sensor data.
Sensor data is always a big use case in big data, and they want to make sense out of that data. So the company is old enough, they have data old enough, but now with the big data technology they are thinking, well, what extra can we do with the big data technology? Well, you mentioned in your first example moving some of the EDW workloads over to Hadoop, and we certainly saw that happen in kind of the early phase of the market. That was kind of the low-hanging fruit, I think. Are you seeing any kind of transition or move to kind of the next phase of the market, where you focus more on revenue generation, more of the net new applications, versus moving kind of the old to the new? Are you seeing any shift there? Are we at that point where we're starting to see more of these new types of applications that you couldn't do in the old world? So the transition is definitely the bigger part of the business right now, but we do get some new use cases also. So one company we were talking to was building a completely new graph application, so they had this old graph which they started to work with on Spark from the very start. So I think what's happening is, whatever new use cases they are getting, they are definitely working on Spark and big data technology, but at the same time, well, is it leading to completely new use cases right now? It's tough to say either way. The early adopter and mainstream market dynamic with Spark is interesting right now. So we're seeing Spark has been around with early adopters and it's been explosive in terms of the results. People were pretty much blown away by the capabilities. What's going on in the mainstream customer base, what's the big innovation? What gets their attention? Is it the performance? Is it the capabilities, both? I think it's the performance and I'll tell you why.
So as I said, there are these four libraries which Spark has: the machine learning library, the graph library, the streaming library and SQL. I don't see clients using more than one library at once, which is definitely what you can do with Spark. So what attracts them most is the low latency, that now they can get their queries to run within a few seconds. And that's a big thing, I mean, because now you have your terabytes of data and you can run the queries on this data at the same speed as you are doing with your regular databases. You were quoted on theCUBE last year, I'm reading some commentary on the CrowdChat. You were quoted saying, on trends in big data, that to survive everyone has to have a customer focus and customer turnaround, and customer support will play a key role in generating revenue. Obviously customer support is big. Talk about that, amplify that further. I mean, is it more now than ever with big data? Customer support seems to be an area where one goes, and that's in terms of things like sentiment analysis, predictive capabilities. What do you mean by that customer support piece? It's a good question. So what happens is, as I said, that on one side there are solutions companies, whether you're talking about Cloudera, Hortonworks, MapR, or you're talking about Databricks, which is supporting Spark, right? They are doing great work in taking the technology forward. But there are not many companies which provide actual foot soldiers. I mean, the actual developers, actual architects, actual data engineers and data scientists who can go to the client site and do the actual work for them. So they see a lot of capability in the big data technologies, but they need actual people or actual companies like us, right? Now, whether they do it on site or we take it offshore or we use a hybrid model or we provide issue-based support. So these are the companies which are there to help them navigate through all the complexities of big data.
So commoditization has been a big theme that Jeff and our other guests have been talking about: moving up the stack, open source, clearly a commoditization opportunity. But how do you work with customers who have the legacy stacks, the old BI tools, all the old stuff? Is there a balance? Is there a roadmap to move to the modern era? I mean, how do you guys view that piece? So customers are moving very fast. I mean, it's much faster than I thought they would move. In fact, whatever new use cases they have, they are definitely moving them to big data. They are not even talking about doing it in the old systems, the old ETL systems or old EDW systems. But at the same time, there is a lot of focus on moving data or use cases from the old systems. And forget about EDW, there's a company we're working with and they had a lot of jobs in MapReduce, because now MapReduce is also legacy if you talk about it. That's two years old, and two years is like 20 years in the big data space, right? So now, there are a lot of customers who say, we have these jobs which are running in MapReduce, let's move the jobs to Spark. Earlier, they used to talk about moving the jobs to Hive and Pig. So even that is happening. So the legacy stuff, yes. So the definition of legacy has changed with time. Yeah, this market is moving so fast, right? Two years is now old, right? Oh, MapReduce, I gotta rewrite those jobs now for Spark because it's just too slow. And that's the old paradigm, the batch paradigm.
John, I think what we're seeing is what I was alluding to, kind of moving from this phase one to phase two, from kind of the batch offload to more real time, and ultimately moving to what we're calling inline analytics, essentially operationalizing all these insights so you can make an offer to the customer in real time while they're still there, based on big data analytics working in the background, versus, hey, I've got a nice dashboard that tells me what I should have offered them and maybe I can build that into my processes going forward. It's to actually capture that in real time. Yeah, online machine learning, yeah. So there are some surveys coming out that we've been talking about on theCUBE in terms of top trends for Strata Conference, Hadoop World and Big Data SV. Hadoop and analytics are the top two. And then the rest are BI, NoSQL, data integration, security, big data packaging, Spark, SQL, VC funding, in-memory, data cleaning, streaming, and then other. The other category is third. So there's this other technology that's hot. So what is that other? I mean, is that the legacy piece? I mean, this is where there are a lot of service opportunities out there. Dave and I always talk about the services angle, and then Jeff and I talk about the practitioner's value. That other, I mean, basically Hadoop and analytics are above it and even BI is lower than other. What's in that other category? Let's get your guys' perspective on that. Let me think. In our case, most of the time, I think EDW is probably the one which kind of encompasses everything, because we're talking about the OLAP space, not the OLTP space. But most of the time, the needs we get are about the complete rewrite, where they give us the requirements and they say, move the data. It could also be the RDBMS. I'm not sure if you have covered it, did you cover RDBMS in those keywords? Yeah, sorry.
So I think the other is, so one use case which we see with a lot of customers is that they have data in their relational databases and they would Sqoop that into a Hadoop cluster. So they would just move the data from different relational databases into the Hadoop cluster, mostly into the Hive warehouse. And from there, then they start, whether they want to run regular Hive on it or they want to run Spark on it or they want to run Impala on it. So in fact, I was kind of impressed that, with all the power of Spark being there, a lot of clients found Impala very fascinating. So I think Cloudera has done an amazing job there. Yeah, certainly the performance is there. They've produced some numbers. We appreciate you coming on theCUBE, and what do you think about the CrowdChat? I thought that was pretty solid. You've got some good comments in there. The crowd is pretty hot on big data and what you guys are doing. Absolutely, and I like the multidimensional part of the CrowdChat, in which different tangents are going and different thought processes are going, and that was pretty good. That was pretty exploratory, looking at it from multiple facets. I read the transcript last night. I'm like, wow, this is pretty much it. I mean, this was such great content production and you guys are doing a great job. So the final plug for the book, what about the book again? When's that coming out? How do they get access to it on the website? Yes, so the book is, so the book is. Hold the book up again. Yes, so okay. So here's the book. Spark Cookbook. Spark Cookbook. So this is going to have recipes about everything in Spark, all the way from the installation to the administration to all the four libraries. We'll have tons of use cases which you can play with, and the book is scheduled to be released in June. Most probably it will be released before that because it's almost ready. And it's by Packt Publishing.
So anybody interested can go to the Packt Publishing website to get more information about the book. And you guys will be here at the show, on the ground, at a booth. Anything else to update for the show activities? Yeah, so we are on the floor at booth 525. So yes, everybody who's at the conference, please visit us. Appreciate you. Thanks for coming on theCUBE. Really appreciate it. This is theCUBE. We'll be right back with our next guest after this short break. We're live in Silicon Valley. It's theCUBE. I'm John Furrier with Jeff Kelly. We'll be right back.