Live from New York City, it's theCUBE at Big Data NYC 2014. Brought to you by headline sponsor WANdisco, with support from EMC, MarkLogic and Teradata. Now, here is your host, Dave Vellante. Welcome back to New York City, everybody. This is theCUBE, our live mobile studio. We go out to the events. We extract the signal from the noise. We're here, camped out at the Hilton Times Square. A lot of action going down at the Javits Center. We are shuttling guests back and forth. Ron Bodkin is here as the president of Think Big Analytics, now a Teradata company. First of all, Ron, congratulations on the acquisition and welcome back to theCUBE. Well, thank you, Dave. We're certainly proud of what the whole team was able to accomplish. It's a testament to the great work they've done over the years that Teradata were so excited about what we were doing at Think Big and saw the opportunity to create one plus one equals five by integrating us in. Well, you and I met quite some time ago in the early days of the whole Hadoop and Big Data movement. And we've always said services is where the rubber meets the road for customer value. And the research that we've done shows that nearly half the spending in Big Data is around services. It's hard. But you guys, your job is to simplify that, get to business value faster. So talk a little bit about that business and then I want to get into the acquisition. Absolutely, yeah. So our core business has always been around helping customers get measurable value from their data and taking advantage of the open source Big Data platforms that really change the game in a number of ways. But to your point, it's been hard for many organizations to adopt these technologies because they involve some changes in thinking, right? We have a really mature, well understood ecosystem around data warehousing, integrated warehousing, ETL, BI. And changing that game, to say, hey, there's new ways of doing analytics. 
We can work with data sets at the deep behavioral level. Those types of things challenge a lot of assumptions. And candidly, over the last 10 years, a lot of organizations had settled into kind of an operate-and-run, more of a cost center mentality around IT rather than being a partner with the business to drive innovation, right? So a lot of why organizations are sometimes, I think, struggling with adopting Big Data is because they need a lot of help on putting that creative spark back in and building that partnership between the business and the technology organizations. So the work that you were doing, or are doing at Think Big Analytics, sort of pre-acquisition, which is still the work that you're doing. Yeah. And that's a lot that's changed. But I wonder if you could describe it specifically in terms of the relationship with the existing data infrastructure generally and specifically the EDW. Absolutely. So just to highlight the point you made, it is important that, you know, as Think Big, a Teradata company, we maintain our vendor-neutral independence, right? One of the things that Teradata is excited about is that we can work with customers using all kinds of databases, using any leading open-source technology, right, all the different distributions. So that, you know, our stance is we are out there to help the customer, right? And so we're not compensated on the products that Teradata provides. You know, we're not being incented to sell into Teradata's base exclusively, just the opposite. You know, I think it's important, our customers told Teradata. No product agenda. Right, they want us to continue to do work with customers that don't use the Teradata warehouse, for example, which is an important part of maintaining our credibility as a vendor-neutral company that has a broad perspective. 
But to your original question, you know, it's absolutely true that these new technologies, we've always believed that the real opportunity around open-source big data is net new analytics on new data that you couldn't achieve before. You know, we've always felt that the business case for rip and replace or, you know, cost optimization wasn't that great and there were much more exciting things to do with these technologies, right? So I think the industry has started to come along, right, to the notion of a data reservoir as a place where you can hold a wider variety of data, breaking down silos, and do things with data that's not in a structured format or as simple a format, you know, to allow connecting a lot of that information, doing deep analytics, being able to enable you to work with raw data in quite a comprehensive way. That has a lot of synergy, then, with the downstream data warehousing world where, when you do process and refine the data, you of course need to have the ability to govern it, but ultimately you continue to have the need to support key business decisions around well-governed data sets. The other X factor though that's happening is you're starting to see more and more of engaging those larger data sets right into operational analytics, which means things like using machine learning to drive recommendations and next best action in customer interactions, whereas in the past the motion had tended to be more, people either had really small scale statistical models they trained on a single machine with limited data, or they had humans looking at data through analytic systems and a human in the loop modified some rules, right? So you would see that in, you know, prior to being the founding CEO of Think Big, I was at a company called Quantcast and we innovated around building look-alike models for ads, right? 
Prior to that, most advertising had been done in a similar way, people manually encoded rules for what ads to target where, and now of course the robots have taken over, right? So I think that's in that industry and I think that's the trend in a lot of places, more automation of analytics using large scale data. So that's interesting. Wow, I have so many questions about what you just said, but so let me go back to something you said initially which was, you know, sort of the efficiencies, cost savings is somewhat mundane. There's all kinds of other things like you just mentioned the ad tech, I mean bringing, you know, analytic systems and transaction systems together in a way and letting machines make decisions that are too fast for humans, you know, repricing, you know, very, very quickly. That's, you know, new forms of business value, but I have to say, I'm surprised actually, because of how many interviewees I've talked to, you know, guests on theCUBE and others just on social media, that are talking about massive cost savings, you know, through Hadoop. And my question to you is, is it really cost savings, in other words, they're taking cost out of the business, or is it more that they're able to do things that they could not have done cost effectively previously? You know, so I think that the most exciting use cases are definitely around doing new things that were cost infeasible, right? That you would never go in and buy 100 machines, you know, of your traditional analytic software to do the calculation but you now can do that processing in Hadoop, right? That type of use case. We think that's a lot more exciting, right? As well as just the flexibility to work with less structured data and have faster iterations and learning out of a large quantity of data, finding the signal in the noise as you said earlier, right? That ability is really where the true value and the transformative effect of big data is, right? 
The big data effect in the economy is gonna come from having more data, driving smarter decisions and driving to, you know, faster growth. That's where the big impact's gonna be. There are cost savings, right? Some companies do say, look, we wanna take our least favorite database vendor up in Redwood Shores and we wanna find a way to use a little bit of less of their software, offload some bloat from those systems and what have you but usually what we find is that when people try to do that, they discover that it's a little more complex to migrate stuff over from an existing environment and the business case isn't nearly as compelling, right? So that's where we tend to push customers to low hanging fruit where there's relatively low complexity and high value and it's usually around data that you haven't been working with very effectively, maybe marrying up a couple of data sets and breaking down silos of information where, you know, we see that as a common pattern, right? Maybe in marketing, people are working with data over here in a siloed analytics tool for web and another one for email and another one for search, right? And so putting some of those data sets together along with the customer profile, suddenly you have integrated analytics that really drive something more compelling. You know, Ron, one of the things we're seeing in the Wikibon community is to see these big data projects popping up all over the place and in many organizations, nobody's really paying attention to the governance piece of it. You mentioned governance before, I'm wondering how much work you're doing there, what are you seeing there? We're seeing the emergence of the role of chief data officer, particularly in financial services, healthcare and government. What are you seeing there? Is it, I know it's a spectrum, but you got the spectrum is kind of spinning up, see what happens, don't worry about governance, we'll figure it out later. 
Somebody will solve that problem versus, you know, some of the more prescriptive organizations are saying, okay, we're going to put in a chief data officer, we're going to worry about governance, we're going to set up frameworks to manage data quality. What are you seeing along that spectrum? It's a great question, you know, but I want to touch on the first thing you noted about big data projects springing up all over. And I think one of the things that I think is important is, you know, as we're talking about, hey, these innovative new approaches, high value analytics, that, you know, we're rapidly building Think Big, with the backing of Teradata now behind us. And so, you know, we're aggressively hiring for people that want to work on these exciting innovative projects. So, you know, data scientists, data engineers, people working in product R&D for building up some of our capabilities around reusable solution IP that complements and sits on top of the open source platforms. So, you know, definitely something we'll probably touch on a little bit more later on, but just for those listening, do keep us in mind, we're aggressively hiring and want to hire the top talent to build this next generation of great solutions. Now, you asked about governance and absolutely, I think as organizations are getting more serious about applying big data and these systems are going into production and are being used to make important decisions and they're being used to drive operational analytics that are, you know, making recommendations or being used to automate decisions, it becomes important that you have a level of governance over that data, right? And the other thing you see is, I can tell you, even in the early days at Quantcast, we'd find that having some, you know, control of the metadata over what you're doing in your big data environment is really important, right? That in today's parlance, you know, there really are two kinds of data lakes as they grow. 
You can have one where there's some governance and control and management, a data reservoir, or you can have a dumping ground that's a data swamp, right? And a lot of early organizations adopting Hadoop ran into that data swamp trap where if they didn't have tools for traceability and understanding the data sets and curation of them, you just get this overrun mass of many copies of data. Nobody knows what's authoritative, right? I mean, it's like, in the short term, the flexibility of not having to have a lot of schema and structure on the data lets you move quickly, but you still need to have a way of knowing what's done and taking things from an unrefined raw form progressively into a more governed and controlled form, right? So to me, the magic of big data is not, you know, that you don't need governance. It's that you can incrementally govern the data as you apply more structure, right? So you can be agile and start with raw data and have a really smart data scientist figuring out something interesting and it graduates from them into a power analyst and then eventually it gets more into the broader hands of a wider community of people who need more governed, curated data. So are you saying it's better to have a data swamp than no data at all? That's a good question. I haven't spent a lot of time. There's better alternatives to either, I think. I think you need data and I think you need to have some governance for sure. Yeah, okay, so you're not advising people, okay, just if you can't afford it, just go capture it and figure it out later. That could cause some problems. Maybe you really need to think about it. There's good practice around, you know, some lightweight metadata management, annotation of data so that you can record it, capture it and know enough about it that you can make sense of it, right, so you need to at least have some idea of the provenance, where data sets came from, right? 
And then also some notion of ownership of who's responsible for this data set anyhow, right? So any data that's in production. I mean, it's fine to have scratch data sets that are in a sandbox that a single, you know, individual or small group are working with, but if it's gonna be a long-lived asset, if it's gonna be regularly put into a data lake, then you really wanna have enough knowledge about it so you have some accountability and predictability around what is this data anyway. So definitely, sorry to get esoteric, but the difference between the data swamp and the data lake is sort of the metadata, the rules around the metadata, some level of governance, and I would think the ability to scale that as your data scales. And if you have that, then it's not a swamp. That's right, so you have some, you have, that's quite right, you know, that you have that ability to have some basic assertions about the data and have some level of confidence and, you know, depending on what you're doing, sometimes having the ability to go, in the ideal, you wanna be able to take data that's lightly governed with some basic information about it and increasingly assert more and more things about it so that you can have a more curated data set, right, that ultimately ends up with the same level of curation that may well be published into a data warehouse. But well, you have to refine it from lightly governed to really fully understood and carefully versioned, right? So that whole process, you know, you wanna be able to do all those stages of refinement and governing of data in the big data environment. 
So at four o'clock today we have our capital markets event, bunch of Wall Street guys coming in, and we're gonna try to help them squint through how to play big data because there aren't a lot of, you know, pure play big data public companies and so people are wondering, well, how do I actually invest in big data if I'm not a VC or, you know, and when I first met Peter Goldmacher, who was a Cowen analyst, he put forth the premise that big data practitioners are gonna create more value than the supply side, than the vendors, so look there for investment angles. So I wanna explore that with you a little bit because we've been talking about solving some hard problems but the flip side of that is you've got people that are really good at big data and they're driving new business models. What do you think of that premise and can you think of, maybe even without naming names, just some examples of organizations on the buy side, you know, adopters of big data technology that are transforming businesses, that are good investments, for example. Yeah, well, definitely, I think there's a range, right? That there's been a whole class of new companies, startups, some of them are giant now, but in places like social media or advertising technology or, you know, people that are using data to disrupt value chains and create massive new opportunities, right? But if you look at organizations like Uber or Airbnb, you know, you say, well, look, they're fundamentally using data in a deep way to drive their business model, to enable their business model, so you just don't see an innovative fast growth company changing a value chain today that doesn't have data science and deep use of data as integral to the way they're building value. But, you know, to some extent, those are the poster children, they're obvious because they're new and they're leveraging a lot of Silicon Valley talent to do big data well. 
But, you know, they're the tip of the iceberg in the sense that there's a lot more of the economy that is, you know, being done by companies that are more established. Right, and we see in our customer base, innovators in all kinds of industries. We see, you know, innovative applications in asset management and in banking and insurance, whether it's really driving deeper understanding of customers and being able to better service the customer, you know, more effectively provide, you know, reach the customer and provide what they need. You know, we see being, you know, things like claims processing and insurance and being able to be more efficient and intelligent about that. Logistics, right, retail, absolutely. The existing brick and mortar guys have a huge opportunity. And in high tech manufacturing, right, as a tip of the iceberg, again, for the whole manufacturing sector, right, we see meaningful use cases around large-scale test data sets that couldn't be worked with with traditional systems. Suddenly, you can drive faster time to market with better yields and, you know, innovation, taking time for scarce product engineers and repurposing it from hunting down data sets and doing reactive analysis to innovating on product and being proactive in figuring out what's really going on and advancing the state of the art. And we see it on the customer service side, you know, more and more devices are connected. So internet of things, connected devices, using those data sets to drive better service and strategic understanding like using the data of connected products to inform how to improve those products, where to invest in QA, so data-driven product management. And so those are all areas where, you know, we see innovative companies really using these techniques to create a ton of value. You mentioned internet of things. So are you doing work in that area? What are you seeing? I mean, what is it? 
Or we should be thinking about, you know, home thermostats, is it, you know, GE's industrial internet, IBM's smarter planet, you know, help us squint through IoT. Yeah, well certainly, like so many trending terms, you know, there's, there are many definitions, right? I mean, I think broadly to me internet of things is about smart connected products, right? So anytime you've got a product that's connected on the internet and has, is running some form of software, to me that's an interesting internet of things application. You know, there's certain cases where people have dumb connected products that just have like a sensor. I think those are a lot less compelling, frankly, than when you've got smart products that can react in some way and are more complex, right? So that does open up quite a range of applications from, you know, industry applications like turbines and jet engines through to, you know, hardware products that systems vendors, you know, storage and servers and so forth are providing through to, you know, mobile devices, you know, you know, your connected watch and fitness units, right? So there's a huge range, you know, items that are more in a business to business context like having RFID sensors, right? So there's a huge range of, now what we've seen though is that the first movers have tended to be tech companies that already have high value smart connected products where they've already invested in capturing the data and collecting it because usually you can only start to get into interesting analytics once you've just figured out you wanna capture data and you have the ability to capture it, right? Then it starts to get interesting. 
So when you look at what a lot of people talk about as internet of things, which is less around those sort of large value B2B tech products and they think about consumer things, you know, those are, those applications are a lot more nascent because mostly they're still working on the standards for how to collect the data and you know, until you kind of have a well-established infrastructure for collecting data it's hard to do interesting analytics on it, right? So I think there will be followers, that kind of internet of things analytics will be a follower from the starting point, which we've seen to be more these high tech products. Interesting, let's talk about the acquisition. So what led to the acquisition? Talk about it from your angle and, if you can, talk about it from Teradata's angle. Absolutely, you know, certainly from our side we felt that there was such an opportunity to take the innovative kind of projects and work we're doing and really blow them up and do it in a much bigger scale, and to create a lot more opportunities, you know, for people to do this amazing kind of work and develop their careers, to help more customers be successful, right? We were having to turn customers away because we just didn't have the resources to scale quickly enough, you know, and we felt that to climb the stack and add value there was a lot of investment opportunity to build more reusable components, to standardize the best practices and patterns we've been seeing across customers repeatedly, right? So all of those things, we felt there was such an opportunity to have a big impact and to really drive that key: how do you help enterprises derive value from big data? How can we really take the leading business and scale it and have a big impact? 
We felt that there was no partner we could pick that would be better than Teradata, as a global leader in analytics, and Teradata has always prided itself on having industry leading data warehouse technology, deep depth in analytics and a strong services capability to support customers succeeding on that. So the depth of industry knowledge and how to operationalize analytics and how to really be successful in analytics was very attractive. You know, from Teradata's standpoint, you know, they clearly, we now clearly see a ton of need to integrate these new open source technologies, Hadoop and NoSQL and so forth, in with the existing assets, you know, integrated data warehouse from Teradata, Aster Data, or sorry, Aster for the discovery appliance, it's not supposed to be Aster Data, just Aster. Old habits die hard. I just learned that, so that was new for me. But you know, the point there being that customers really needed to have that unified architecture, have that ability to integrate Hadoop and NoSQL in with, and streaming, you know, Spark Streaming and Storm and so forth, into the architectures. And they had, you know, there's a lot of demand for Teradata to help with that. Right, so they had a strong desire, we had a strong desire to really add that capability. You know, and Teradata did some work to try to build it themselves prior to acquiring us and really struggled, like, you know, a lot of organizations. We've seen there've been very few people who have been able to really succeed at scaling up big data services because there's sort of two options. You know, Teradata wouldn't compromise on quality but they found it really hard to build a large enough team. Right, they were growing their team but the market demand was vastly outstripping it. Right, because they didn't have the ability to take people with the right aptitudes and train them and ramp them up to be successful. Which is one of the key reasons they bought us. 
Right, is that we could take people that are interested in big data and teach them what they need to be successful working with a team of experienced colleagues. So we had a critical mass of being able to scale and make people successful who were interested in getting into big data. Right, which is pretty neat. It's a skills injection really. Yeah, it's skills injection and it's being supported and having those opportunities. Right, and that's one of the reasons why it's so unique to work at Think Big. So that's one thing that was really important, was that ability to ramp up people to scale. Right, and another was they looked, Think Big's the big data leader. Right, that is a pure play really helping in big data services. They felt that we were by far the best. You know, at one point in discussions, you know, our executive sponsor, Dan Harrington, said to me, how come there's nobody else like you out there? Right, so they looked and looked and it was like, we're pretty unique. Right, so that was really important for them. You know, and the third was we had an onshore solution center. So a way of, you know, in our Salt Lake City office, South Jordan, we've got great engineers and scientists who can do sustaining engineering. So we view that as much better aligned with onshore business. So a great partner, time zone friendly, closer cultural affinity, at a modest premium to remote offshore. You know, we've seen that a lot of organizations have struggled trying to make remote offshore models work for this new space where getting business and technology alignment is so important and things are moving so fast. So that was an important factor. And then last was the solution IP we have built up in our R&D team and the patterns we've already seen that we have lined up to drive repeatability, right? So there's the software components that we have developed and are developing to accelerate time to value for our customers. 
It's interesting what you're saying about why aren't there more people like you out there? Well, part of the reason is that people sometimes look down, VCs in particular look down on services as a business model that's got, you know, it doesn't have the software and marginal economics and everybody wants that. And everybody says, oh, well, it's going to be Accenture or Ernst & Young, IBM, Deloitte will own that anyway and now we don't want to invest in it. But you could have raised more outside money. Sure. And you chose not to. Talk about that a little bit. I mean, the big data is a little bubble-licious right now. But you chose to exit at this point versus raising more money. Talk about the reason why. I wouldn't call it exit so much. That's an exit though. I mean, we sold the company, right? This is liquidity, but we're not exiting at all. We're doubling down to build the organization inside of Teradata, right? So, you know, we acquired Teradata. Seriously, what we did is... Your VCs got to exit. Yeah, that's right. Our investors got an exit. But really for us, it was about what's the way to have that right impact, that we felt that the ability to have access to customers, to have, you know, the level of investment and the synergy with the business was critical because we feel like this is a unique time in the market where you can really do something, we can do something and have a major impact on accelerated adoption and build on what's a great business, right? So we felt that was a great thing in terms of the impact on what we could do versus going out and raising outside money, which, you know, typically what we were seeing is there was a little bit of a gap. First of all, the market for investing in services is not like the market for product, as you alluded to, and so services investors sort of, we were in this tween phase where we were growing too fast for their models because you just don't see companies growing 100% a year in services, right? 
So they don't know how to understand and value us. And so we were at this weird stage where, you know, we would have to wait till we kind of grew and slowed down our growth to become, you know, to fit their models. Which is not what you wanted to do. But this is not at all what we wanted to do. Hey, let's slow down so we can fit the model. That didn't make sense to us, right? Right, so really more strategic capital. And is it more patient or not necessarily? You know, I mean, I think strategic patience is an interesting word, right? I mean, we're all really impatient because this market is moving fast. And there's so much opportunity that we're being drafted by our customers to do more and more, right? So it's an exciting time, right? You can't be patient in this space, right? So what you're doing with the funds and money, with the capital, is people, right? I mean, really, that's the big gate. We're hiring a lot of great people and ramping them up, investing in the training. And we're also ramping up on investment in software components to drive repeatability and efficiency. Yeah, we said that before, repeatability, IP. All right, Ron, hey, thanks very much for coming to theCUBE. Great discussion, really appreciate your time. Dave, thank you. We've always enjoyed working with theCUBE and certainly appreciate the chance to be here. We appreciate all the support. All right, keep it right there, everybody. We'll be right back with our next guest. This is theCUBE. We're live from Big Data NYC, right back.