People are in sessions right now, geeking out. We had keynotes here; Cassandra is the big data movement around NoSQL. I'm John Furrier, the founder of SiliconANGLE.com. This is theCUBE, our flagship telecast. We go out to the events, extract the signal from the noise, and share that with you. I'm joined by my co-host, Jeff Kelly, from Wikibon, our big data analyst, the number one analyst in the business for big data. Jeff, we're joined by John Ikrid, who is the big data R&D manager. So essentially you oversee all the big data for Accenture?
Yep. Accenture Labs.
Accenture Labs. You're not on the front lines doing management consulting, you're geeking out in the labs.
A little of both.
A little of both, okay, great. Well, no better place than Cassandra Summit to talk about big data. So we've been talking all morning about some of the themes, you know, NoSQL hotness around structured databases. SQL's kind of broken and busted. That's a fallacy, we kind of debunked that. Relational databases aren't going away. NoSQL is hot and has a use case. Don't know about that. So talk about that. What do you think about that? Do you agree or disagree?
Sure. So I agree that the structured data and transactional data challenges, the things that brought the relational database into existence, still exist. They're not going away. And in fact, because we have all of these other data stores that allow us to take other kinds of data that aren't well stored by those platforms and bring them to bear on problems, it actually makes the data stored in those relational structures much more valuable than it was before. Which is why you're getting people asking questions like, how do I unite these worlds? And how do I bring my new unstructured data stores to bear on my problems along with the structured data that I've been collecting in my enterprise for a long time?
I mean, it's like a point of view on something that's happening in front of everyone. Everyone has a different angle on it. It's a tornado, yeah, a disruption is happening. We know that. There's an old way and a new way. What's great about the stuff that we're covering is that two major megatrends are happening. Converged infrastructure is actually happening (solid-state, we talked about it earlier today), and big data. So there's an old way and a new way. The old way is data warehouse, business intelligence, Oracle. The new way is reconfiguration, cache layers, all kinds of cool stuff. Can you elaborate, just from your perspective, as you dig into the trenches with the labs and as you go talk to some of Accenture's top clients? You're kind of on two fences: you're in the playground of innovation, the labs, and then you've got to go out in the field and talk to people with real problems. Just share with us what you're seeing in this old way, new way context.
Sure. So one of the interesting things about the way these technologies have come onto the scene is this. We had things like application servers, which begat JBoss, but there were proprietary versions that were established and had valuable businesses that were displaced by an open source technology. In this case, there was never a proprietary winner technology for unstructured data or for massively scalable data stores. It actually came from engineers at companies solving problems that no packaged software manufacturer was solving for them. That creates a much different dynamic in the way these technologies get introduced into an enterprise, even a big stodgy one like the ones I work with, and in how they grow up within it. So any engineer can try it, and it turns out most engineers in the Valley make enough money that they're not really worried about spending 50 bucks on their own personal credit card to prove an architecture will work, right?
And so that's a fundamentally different conversation with a CIO who's perhaps used to a much more structured way of introducing technology into their organization. The thing about it is that the value propositions these technologies unlock are so compelling that eventually that resistance will give way. It's just a matter of making the value of introducing that technology compelling to the organization and understanding how we are going to introduce it over time. Because obviously they've spent hundreds of millions of dollars in a lot of cases on data management solutions around structured data. They don't want to abandon those investments, and that's a reasonable stance to take. They also want to benefit from the new technologies and the new approaches. So the question I get a lot is, how do I marry these two worlds? And whether it's bottom up, where one of their developers is saying, we've got to use Cassandra.
And with some shadow IT on the side.
Exactly.
Or top down: I got an iPad, make this work.
Yeah, exactly. Or we have some forward-thinking CIOs and CEOs, like Bezos, who famously said, thou shalt not build a point-to-point interface, right? And everybody went, oh my God, this guy's going to fail, he's crazy. And he actually did fire a few people who tried it after he told them not to, and that made the message clear. And now we have a successful services architecture at Amazon based on a bunch of distributed data stores.
How much of the reluctance you're experiencing among more traditional organizations to adopt big data is around the enterprise readiness factor that we hear a lot about? Whether it's scalability or uptime: we can't afford to take a chance on a new, kind of untested technology. Over the last two or three years, how have you seen that conversation evolve? In Cassandra, yes, of course, but also in all the other big data approaches we're seeing out there.
Absolutely.
So the initial challenges were very much like, how do we know this is going to work at scale? And things like that. But it's pretty easy to point out that companies like Netflix are actually running this at scale and have been for quite some time. And it's like, well, if you don't think Twitter is web-scale enough for you, then you're probably the CIA or someone like that. So the technical challenges around the pure capabilities of the technologies are eroding away, and the conversation is more around the data management best practices and tools that grow up around them. Things like row-level access control or column-level access control are problems that, if we solve them, make it much more enterprise-friendly. We have ways around a lot of these things. And so it's more around manageability, ops, and things like that. You've seen a lot of development in the last year or so around Cassandra and other technologies to make them more easily managed. The joke at a Churchill Club event recently was that you shouldn't need a team of 10 Stanford PhDs to manage your cluster, because that's an awfully expensive operations team. So a lot of it is around operations, but I think security, and building the best practices around how you define and understand what's in there, will undergo a revolution on top of the technologies that are driving it. You know, I come from Accenture. We grew up as part of an accounting firm and we're really good at counting things. So we like transaction locks and things like that, in terms of our DNA. Introducing this new data world into an organization like that is kind of challenging. Those techniques have a place for things like financial transactions, but the whole world isn't financial transactions. And what hasn't grown up around that is the best practices and tooling to make these data stores truly discoverable: can I find the data in them?
Manageable in terms of security, compliance, auditing, things like that. And then actionable, right? So that universe is growing up around them, but I think there are fewer and fewer doubts about the actual enterprise readiness of the technology at its core, right?
So how does that play into the merging of the structured and unstructured worlds, where people are adapting some of the tool sets around traditional, structured relational databases and trying to apply them to the unstructured world? Putting a SQL wrapper on top of it, for instance. What are you seeing there, and how is that impacting how we're bringing the two worlds together? Because ultimately that's what big data is all about: bringing multiple data sources together.
Exactly. And it's interesting, because on the one hand we've got a lot of valuable data in those contextual systems like we talked about, and what we want to do is bring that to bear on the less structured data that these new stores are well suited for. At the end of the day, to do anything useful, as you point out, you need to have it all in one place. So you get into things like system of record versus system of reference, and synchronization schemes. And so a lot of the data integration vendors, Talend in the open source world, the Informaticas, are coming out with technologies that make it easier to move the data back and forth. You see things like the Hadoop connector that is part of Greenplum as a way to make a Hadoop data store feel like a table to a relational database. There's a thing we don't want to have happen, and there's a big distinction here. Accessing data in a SQL-like mode is very powerful, because many of us are familiar with SQL, we're comfortable working with it, we're used to it. SQL in and of itself is not a bad thing, and at the end of the day, when you're doing analytics, the data is going to be in some structured form to pass it to an algorithm.
The problem is demanding that you define that form when you design the database rather than when you're deciding to do the analytics. So it's fine and useful to be able to expose this data in SQL or SQL-like manners, because essentially at that time we need to give it structure to pass it to the algorithm anyway. What we don't want to do is go back to a world where that is a permanent feature of our data store, limiting the other things we can do, right?
So failure has always been a core concept within Cassandra, something to avoid, right? Uptime in production has been the other thing. So in distributed systems, I would say distributed counters, for example, in Cassandra, that's hard to figure out. So they get props for their distributed nature. So talk about what's happening in that world relative to other approaches like Mongo, like HBase. I mean, obviously with HBase and Hadoop we have high availability issues with the NameNode, which are being solved with YARN, and some other issues. So what is the status of that? And then two, after you talk about that, talk bigger picture about the market for distributed systems, because we're watching and covering the data center in large enterprises, and that is changing. Data centers themselves are becoming distributed systems, operating systems if you will, software. Using big data, whether it's Splunk or anyone else, you can actually analyze systems within the data center, like power and cooling, like predictive analytics for failures, disk drives, and so on and so forth. So it's a huge, theoretical, grid-computing-like paradigm. I mean, that died, but it comes back around. Converged infrastructure is happening right now, and that was talked about eight years ago, right? So talk about, one, the distributed counter problem and how that's being fixed, and then two, the distributed concept and how that's proliferating.
So it's interesting: in financial services they've wanted to solve the distributed counter problem for a long time, right?
You've got a trading desk in Japan and one in New York, and markets open at the same time, and a person in one location takes on risk and a person in another location takes on risk simultaneously. You want to be able to understand that that's happening. That's a distributed counter problem, right? And so what they've used up until now are massively distributed memory caches, with dedicated network pipes between those locations that are fairly low latency, relatively speaking, to enable that to happen as fast as possible and to propagate that state out there.
So some young trader doesn't take down Barclays, right? That's kind of the concept.
Well, apparently it's not perfect yet, right?
I just made that up. I assume that was a problem.
But it goes to distributed counters. Every time you have a rogue trader there's sudden interest in distributed counters. So there's a set of architectures around things like GemFire and Memcached, using those distributed caching layers to do that. What I'm seeing is people wanting to cut that out, and we saw that this morning with some questions in some of the earlier talks: can I now use the caching features of Cassandra rather than having a separate caching layer to do that? Can I use these counters rather than doing these counts in aggregate or micro-batch or however I'm doing it, right? So what is interesting is, yes, absolutely the distributed counter problem is very, very hard, and people are very, very excited about the work that's going on in Cassandra, Storm, and things like that as well. So certainly for these solutions I think it's too early. We don't have a winner yet, but they're popping up because there's a demand for them. And when you think about these architectures, you want things at the ingest point to do value-added things to the data on the way in, because once you've stored it, it's harder, right?
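For readers who want something concrete, the distributed counter problem described above (two trading desks each adding to a shared total) can be sketched with a grow-only counter, a textbook technique in which each site keeps its own slot and replicas merge by taking the per-site maximum. This is only an illustrative sketch under stated assumptions: it is not how Cassandra, GemFire, or Memcached implement counters, and the class name `GCounter` and the site names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: one slot per site, merged by per-site maximum.

    Hypothetical illustration only; not any product's actual implementation.
    """
    site: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # Each site only ever writes to its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.site] = self.counts.get(self.site, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Taking the maximum per site makes merging safe to repeat
        # and independent of the order in which updates arrive.
        for site, count in other.counts.items():
            self.counts[site] = max(self.counts.get(site, 0), count)

    def value(self) -> int:
        # The global total is the sum of every site's slot.
        return sum(self.counts.values())


# Two desks take on risk independently, then exchange state.
tokyo = GCounter("tokyo")
new_york = GCounter("new_york")
tokyo.increment(5)
new_york.increment(7)
tokyo.merge(new_york)
new_york.merge(tokyo)
new_york.merge(tokyo)  # repeating a merge does not double count
```

After the exchange both replicas report the same total. The design choice worth noting is that no site ever overwrites another site's slot, which is what lets the desks keep counting while the link between them is slow or down.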
And so if you have the context already, it makes sense to do what you can with that context as you're bringing the data in.
I think that's a fundamental point we should capture, because that's really an epic statement. The ingest point is a good time to trap the data and do something with it, because then you're more intelligent on the store.
Yeah, exactly. And it may not be...
Versus the, I'll send it, I'll park it somewhere, and I'll run some algorithms on it, do some data mining, and then no one ever touches it, right?
I mean, it may be possible to reconstruct that state, or the contextual state around that data, after the fact, but it's hard. And if there's one thing that these data stores aren't good at, it's joins. So if we can avoid that by putting that context in on ingest, it's very, very important.
So you have a background in distributed computing and distributed networks, and network theory in general. It's the same kind of theory. It's kind of like, once you know data structures, you should be able to learn any programming language, right? As we were talking about earlier with some of the alpha geeks from DataStax. That being said, what's going on in this marketplace where you could say, wow, all this other stuff that was being talked about in theory is actually happening right now? Because normally what happens is that only a portion of the paradigm that was being promoted over time ends up being implemented. So what would you point to on the distributed side, whether it's grid computing, distributed networking, machine learning meets AI? You know, all that stuff's kind of going on. What do you think is happening now that you can point your finger at and say it's really happening?
So actually, and I'll back up to the second half of the previous question too, an interesting metric to look at is the average number of parking spaces in a new data center.
So if you looked at a data center 10 years ago it might have a hundred, and if you looked at a new one it might have five. Yet they've introduced a fundamentally more complex architecture under that, right? They've distributed everything in many cases. So distributed systems are everywhere. There's a sprawl going on. And it's because if you build one well, and you build an architecture that allows you to get your data into one place and make it actionable, you can get into rapidly creating value-added data products and bringing things to market. And I can go through LinkedIn and Twitter and Google and Facebook, and you can talk through how they organize around a data platform and create a capability to rapidly do value-added analytical services and propagate those to their business or their customers or their website, or however they need to act on those. They're insight generation machines, essentially, to deliver value to the business. What's interesting is you're seeing transformation in the organization of those businesses to be able to take advantage of that agility. One thing we like to say is, don't design a system for real-time analytics if you can't take real-time action. So I think one of the biggest signposts for distributed systems and the capabilities that they bring to bear, and in particular the whole idea of scale-out and the fact that it means you can push data into more and more decisions, is actually watching what's happening in the organizations that sit on top of that, and how people are organizing to take advantage of that data and build into the business the agility that comes with it.
So predictive analytics is one. Others are insights, where the querying is done by the business units, right? So the data's available; there's no burning bush of knowledge that the query comes into. The apps themselves are interfacing with the data. So essentially, yeah, I mean, data is part of the development process.
Yep.
So digging a little bit more into that, how are these business units really evolving to take advantage of this? From a structural perspective, what are they doing to change the way they work so that they can better take advantage of it?
Sure. So one thing is essentially equipping their business analysts with the language of math at the highest level. Not everybody has to be able to decide between a support vector machine and a neural net and know about eigenvalues. But you do need to know how to ask the right questions of the analytic outputs, to understand whether you can reasonably generalize something, or things like that. We see organizations moving from centralized groups of, I guess, highly skilled quants who sit together and get fed a diet of very hard but very few problems, to a model of distributing those people into actual business units, so they become familiar with the problems of that unit and things like that. So you're changing from a virtual community between your quantitative capability and your business users to actually physically sitting your quantitative capability with your business units and creating a virtual community for the quantitative people to talk to each other. And we can make jokes about the social characteristics of statisticians if we want to now. But you see that shift from a dedicated deep analytics team that might be centralized and works on a few problems to bringing that out to the line-of-business units and combining it with the people who are used to putting together ad hoc reports and looking at things for business owners who have questions. You get that constellation of capability, and now you can start to answer some questions pretty quickly. If you've got an architecture underneath that supports that well, you can truly create some profound change.
So yeah, and it really gives the quant the ability to start to understand the domain a little bit more, which is critical when you're trying to find insights into business problems. When you're talking about big data, you don't even know what questions to ask sometimes. So the more domain knowledge you have, from a statistical perspective, from a quant perspective, the better they're going to do their job.
Exactly, exactly. Now, that said, there's a very, very interesting phenomenon. You look at somebody like Kaggle and talk to them about who wins their competitions. It's often actually somebody coming from a different perspective, like a different industry or a different domain, and thinking of a solution that wouldn't be intuitive to the people in that domain. And so in a sense, that's what we're generalizing, right? When we bring a generalized quant to that specific business problem, you're getting a broad range of analytical capabilities and applying it to a smaller set of problems. Whereas the people who focus on those problems for a long time tend to grow sets of blinders and agree on conventional methods. One of my favorite targets for analytical scorn is the clustering or segmentation processes in a lot of marketing organizations. It's not that clustering is bad, but you created large-grain market segments because you had broadcast advertising mechanisms. Now you have one-to-one advertising mechanisms. It's not that you don't need segments anymore, but some people still just think they're building segments. They haven't changed their view of how they can operate to match the new capabilities that the technology enables. Others profoundly change the way they talk to their customers, right? And the world is replete with stories of one business succeeding and one business withering when a certain company in an industry figures it out first. Right?
Right, absolutely. Very interesting.
Okay, so John, thanks for coming on theCUBE. Really appreciate your insight. You're in a unique position. You get to talk about cool stuff and get in the weeds and get really techy, and then go out in the field and talk to customers and understand what's going on in the marketplace. Thanks for your insight and for coming on theCUBE. Appreciate it.
Thanks.
We'll be right back with our next guest. This is SiliconANGLE.com. Thanks so much. Live at Cassandra Summit in Silicon Valley, we'll be right back.