in San Francisco, it's theCUBE, covering the Apache Spark Maker community event, brought to you by IBM. Now, here are your hosts, John Walls and George Gilbert.

And welcome back inside the Galvanize campus here in San Francisco as theCUBE continues our coverage of the Apache Spark Maker community event, sponsored by IBM. The general session is tonight; you'll be able to see that live here, by the way. We'll be streaming that keynote session and also bringing you interviews from throughout the week at the Spark Summit, which is tomorrow and Wednesday as well at the Hilton, just across town. But today we're at Galvanize, a really neat space, and George Gilbert is sharing it with me today. George, it's been a pleasure to have you riding shotgun here. And we are joined by John Akred, who's the CTO of Silicon Valley Data Science. And John, we appreciate the time here. Thanks for being with us.

Great to be here.

Tell us about, first off, what you all do. I realize, as a consultancy, you have a pretty wide portfolio, but you have some areas of core expertise, you know, that you're happy to share with your client base.

Absolutely, absolutely. So we are maybe starting a trend in naming Silicon Valley companies after what they do. We bring a Silicon Valley approach to doing data science for our customers. And what that means is we're about a 70-person consulting company these days, growing fast as a good Valley company does. And we work with companies in about four basic ways. We help them answer the strategic question of what they should do with data. Most companies have a business agenda: they want better, deeper relationships with their customers, they want a more optimized and resilient supply chain, they want more optimized and resilient manufacturing, et cetera. And so for us, data strategy is, okay:
So what investments in technology, capability, people, and process do you need to make to unlock that business opportunity with these kinds of data technologies and capabilities? Once they've made those kinds of decisions — lots of folks have that already in hand — we work in architecture advisory to answer the question of, okay, if that's what you want to do, and that means we need to build a recommender system, does that mean I need a Hadoop system with this or that or the other thing? So what architecture and what specific technologies will build that thing that unlocks that business advantage? We call that architecture advisory. And then mostly what we do with customers is help them build those things out. So we have teams of engineers, data scientists, architects, designers, and project managers that come from a range of Silicon Valley companies, consulting companies, and places like that. And we work very closely with clients to then build those capabilities with them.

So let's go back to the first piece, data strategy. How has that evolved over the past, even, 12 to 18 months with Spark — catching Spark, if you will — really kind of realigning the landscape? How has it changed what you're telling people about where they should be going and how they're gonna get there?

Sure, so there's a couple of axes on which the questions become really interesting. One is, you know, there's these new technologies — things like Spark and the machine learning libraries that we can put on top of Spark, and the ways that we can feed data to Spark in a streaming manner through something like Kafka to get a streaming capability in place. And all of those kinds of innovations open up newly addressable areas for data technology to serve the business. So often the focus of a data strategy for us is, okay, we're working with a customer, what does that mean?
A good example of this was some work we did with Edmunds.com, where the core question was: has NLP technology developed fast enough that we can take these unstructured views of what a car is — that come in the form of a PDF that BMW sends you to describe what a 4 Series is — and can we use NLP technology to recognize that BMW's xDrive is the same as Mercedes' 4MATIC is the same as Audi's quattro is the same as Ford's four-wheel drive, et cetera? So does NLP open up a new way of doing this that then brings some fundamental new value to the business? Those questions are what we're answering in a typical data strategy.

The other dimension is, to what purpose — what is the scope of a data strategy, and is it truly strategic? More and more, we're seeing businesses really interested in opening up strategic new top-line capabilities with data, yet a lot of what folks call data strategy is really a series of tactics around controlling access, cleaning, securing, you know — when you go to—

It's more like protocols and procedures than—

Exactly. When you go to a typical data management conference and talk about data strategy, it really focuses in on those tactics, and there's a new view emerging — we're both trying to catalyze and participate in that view — that the focus should really be first on what you are trying to do. Then those tactics become very, very important, but the strategy is, you know, what investments do I make in capability now? And we have an enlightened self-interest: we're technologists, and we will geek out on Spark and Hadoop and Kafka and the like with the best of them, but we work for a living and we get hired by other companies to help them do something, and it turns out that unless you're doing something really valuable, nobody really cares about what you did with Hadoop, right?
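The trim-vocabulary problem described above — recognizing that BMW's xDrive, Mercedes' 4MATIC, and Audi's quattro all name the same capability — reduces, at its simplest, to mapping manufacturer-specific terms onto canonical concepts. Here's a minimal sketch of that mapping step; the vocabulary, names, and fallback behavior are illustrative assumptions, not Edmunds' actual system:

```python
# Illustrative lookup table: manufacturer marketing terms -> canonical feature.
# A real NLP pipeline would extract these terms from unstructured PDFs first;
# this sketch only shows the normalization step.
CANONICAL_TERMS = {
    "xdrive": "all_wheel_drive",             # BMW
    "4matic": "all_wheel_drive",             # Mercedes-Benz
    "quattro": "all_wheel_drive",            # Audi
    "four-wheel drive": "four_wheel_drive",  # e.g. Ford trucks
}

def normalize_feature(raw: str) -> str:
    """Map a raw marketing term to a canonical feature name."""
    key = raw.strip().lower()
    return CANONICAL_TERMS.get(key, "unknown")
```

With a table like this, documents from different manufacturers can be compared on the same canonical attributes, which is where the new business value comes from.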
So the enlightened self-interest of these data strategies is that we wanna position companies to do valuable things with these technologies and open up real business value, as opposed to academic prototypes.

But it's interesting when you talk about, you know, the conferences — it's almost like they get hijacked by guys who are, you know, trudging along, solving day-to-day problems, and you wanna elevate the discussion. That's a different level in the organization. How do you reach the people who wanna engage in that discussion?

So it's interesting, there's both a top-down way this works — you know, the business press is increasingly talking about the real value of these things, right? And most CEOs are at least aware of some of the stories of data around various industries. It's hard to miss Amazon, for instance, and what they've been able to do, or Google for that matter, or Yahoo, or others of that ilk. So we are speaking at least to a C-suite audience, typically, that is somewhat enlightened, if not fully understanding — they've made it from "hey, that looks interesting" to "this is how that might work in my world, or with my company." So there is that increasing C-level, business-level awareness of what data can do that's outside of the IT department, so to speak, creating some top-down pressure and momentum for these kinds of different views of data and how we're gonna apply them to an organization. And then there's sort of the enlightened folks within the organization who become change agents. So sometimes it's bottom-up — somebody in the IT department who's thought, hey, there's a better way of doing this. My VP of engineering, when he was at Yahoo and started hearing about what one group was doing with Hadoop, was working on some paid-search stuff, and he's like, hey, I can use that to do some of the stuff I'm currently doing in a different architecture, and do it better, right?
So you get these people also, from a bottom-up perspective, who are seeing value opportunities and articulating them upwards. So in the best circumstance, you get a combination of those two dynamics and a lot of will to move and innovate.

You were talking about your three practice areas — data strategy, architecture, and build-out. How do those then kind of coalesce around the analytics platform? Obviously there's a relationship there.

Absolutely, absolutely. It's been tempting, and business has for many years been able, to sort of relegate the database choices and the underlying data infrastructure to operations and IT, and it hasn't had a huge impact on business historically. And I think one of the recognitions of — sometimes I think of the, I guess, the '10s; we're in the '10s, it's not the aughts anymore, it's the teens — here in the teens, it's revenge-of-the-nerds time, in that that platform can actually make or break a lot of business value. If you're building a SaaS product and you platform it on something that has a poor cost-to-serve characteristic — we've done projects with companies who started out and implemented something on Oracle, and then they were massively successful, and it turns out that if they'd continued to grow like that, they'd give all their money to Oracle and actually end up with an uneconomic product. Whereas if you put that on a modern scale-out platform, you're able to preserve the cost-to-serve economics of it. So the platform isn't just a technology choice; it can actually massively impact the profitability of a SaaS product or a service offering. And the other thing is that within that platform, you can make it easier or harder to innovate on data. And so when you sort of strip away the stack diagrams and things like that and just think about what you're trying to do with data —
You typically have some business problem in mind; you're trying to discover some data that is relevant to that business problem, and then you want to acquire and ingest it into your architecture, and you probably store it somewhere, to integrate it with the rest of your world view to better answer that question, and then ultimately take that analytical capability and serve it back out to the business in some way to impact a decision. That just described, at a very high level, what you can think of as a value chain of data. A well-architected platform takes friction out of all of those stages, and a poorly architected platform is fraught with friction at all of those stages. And the same thing can be said on the process side: do your processes make it easy to traverse that chain, or do they make it hard? So the platform is really a very important component of how companies are able to innovate. And we've worked with lots of large companies where they've got a very talented data science team, for instance, that they are starting to put in place, but the team's ability to access data and do anything with it is sometimes non-existent, sometimes incredibly slow — it takes them literally three months of going door-to-door asking people to get hold of it, et cetera. So it really can have a phenomenal impact on what folks are able to do.

Let me take one part of that answer and unpack it — when you're talking about the value chain of data. I love this quote that came from a VP of marketing at Lotus, referring to legacy applications: meaning, those are the ones that work. So if you've got this new value chain of data, you don't want to throw the legacy stuff out, you want to augment it. Are there certain ways of thinking about how you do that without breaking things?

Yeah, so you're changing the plane's engines while the plane is still in the air.
Yeah, all four of them.

Yeah, and there's both technical approaches that make that easier, and then a more strategic approach. So whether it's engineering a greenfield architecture for someone — I should say architecting a greenfield approach for someone — or working with somebody who's got a fairly developed internal infrastructure already: abstracting at the right level, in the right places, to make it easier to change your mind later, to isolate the concerns of components, and that kind of thing, is really, really important to making an architecture extensible, so it can really evolve with technology and capture the benefit of these new approaches. Within that, we typically use services approaches to implement that abstraction. Which is to say: we have some legacy application that's used to consuming some set of customer information. We build a service first that surfaces that customer information, and then we point the legacy application at that. Now it's talking to a service, and we can go change the thing that is implementing that without disturbing that application.

Meaning change the service.

Yeah, change the data infrastructure that's providing it, so that the applications consuming it are none the wiser. And that's not just for once, that's for all time. Now you've got a nice application that is intermediated by the service to the underlying infrastructure, and as you need to innovate and evolve that, you can, without disrupting these consuming applications. So microservices architectures are the trend — you're seeing people reach for this approach to accomplish that kind of thing. Famously, Amazon's been tremendously successful at it: when you think about what they have to do to make inventory across hundreds of thousands of retailers feel like a single pane of glass, the complexity they are managing behind those services is amazing, and they're able to do it because they use that services approach.
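The service-abstraction pattern described here — point the legacy application at a stable service, then swap what's behind it — could be sketched like this. The class names and data shapes are assumptions for illustration, not anyone's actual implementation:

```python
# Sketch of a service facade: the consuming application depends only on
# CustomerService, so the backing store can change (say, from a legacy
# database to a modern scale-out store) without the consumer noticing.

class LegacyStore:
    """Stands in for the original data infrastructure."""
    def fetch(self, customer_id):
        return {"id": customer_id, "source": "legacy"}

class ModernStore:
    """Stands in for a replacement scale-out backend."""
    def fetch(self, customer_id):
        return {"id": customer_id, "source": "modern"}

class CustomerService:
    """The stable interface applications are pointed at."""
    def __init__(self, backend):
        self._backend = backend  # anything with a fetch(customer_id) method

    def get_customer(self, customer_id):
        return self._backend.fetch(customer_id)
```

The consuming application calls `get_customer` the same way before and after the backend swap — which is the "not just for once, for all time" point above.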
And it sounds like that microservices approach would be appropriate not just for the legacy monolithic apps but for the greenfield ones too.

Absolutely. It's the separation of concerns. Yeah, I've been fond of telling clients that basically anybody who comes in and says they've got their future state figured out, I would throw out of my office, because if there's one thing I've learned in, let's say, 20-odd years of doing these kinds of projects and building things with data, it's that whatever we think we understand about the world today, two years from now there's gonna be a bunch of options we can't anticipate that will change the way we think about things. And so we should be architecting and building for a state of constant change and innovation, not some city on the hill that we exquisitely design and set out a multi-year roadmap to get to.

Well, okay, that actually raises another pretty serious question, which is: you have the legacy app, which itself is generally monolithic, because modularity was very difficult back then. So how do you make that future-proof, or as future-proof as your microservices greenfield app might be?

Well, there's a bunch of services out there that are ultimately still backed by old mainframe processes — in other words, abstracting away a legacy approach: a mainframe in some cases, or a legacy, specialized, industry-specific application, I don't know, to run mining equipment or something. And those things indeed work, and you don't replace them until you can improve them, or at least lift and shift them to modern architectures. So typically we're leaving them in place and then forking them for other purposes, right? On the inputs, maybe we'll do that same services trick and start feeding the inputs both to them and to services for the rest of the world, and feeding the outputs to services too, whereas their stovepipe stays unchanged, at least as far as the consuming applications are concerned.

Oh, so you put essentially new interfaces on either side.
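One way to picture those new interfaces on either side is a listener that taps the legacy system's update stream and keeps a real-time view alongside the legacy batch cycle. This is a hedged sketch of that shadowing idea; the snapshot/delta split, names, and data shapes are assumptions for illustration, not the actual retail system:

```python
# Sketch: the legacy system still emits a periodic batch inventory snapshot,
# while a listener on the same update pipe accumulates a real-time delta in
# memory. Current inventory = last snapshot + delta since that snapshot.

class ShadowInventory:
    def __init__(self):
        self.snapshot = {}  # last batch state: sku -> quantity
        self.delta = {}     # real-time adjustments since that batch

    def load_snapshot(self, batch):
        """Called when the legacy batch (e.g. every 15 minutes) arrives."""
        self.snapshot = dict(batch)
        self.delta.clear()  # the new batch already reflects past events

    def on_event(self, sku, change):
        """Called for each real-time sale/restock event we listen in on."""
        self.delta[sku] = self.delta.get(sku, 0) + change

    def current(self, sku):
        """Real-time view: batch position plus in-memory adjustments."""
        return self.snapshot.get(sku, 0) + self.delta.get(sku, 0)
```

The legacy system runs untouched on its batch cadence, while the shadow answers up-to-the-second queries between batches.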
Exactly, so it's forking the pipe of data, right? On the input side, fork off the inputs into whatever your modern architecture is, to do whatever the N-plus-one thing you wanna do with it is. And then similarly, if that application is providing valuable outputs back to the business, fork the responses too. So we'll do this with, like, inventory systems in a supply chain or something like that. You're still getting the updates in the same way; we're just listening in on that same pipe, basically, where those updates are coming from, so that we can handle them elsewhere. And at the same time as it sends out updates about inventory, we're listening to those as well. And actually, that's a great example of a Spark application we built for a major retailer to manage real-time inventory over the holidays, right? You're sort of taking the batch, 15-minute-interval legacy inventory application on the one hand, and you're listening to those updates in real time and keeping a real-time delta over here in memory alongside it.

Oh, so you have a shadow — you're basically shadowing it.

So now I'm still using my legacy inventory system as intended, but I've got this real-time service sitting next to it that allows me to do fundamentally different things over the holidays, and I think they credited that with something on the order of a 20 to 30% lift in online sales for them.

All around Black Friday, if I'm not mistaken — I think I read the case study on the website. Old dogs, new tricks is what it sounds like. Good deal. John, thanks for being with us. We appreciate your sharing the information and your time.

Hey, it's my pleasure. It's been a good time.

Good to have you. theCUBE continues from San Francisco right after this.