Okay, we're back live, packed coverage day here at SiliconANGLE's exclusive coverage of Strata Conference. Go to siliconangle.com for all the reference points of innovation around big data, cloud, mobile, social, analytics; we've got it covered. My co-host is Dave Vellante, the chief scientist at wikibon.org, and go to wikibon.org for all the free research. Go to wikibon.org slash big data. We have the most comprehensive set of research on the big data segment of any company, so go there. Our next guest is Peter Wang, co-founder and president of Continuum Analytics. Welcome to theCUBE.

Thank you.

So, developers are a big part of this market. Obviously it's open source based, but when you start to get into the big money players like EMC, those are all now essentially data warehousing type products. You've got a data business, data processing, which is not necessarily looked at in a DevOps or developer kind of way, and we've been introducing this concept of software defined computing, software defined networking, software defined infrastructure all colliding. So what we wanted to talk to you about is your take on the big data industry as it evolves from the bottom up with developers and from the top down with business impact. Where's the role of the developer, where's the role of analytics? Stack wars are going on, platform wars are happening. What's your take on the current ecosystem here at Strata and the big data world?

So, my take is actually that there's been fundamental disruption in the storage and ETL of the big data, or business analytics, space, and I think people are pushing up market, and there's downward pressure from up market on trying to reinforce and shore up those infrastructure pieces.
But I think what's interesting is that both of these competing forces are competing in a climate that's changing. The big data wave that's coming is actually exceeding, I think, the disciplines for doing business analytics that most companies are used to, and what we'll see over time is that business computing and business analytics are going to trend more and more closely towards traditional scientific computing and scientific simulation sorts of needs. And so the intersection of those particular ecosystems is going to be very interesting to watch.

Yeah, we've heard some of that theme this week. We had a company on in a different space, they do storage infrastructure, but they're in the HPC world, and their premise is that those two worlds are colliding. The HPC and the big data worlds are coming together. What are the similarities specifically?

Well, the thing is that scientists for a long time have had data sets so large and budgets so tight that they couldn't afford not to squeeze every ounce of performance out of their infrastructure. And so as a result, they've developed software stacks and technologies and computing paradigms that seem esoteric, I think, to the business world but that are extremely, extremely performant. But, go ahead, sorry.

Well, and so what happens is that I think now, as businesses are faced with very critical analytical challenges in the large volumes of data that are swamping them, they're going to have to have more of that kind of scientific computing discipline.

Yeah, but there are differences as well. I mean, I think a lot of HPC data is more structured as opposed to loosely structured.

And that's actually changing a little bit as well. One thing you'll see a lot at scientific computing conferences is people talking about data management, data provenance, all these concerns that you see in a traditional data warehouse, right?
Master data management, data governance, quality, things like that. Because they do so many simulations, so many data acquisition runs, just managing that stuff becomes very difficult.

Now how do you guys exploit that and take advantage of that?

Well, what we're doing is taking open source technology, Python, and the ecosystem of fantastic tools that have, in the last 10 or 12 years, really been forged in fire in the HPC and scientific computing space. And we're taking those tools and really looking at how to adapt them better for business computing needs.

Okay, so where are you as a company now? I think you can give us some background there.

So we're actually just a year old. We started just about 14 months ago, but we're all veterans of the industry, and a lot of us have written open source tools in the Python space that are very popularly used on Wall Street, in oil and gas, weather simulation, you name it. And so we're looking at building the next generation of tools that will serve as a foundational layer for Python in both business computing as well as scientific computing.

And that's Anaconda, and that's your platform.

Yes.

Peter, one of the things I want to chat with you about... first of all, we totally love what you're doing, so just disclosure, we're biased. We think the tools of the trade in data science are obviously evolving from the DBA kind of analyst role to outside developers, and that's colliding, and Python developers all know that. So we agree that instrumentation is happening, because now data is 100% available if you know what you're looking for. So you talk about Wall Street, the scientific community, simulation, those game theory meets behavioral economics kinds of things we were talking about yesterday. So what is the state of developers right now and data?
For example, we were just talking in the last segment with Shaun Connolly from Hortonworks about how future business value is going to evolve not from structured, purpose-built data warehouses, but from data marts that evolve and are adaptive and are learning: data as code, as we introduced it this morning. So all the elements of data processing that IT is used to go out the window. So I want you to comment on that. Data processing grew up from a mainframe mindset, and you hadn't heard that term around, but now we're hearing data processing, data pipelines in mainstream IT. That's not in IT's DNA, traditionally. So how does that evolve into business value with the data sets that are taking shape?

Well, I think that's a really good point, that essentially data is at this point a first class concern, right? And data has mass now. When you have enough data you can't just willy-nilly move it around. You have to really think about: where did it come from? How am I going to view it? How do I want to transform it into the most useful views? And do it in a way that doesn't incur more data movement. Data movement as a first class concern is something that the HPC world is very aware of, right? They actually look at computation in terms of flops per watt.

Well, movement... what about transformations?

Well, transformations, in a traditional data warehouse view, almost implied copies and movement, right?

Okay, got it.

Versus just actually moving the computation to the data: changing the description of the data, talking about how you might want to change the representation, and moving more and more of the analytical code close to where the data is. So treating data locality as a first class concern, really having respect for the size and the volume of the data, and not saying, I'm just going to build another cluster over here where I can move the data over and analyze it.
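The "move the computation to the data" idea can be sketched in plain Python with NumPy; this is a hypothetical illustration of the principle, not Continuum's actual engine, and the file path and `chunked_mean` helper are made up for the example. The file is memory-mapped, so the operating system pages data in as needed and the analysis runs over the data where it sits on disk, rather than after a bulk copy into a new cluster.

```python
import numpy as np

def chunked_mean(path, dtype=np.float64, chunk=1_000_000):
    """Mean of a large on-disk array, computed in place.

    np.memmap exposes the file as an array without reading it all into
    memory; each slice below is a view, not a copy, so the only data
    movement is the OS paging in what the computation actually touches.
    """
    data = np.memmap(path, dtype=dtype, mode="r")
    total, count = 0.0, 0
    for start in range(0, len(data), chunk):
        block = data[start:start + chunk]  # view into the file, no copy
        total += float(block.sum())
        count += len(block)
    return total / count

# Demo: write a small array to disk, then analyze it where it's stored.
np.arange(10, dtype=np.float64).tofile("/tmp/demo.bin")
print(chunked_mean("/tmp/demo.bin"))  # 4.5
```

The same pattern scales to files far larger than RAM, which is the point being made: the code travels to the data, not the other way around.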
That's the thing: why can't I analyze it in place, where it's stored? I think you see that a lot now with all the vendors here at Strata.

So do you agree with EMC's approach of bolting Pivotal HD on top of HDFS?

I think that sort of thing is moving in the right direction. Moving more and more analysis engines closer to the data is absolutely the right way to do it.

Wise.io, we saw them last night, John, and that startup comes in using sort of new techniques: in-memory techniques, smarter use of L1 and L2 cache, much more efficient from a performance standpoint. So architecturally, can you talk about that trend and what it means to customers?

I think the in-memory trend that we've seen in the last 12 months or so, maybe more than that, is what everyone's talking about now, right? And we're actually focused on more than just that. We're working on basically providing a more uniform approach to in-memory, out-of-core, or what I call heterogeneous computing. So whether you have GPUs, whether you have Fusion-io boards, whether you have a Xeon Phi, Intel's MIC architecture, whether you have a cluster of machines hooked up to a SAN, you have basically a very heterogeneous storage and compute infrastructure, and needing to address the data and the compute, or what I call the metadata and the meta-compute, in a uniform way is a first class problem that really not very many people are tackling. They tend to give you a whole purpose-built solution: put your thing in our silo, put your thing on our platform. Versus saying, whatever you might have, we have an engine, we have a way of describing that data and describing the compute infrastructure, and you can write your analytics in a high level way that's independent of that.

Yeah, so you mentioned metadata, which a lot of times is locked into that silo.
So in this model that you're prescribing, I presume that metadata is accessible?

Absolutely. Our Blaze project, which is our next generation NumPy approach, is essentially an open source tabular data descriptor framework. You can describe data, whether it's in files, whether it's in a database over ODBC, wherever, and you can write Fortran- or MATLAB-like array code that does very efficient linear algebra, very efficient predictive modeling, over that sort of thing. And the Wise.io guys are great, and they're doing next generation sorts of things, and they're doing it in Python, and there's a reason for that. There's a lot of engineering going on.

So we're talking about this concept called data as code, which is kind of just a way to put our arms around what infrastructure as code is for DevOps and cloud, but for the big data world, where data is a first class citizen, if you will. But there are a lot of issues going on around unstructured data, for example, life cycle governance at a corporation. Do I store this? There are legal issues. Obviously everything is storable. So there's a variety of issues to be solved that are unsolved at this point. So as a developer who deals with data, what do you think the future of that direction is going to look like? Do you have any insight you want to share with the audience on how you see data sets? Are they programmable, are they self-learning, are they self-healing? Because instrumentation is valuable and predictive analytics is becoming table stakes for business.

Right, right. I think that's absolutely right, that the days when you can view data as a static thing that you just handle and manipulate are very much over at this point. And the transformation of that data, how that data got there... I saw a great quote the other day that there's no such thing as raw data.
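The "Fortran- or MATLAB-like array code" style that Blaze generalizes can be seen in plain NumPy; this is a minimal sketch of that array-programming style, not Blaze's actual API, and the variable names are invented for the example. A linear regression is expressed as whole-array operations and one library call, with no explicit loops:

```python
import numpy as np

# Synthetic data: y = 3x + 1 with a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = 3.0 * x + 1.0 + 0.01 * rng.standard_normal(100)

# Vectorized least-squares fit: build the design matrix as a whole-array
# expression and solve it in a single linear algebra call.
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
print(round(slope, 1), round(intercept, 1))  # 3.0 1.0
```

The appeal of this style is that the same high-level expression can, in principle, be retargeted at data living in files, databases, or distributed stores, which is the uniform-description idea discussed above.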
Right, at the end of the day, there's some firmware in a sensor that actually gave you that data. And as you transform that data over time, if you only view it as transformation steps on static data, you're going to constantly be plagued with this problem of describing that data, and so on and so forth. If you actually view the data and the code together holistically as a unit, and you're able to view subsequent transformations, subsequent derived data products, as code transformations on that data, that's a much more holistic and much more reasonable way, I think, of tackling that kind of problem.

Yeah, and so how does this change the computer science business? Just as a side note, because...

You had a small question.

Yeah, because there are some great old paradigms and concepts that are being integrated in, but it's not exactly the same. AI, for example, machine learning, learning machines; all this is now kind of coming together. Any comments on what you're seeing in terms of the innovative, cool things coming out of the big institutions and the players, and how computer science is changing? Because it's not just write code and build monolithic systems; it's a much more holistic approach.

Well, I think that more and more we're going to see an even greater bifurcation than there already is between practical, industry-oriented computer science and theoretical computer science as mathematics, right? So there's that. But even the way that industry practices software development, I think, is having to change over time, and what we're seeing is essentially, to Peter Norvig's point about the unreasonable effectiveness of data, that data is disrupting how we view science, how we view mathematics, how scientists approach their problems.
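The idea of viewing data and code "together holistically as a unit", with derived data products recorded as code transformations, can be sketched in a few lines of Python; the `Dataset` class and its `apply` method are hypothetical names made up for this illustration, not any real provenance framework:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dataset:
    """A data product that carries its own lineage: the values together
    with the named transformations that produced them."""
    values: tuple
    lineage: tuple = ("raw",)

    def apply(self, name, fn):
        # A derived product records both the result and the step that
        # produced it, so the data is never "static": it is data + code.
        return Dataset(tuple(fn(v) for v in self.values),
                       self.lineage + (name,))

raw = Dataset((1, 2, 3))
clean = raw.apply("scale_x10", lambda v: v * 10)
print(clean.values)   # (10, 20, 30)
print(clean.lineage)  # ('raw', 'scale_x10')
```

Because every derived product remembers its lineage, the "describing that data" problem mentioned above becomes a matter of reading the transformation history back out.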
And the idea that data can actually tell us a lot more than we used to think: we used to interrogate it, and now we're actually trying to learn from it. And I think that's a very positive step.

So the old way is interrogation, the new way is learning from the data.

Yes.

So what kind of impact do you think mobile apps will have? Because the big joke that we always make is that the low hanging fruit is recommendation engines and kind of cool things and some predictive analytics. As you climb the tree, put your ladder up, and get to the top fruit on the tree, what are the big applied big data applications that you're seeing?

So predictive analytics is a very big term, right? It spans from here to outer space; it could be anything. So I think that predictive analytics is absolutely the order of the day, and probably in five years' time we're going to be talking about things that are non-predictive analytics as essentially just simple reporting and drill-down kinds of things. But overall the mobile revolution is centralizing all of our activities: all the data around our activities, all the data exhaust is centralizing. And so that's the big effect, I think, of cloud and mobile computing. And it just creates more opportunities for those predictors.

Well, Peter, we've got to break, but we could talk for hours. We love data gravity, data physics, data as code. We are living in a new world, and that new world is measurable, the simulation is learning, and it's totally disruptive to any kind of pre-existing process, keys, and interactions. So thanks for coming on theCUBE, and we'll keep in touch. A great guest here; go check out Continuum Analytics. Thanks for coming on theCUBE. We'll be right back with our next guest after this short break.