Live from New York, it's theCUBE. Covering Big Data New York City 2016. Brought to you by headline sponsors, Cisco, IBM, NVIDIA, and our ecosystem sponsors. Now, here's your host, Dave Vellante. Welcome back to New York City, everybody. This is Big Data NYC, our version of an event that we run in conjunction with Strata + Hadoop World. This is theCUBE, the worldwide leader in live tech coverage. Joel Horwitz is here, he's with IBM, longtime CUBE guest, CUBE friend, and he's the director of corporate and business development at IBM. Jim Deters is here, he's the founder and CEO of Galvanize, and Travis Oliphant is the CEO and co-founder of Continuum Analytics. Both companies, partners of IBM, big announcement this evening. Gentlemen, welcome to theCUBE. Thanks for having us. Thanks for being here. So Joel, frame it for us. What's going on this week with IBM? How do the partnerships fit in? Yeah, well, we've been on a mission over the last couple of years to really accelerate what's happening in the data science community. It's really exciting times. It started with our investment in Apache Spark last June. We recognized that Spark was kind of at the forefront of this new era of data science and analytics. The actual intention of Spark was machine learning. I don't know if many people know that. Now it's tightly coupled with Hadoop. But Matei originally set out to build an open source machine learning framework. And so it's done very well. We're actually the number two contributor to Spark ML, which is exciting. But we've recognized that's one part of the story, and there are many other parts. How do you actually learn how to do data science, which is why we're partnering with Galvanize, as well as how do you actually extend that core Spark framework and go beyond it with all the great packages that you see both in CRAN as well as in Python, to tackle a lot of emerging use cases.
And that's why we've partnered with Continuum Analytics and Travis here to actually explore that and start broadening the community. So, Jim, talk a little bit about Galvanize. We know you from some of the events that we've covered, but tell us more about the organization. Well, a lot of times you've seen us because we've been the host for a lot of these convening activities. But we run a 21st century school where we teach data science, data engineering, and software engineering. So think of it as a 21st century health club for nerds. And a lot of the activities happen on our campus in San Francisco. We run a very big campus, and imagine this big, beautiful, dedicated building where you have dedicated data science faculty, hundreds of students, hundreds of different members of corporate innovation partners, whether that be Silicon Valley Bank or Booz Allen, and a lot of our technology partners. So think of this melting pot for learning. And how Galvanize started was we were a consumer-based brand: we taught immersive education to enable people to become data scientists or engineers, who have been hired by some of the best companies in the world. Certainly the tech elite, like the Facebooks and the Airbnbs and the Teslas, but also the old guards that are becoming data companies and software companies, the AmExes and financial services firms of the world as well. Now, fast forward, and in addition to having our open enrollment consumer products where students come to learn and get jobs, we now have been taking our curriculum and IP and centering it towards reskilling and modernizing corporations that are making these investments to build data products and to replatform themselves as PaaS and data companies. Okay, great. And Travis, Continuum Analytics, give us the bumper sticker on your organization. Bumper sticker, so we've been around for four years, but we've been doing things a lot longer than that.
We're really about taking the community that has built out the NumPy and SciPy ecosystem, with pandas added to that, over the past 20 years, and really delivering that to the enterprise. Python fits in your head and it helps connect the experts with what they're trying to accomplish without kind of messy computer science getting in the way. At the same time, it delivers huge value because it can do messy computer science as well, so even the experts get excited about using Python to provide solutions. So we're really excited about the partnership with IBM because what we're trying to do is connect those advanced analytics applications that come from machine learning, which is very popular with scikit-learn and other packages around Python. A lot of people who make machine learning packages make them available in Python first. We want to connect that to the Spark ecosystem and the rest of the platform, the data-first platform, the DataWorks platform, that IBM is producing. So many in our audience probably know the story. In the early days of the automobile industry, there was a concern that there wouldn't be enough chauffeurs to drive all these people around. And the analogy, you know, applies to data science and big data, where everybody's worried that you need a rock star data scientist in order to get value out of data. Now, maybe that was in part what the genesis of the organization was, but what have you found? Do I need this sort of unicorn data scientist or are you able to, you know, educate people so that I can be a data-driven organization without, you know, being Facebook? Yeah, I guess there are kind of several macros that come into play. I think one is the awareness we were talking about. Rob Thomas will be on the show later. His Medium post yesterday was about, you know, the end of tech companies.
And that macro is, it doesn't matter what industry you're in, whether you're selling shoes or financial services, you're all data companies now. There is no such thing as tech, right? When we all started in this world, tech was like this little thing over here. Now, everyone is a tech company, and those skill sets and knowledge and understanding need to be pervasive. Clearly, a couple of other things have happened: technology continues to become more complex and more granular, and the ability to use those tools, not just to be proficient in tools, but to drive the business insights and innovations that companies need to thrive, is a necessity. And I truly believe that the organizations making the investments now will be the ones that actually survive and of course thrive five or 10 years from now. And that transformation we're pushing through is literally happening now. So there is obviously a talent quotient that must be part of that. Chasing the unicorns is not necessarily the end all be all. And a lot of what we're doing is we're taking anyone with the aptitude, drive and determination and actually giving them the skills that make them successful. You don't need to go away for four years, and most folks that come to Galvanize in our immersive programs are career switchers, whether they had some level of post-secondary education, whether they might have studied mathematics or biology, or whether they didn't do anything and they're just poker players. We literally have everyone from frozen yogurt clerks to applied mathematics majors from Cornell coming to our programs. What we're giving them is the skill sets and the understanding of these technologies that actually make them useful for an organization. So we're kind of the opposite of the unicorn. We build people that can become proficient and successful in an organization very quickly. They've all got a data gene, right?
Yeah, that's really part of what we've been doing at Continuum, though. We've created Anaconda. Anaconda is an open data science platform that actually makes unicorns out of everybody, right? And so one of the things we've often said is people are looking for unicorns everywhere. We created the PyData ecosystem, which is a unicorn feeding opportunity. So PyData feeds and everyone comes, and most of them are using Anaconda. And Anaconda gives power to every person. Citizen data scientists can now do the kinds of things for an organization that you typically had to go to school forever for. And now a program like Galvanize can absolutely join with Anaconda and produce effective solutions for a company quickly. Exactly. Yeah, I've kind of laid it out as, and I think the reason why we have these two amazing partners with us today is, there are really two parts of the story, right? And when you bring up the question of are there enough unicorns or data scientists, I would say we look at it from the standpoint that there are data producers and there are data consumers. And we're calling our event the data first event because I think a lot of times people go in and they start building an application and then they think of data science after the fact. So I think you see two things. I think you see us forming a strong alliance and partnership with Continuum because we recognize that Python could actually address both communities. You can have folks who go deep on data and produce data products, who could then turn around and use that same language, Python, to build very robust applications. Python's used at a number of the Fortune 500 companies. For that very reason, it's the most popular programming language in existence today. It's growing faster than anything else. And I would say that's also the reason why we're partnered with Galvanize. Python's their language of choice. So it really comes down to two things.
It comes down to a highly robust platform that we're launching today, as well as a highly sophisticated and frankly approachable method that we're working with Galvanize on. So I think those are kind of the two parts that will solve that unicorn problem. So a lot of interesting discussion points here, and I'm particularly excited to talk to Rob a little bit about his premise. But you know, it's like Marc Benioff says, there'll be more SaaS companies coming out of non-tech companies than tech companies. And at Wikibon, we've talked a lot about how data practitioners, the buyers of technology, are going to create much more value than the vendors. And Rob's just scanned it, but talking about some of the challenges that open source vendors are having. John Furrier would say, why buy the milk? Or why buy the cow if the milk's free? And so it was struggling there. Having said that, when you look at some of the activities that are going on in terms of value creation, we always talk about the data-driven organization. It starts with understanding how you make money. And I wonder if you can comment on this. A lot of people thought, okay, I want to be data-driven. I have to figure out how to make money by selling data. And that was the first mistake. What really happened is they said, okay, how do I make money and how can data support that? What data sources are available? How can I make that data of sufficient quality? Or how can I trust that data? And so, first of all, is that a reasonable framework? And what role does data science play in all of that? Well, where to begin? I mean, honestly, you're talking to the folks that clearly have drunk the Kool-Aid and also work with a lot of companies in terms of what a data strategy and a series of data products mean for their organization.
And I think, like you said, there's a maturity curve in terms of people understanding and adopting tools, from building your early data repositories and lakes and getting the data somewhere. For many years, we've been collecting data. I think what people haven't realized is just the latent value of the insights inside that data. Not just to sell it. And usually it's not to sell it. It's to maybe better sell what you already have, or repackage something in a new way, or target a different customer segment that you didn't know about. And those are kind of working your way through a maturity level of how you understand and use your data, how you drive value from your data, and then the different products you can build on top of your data. And I think that last stage of maturity is, as IBM calls it, sort of transforming into a cognitive business, where you're adapting to market conditions and building new offerings in a way that's almost seamless or flawless, right? And I come from an agile software development world where we think about it the same way. As you look at what's happening, you have sort of the idea of building agile offerings, which in my opinion are just new products, and the seamless integration of data and the cloud. These things are all becoming one; you almost can't tell them apart anymore. And ideally that's where a company wants to mature to, to address different market standards. And you pick an industry, and this is what Rob will talk about. There isn't a single industry that isn't going to have to go through a massive transformation of how people consume products, like the largest industrial company in the world trying to become a software company, right? And then there's the GE piece in the New York Times, the 124-year-old startup, where Jeff Immelt said, you know, there is no plan B.
We will become a software company where the data that machines are producing is likely more valuable than the actual machines themselves, and that will open up new markets and new opportunities, not just software. There's not an option there, right? I think there's a little misunderstanding about open source. Many people are making the mistake of assuming that just because something's open source, they should just go straight ahead and build everything internally on top of that. And I see this over and over again, having worked with open source for 20 years and having gone to companies and seen projects and developed projects in those companies. The mistake that's made is, as I go to each of these different companies, they all build exactly the same layer on top of the open source, instead of amortizing the cost of that thin layer. Now, it's a thinner layer than the deep stack layer that they're used to paying full stack price for. But there always is a need for an enterprise layer that connects the open source with their particular value model. GE wants to become a software company. They're not going to become a software company by rewriting a data-first platform on top of that. That's not how they do it. They need to build the software that's specific to their business, that benefits their specialization. So a lot of companies are making that mistake. So there absolutely is a place, but it needs to be popularized. People need to understand that there's a place to buy a layer, even though it's not the full stack like the previous proprietary vendors have provided. There is an open source base, an innovation center that keeps rising, and that's awesome and that's amazing. But there's a layer that you need to purchase to save yourself money. So you're not paying the maintenance costs of that layer; you can amortize them across many other providers. Well, IBM's a poster child for this. Yes, they are.
Steve Mills said we're gonna, IBM said we're gonna invest a billion dollars in Linux. We're seeing that play out again in Spark and other places, but you've perfected that model of making money and adding value on top of open source. Yeah, no, totally. I mean, we definitely call Spark our analytic operating system. We call the Data Science Experience that we launched last month or two months ago our IDE for working with data, right? And now we're basically going to announce later this evening what comes next, and you could probably guess where we're headed based on a lot of the conversation here, but it's really about, okay, now that you've brought together a way to produce data products, how do you consume that? And I think IBM is the only company, in fact, I think it is the only company that has numerous ways for people to consume data products. If you look at Watson Analytics, if you look at Bluemix, if you look at GBS and all of our industry solutions, there's no better way to bridge that gap. So this comes back to your earlier question, which is, are there enough data scientists or data engineers? No, just like you didn't need a chauffeur for a car, but if you can take what they're building and repurpose that across your business, then that will give you a lot more bang for your buck, right? I mean, there's a lot of value trapped in there, in this core group of people, that if it only had an outlet and a way to escape, right, I think you would change the world. And I think another thing, as Jim said about how we're headed towards cognitive solutions, I read a really good stat the other day where people make 35,000 decisions a day compared to, I have a two and a half year old, who makes only 3,000 decisions a day. So you think about that and it's like, man, that's a pretty big gap.
Like we're making lots of decisions, especially if you're running a business, and if just a fraction of those could be either augmented or even automated, then you can focus on the stuff that matters. I mean, we're like attached to our cell phones now, or I guess we call them iPhones or whatever. You know, we're attached to these things, right? I just dated myself. You're attached to these things, but realistically you should actually be able to live in a world where a lot of that stuff gets pushed out and you can just focus. And I think that's what this is about. Well, you're talking about operationalizing a lot of this stuff and making it invisible. Correct. Yeah, that's kind of the holy grail. And then business people can actually be at least quasi data scientists because the system's supporting them. And that's the sort of vision that you guys are putting forth, right? Absolutely. And I really appreciate IBM promoting the Python concept. It's a secret sauce. It's actually been used throughout every industry. And for a long time in the financial sector nobody would talk about it. They all used it, nobody would tell each other. And it was really funny, as NumPy got adopted, how they didn't want anybody to know they were using it. Now they're okay, they recognize everybody else uses it. It's a secret sauce because of its ability to be agile and iterate very quickly. That's the key to getting to this world we're looking at: how do you iterate from here's a data problem, here's an idea, I've got to put it in production, but then I've got to update it quickly. I've got to then figure out how to iterate from the production to the design back to production. How do I make that cycle as seamless and as quick as possible? That's been the center of what we've been doing with Anaconda, and it also looks like it's at the center of what IBM's trying to accomplish. So we're really excited about the partnership. Your community's not afraid to break things, are they?
Correct, and so breaking it in a governed way, right? That's the trick. Oh, you know what I mean by that. Yes, I do. Purposely breaking. Purposely breaking. Our team, you know, we believe deeply in Python. All of our data science and data engineering courses center on Python. It is sort of the language of big data, and we even help people along the way. We offer a lot of part-time workshops to start teaching the syntax of Python so you can start to become literate in how to manage and manipulate data, and that progression with both Continuum and IBM has been, you know, a really good fit. Yeah, we're super excited. So we're gonna announce actually later today how we'll be joining NumFOCUS, IBM will, which, you know, is really exciting because, you know, as you saw with what happened with Spark, right? So you can imagine. We were not on the map with Spark before last year. Once we get involved, we tend to bring a lot of our resources from our, you know, work with clients and customers to have a really positive impact on these communities. So we're gonna be announcing that later today. So it's super fun. We're gonna be announcing some more really cool stuff with Galvanize as well. All right, good. Well, we got to leave it there. Big shindig tonight, announcement. Yeah. Party, we'll be covering it. Awesome. Thanks very much for coming. Thanks for having us. Great to see you guys. Thanks, thanks for having us. You're welcome. Keep it right there, everybody. We'll be back with our next guest. This is theCUBE. We're live from New York City. We'll be right back.