Live from Galvanize, San Francisco, extracting the signal from the noise, it's theCUBE, covering the Apache Spark community event, brought to you by IBM. Now your hosts, John Furrier and George Gilbert. Okay, welcome back everyone. We are live here in San Francisco for SiliconANGLE's theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE. My co-host is George Gilbert, Wikibon's big data analyst. Again, we're live in San Francisco for a special presentation for the IBM Spark community event that's happening tonight. We're here on the ground. We'll be doing live interviews till nine o'clock tonight and covering Spark Summit all day today, and tomorrow we'll be at the Hilton in San Francisco for the Spark Summit. We're unpacking the big trend around what's going on in big data. Spark is really changing the game, literally overnight. IBM's contribution has been significant. And here to break that down is our next guest, Mike Tamir, Chief Science Officer for Galvanize, which is where we are right now. This is an incubator, workspace, and education place. It's kind of like a melting pot of all that great innovation action, the best of the startups with growth and education, and of course shared workspaces, which is the ethos of San Francisco. Welcome to theCUBE. Thank you. So great to see you guys successful. It's really exciting to see. I won't say incubated, but that was the early days of these workspaces, back in the days when blogging started. When I started doing blogging and podcasting, it was democratization, the sharing economy before it was called the sharing economy, sharing space, also because it was kind of cheaper to share space. That evolved into this whole seed movement, this whole Web 2.0, now cloud, and then startups. But then what really happened was these real business needs: there's education, there's code, there's the new technologies, and you guys have really, really grown.
So talk about Galvanize and how that's fitting into this culture in San Francisco. So I like to describe Galvanize and each of these campuses as a sort of teaching hospital for the data scientists and full stack web developers of the future. The tech industry is a place where you need a lot of that background understanding. You need to understand what it is you're working with, what kind of data science tools you're using, what kind of machine learning you're doing, but you also need that experience on the ground. You see all sorts of articles out there about why people can't find the right talent in data science and why there's such a dearth of talent and such a vacuum. And the reason for this, we believe, is that no one's got the recipe right for training data scientists and training full stack web developers. And we do that by putting the students not just in the classroom, but in a classroom that's in an industry environment, and that's what you see around you here. We have hundreds of member companies on over 70,000 square feet on this campus alone. We have six campuses, going on seven very shortly, across the nation, and we are continuing to grow, building out those industry environments. Galvanizing the industry, you guys got a great name. So replay the stats again. Seven campuses, 70,000 square feet here. 70,000 square feet here, in Seattle, and we just launched in Denver Platte also. You know, just an analogy: it sounds like community colleges were always very professionally oriented, occupationally oriented, whereas with liberal arts it's, you know, we'll teach you how to think, and then when you get to your job you'll learn how to do the job. This sounds like a cross between them. It's like community college in terms of its time and focus, but with the really advanced learning that you might get at a research university. It's time collapsed.
Yeah, there's a lot of intersection there. So we actually have three different classroom programs right now. We have the full stack web development program. That was our original program, through what was formerly gSchool at Galvanize. We also, this last fall, launched our data science immersive program. This is a three-month immersive program that takes the students from core skills, knowing the background math and coding ability, to entry-level data science jobs and sometimes even very advanced jobs. We actually just had two of our students and one of our instructors compete in a hackathon this weekend using the IBM Spark platform, and just found out moments ago that they won the competition. And one of those students, Jose, was just placed at PayPal in a director-level position. So we really focus on getting students into that. This is a great payday. I mean, we were just talking about the data warehousing and business intelligence market of my generation, and it's like that movie Office Space: give me the TPS reports. It was a boring world and it wasn't moving very fast. It's like, go get stuff from the back room, bring it, do a report. Now it's on the front lines. It's at the center of the action and relevance, and it's a moneymaker. So if you can get those skills, it's big. So given that, what kind of growth do these guys look for in terms of salary? Give me an example of a guy who's come in like that example, and what do they make in salary out there? So median salary for data science immersive students is in the low 100s, so 110, 120. I don't have the exact figures, but it's pretty good coming out of a three-month program. Our third vertical is the GalvanizeU powered by UNH master's program. And this is a 12-month program that's designed to get students from square zero all the way to that expert level of data science.
And the immersive program is three months? The immersive program is three months. The master's program, which is fully accredited, is 12 months. 12 months, okay, that's awesome. Well, you know, more and more people are online with the MOOCs, and also coming here, getting immersed for 12 months, is fantastic. George, pretty amazing, don't you think? I mean, how do you scale this? Not just you, but others in the industry. It seems to fill that gap where, you know, there was that McKinsey article about how we're gonna come up short. That's right, didn't they? Yeah, 600,000 short, you know, but they don't say that institutions like this can spring up and fill that gap, or that we can have better tools to make it more accessible. How is this scaling, not just with you, but across the country? So yeah, that McKinsey report that you're referring to was sort of a large alarm bell that there was so much need. This is the HR version of a data science product waiting to be built, right? We're creating data scientists here. We're scaling out, obviously, very rapidly, and we're doing that very prescriptively. We're thinking about what kind of data scientist the industry is looking for and then trying to give our students those abilities. Are you working with the consumers of those data scientists on the curriculum? By consumers, I mean those who want to employ them. Absolutely, yeah, we work with our hiring partners, with our member companies, so the companies that actually work here on Galvanize campuses, and also with our large enterprise partners like IBM. So I gotta ask about the McKinsey report, because that's a great example of a miss. McKinsey's supposed to be this huge brain trust, and not to dis on McKinsey, but they just didn't see it, because they're reporting what they see. And to quote Steve Jobs, it's always what people don't see that the real innovation comes from.
I think this is a new generation of stuff that's coming out of the woodwork where it's just fresh perspectives, a clean sheet of paper. The blind, young, unconsciously competent developer has no idea that they're slaying sacred cows all over the place with DevOps and analytics. So McKinsey points to, oh, that'll never work, and then all of a sudden it happens because of technology. And that's really where the innovation comes in. So I want you to give us another example of where the consensus is, oh, that's never gonna happen. What does Spark do that will keep the skeptics at bay? Because everyone's always like, well, you can never hire a data scientist, there'll never be enough data scientists. Well, that's the definition today. That's broadened. Where do you see an example where people are missing the boat in their analysis of the market? So I think Spark's a good example. Four years ago, Hadoop was the next big thing, and there was a lot of hype about it. Everybody thought that it was going to solve all the data processing jobs with MapReduce, which had actually been around for quite a while. And nobody realized that in-memory computing was going to be the next big thing. Spark was going to help guide not just MapReduce-style distribution over several nodes, but being able to tactically, with these lazy calls, figure out how to pipe data processing across multiple nodes in a way that lets you do things like machine learning very effectively and at large scale. So this is one of the innovations that has come around just in the last several years. And how do you see it shifting? Just order of magnitude, revolutionary, game changing? I mean, just put some color around the shift. I think only time will tell. There's a lot of great innovations out there. Spark certainly has a lot of momentum behind it.
And with the investment into machine learning on top of Spark, on top of what's going on with MLlib, there's a lot of potential here to really have it take off. I wouldn't put all my eggs in any one basket because, of course... Well, that's the real thing about open source. You can try machine learning from IBM, and if it provides some goodness, you use it. If not, it's still open source. It's not like there are any hooks in there. That's right, that's right. Maybe tell us if this is appropriate for you to try and make sense of for us. We see IBM coming on and contributing their machine learning platform. We see MLlib being the sort of official or standard repository for the algorithms. There's Spark Streaming, but there's also Storm on the Hadoop side. There's Samza, there's DataTorrent. Help us make sense of all these. Are we gonna see a sort of fragmentation, or is there a unification that can come about? So I think the best way to think about it is the way you think about the tools on your tool bench. You might have several different kinds of hammers for different kinds of jobs. You might have several different kinds of drills for different kinds of jobs. They're not direct competitors. There's still very much a place in the world of large-scale data engineering for Hadoop along with Spark. Hadoop is really more for successful architecture and warehousing right now rather than doing the distributed machine learning. There was a lot of hope that we could do that with the open source project Mahout. So it's more the data warehouse complement right now. Yeah, they are now evolving as different tools for different jobs. And then on the Spark side, is it not just Spark Streaming? Can you apply the Samza streaming, the DataTorrent stuff? Can that all fit in the Spark ecosystem as part of its real-time underpinning, to complement the machine learning and the graph processing?
Yeah, so certainly it can. You know, Spark is very young, and so it's probably a little too early to see what's going to cotton on as industry solutions, but certainly there are very sophisticated architectures out there that might use all of these tools for their best purposes. What's the biggest thing in education you're seeing in terms of the learnings? What's the profile of people coming in? Just demographics, if you can share some insights into the range of people coming in for the immersive program and then the 12-month program. And then just in general, what's the appetite for the education on top of it? So the appetite is huge. Everybody wants to become a full-stack web developer. With a 110 starting salary, it's like a pretty badass opportunity. Right, well, and yeah, even more so for the data scientists. It's one of the most exciting fields, and you get to do some of the most exciting stuff that you see out there right now. When you use Netflix, when you use Amazon, you're actually seeing machine learning in action. When you use Google, you're doing information retrieval. You're seeing it every day. You're interacting with it every day. And a lot of people are interested in figuring out how that works and being able to improve on that. And that's exactly what the data science opportunity is all about. All right, so what do you think about IBM? Obviously, they're working with you guys here in a joint effort as part of their community event. You've got a great facility. You've got great access to the education. You're part of the million education number. Their million developer march, I call it. How's that going? What's your take on their strategy? Are they executing well? I mean, IBM's pretty heavily invested and they're not mailing it in. Yeah, this is very exciting. They're putting a lot of momentum behind a project that's already had a lot of momentum.
And I think that's going to help us see a lot of maturity much quicker than we might have seen originally. I mean, IBM has the playbook. They did it with Linux, and they did it successfully. Now, I know that it's different now from then. Then there were other things. They also had minicomputers as well, and a somewhat different marketplace. And Linux and Unix were evolving. But still, it's a systems management world. And the fact that Berkeley's tied to it is ironic, but it's probably tied to the history: systems programming really was Berkeley, right? One of the main places. Now, with Hadoop and now big data and DevOps, again Berkeley's in the center of the action for this kind of what I call a new operating system. I mean, IBM calls it an analytical operating system. What's your take on all that? How do you put that together in your mind? Look at this world of new: the newer modern infrastructure, the modern software, cloud. Yeah, so it's tough to come up with a good analogy, right? So the analogy is that in the old school, you had a laptop, or maybe you had a desktop, and you had your operating system on top. Now, it's very different, right? And so we can kind of think about these analogies between the operating system and the data. You don't need a class on how to load Linux on a server and update patches. I mean, that class no longer exists. That's right, that's right. And things are moving so fast, and they're moving faster now than they did in the past. Part of that is because we have a lot more connectivity. There's a lot more opportunity for open source projects to really take the best and crowdsource that information from across the world. And this is something that has really driven the move to open source development and analytics as a service as a business model now for new products. Great, when you bring up open source, I mean, the open source wave has been rising and cresting for many years.
But we are at the point now where it seems no enterprise infrastructure software can be sold closed source, and products that have 10- or 15-year histories are going open source. What changed in the community and in the buyer's perceptions to drive that? That's a tough question to answer vertically. I would go back to what is just my opinion: one of the biggest reasons why these open source projects are so successful is that we've taken the best contributions from across the world. More and more, human production is not about one person going in a room and creating one thing all on their own. It's about collaboratively working on things and getting the best bits of ideas from all across. And with the collaborative tools that we now have, and that are growing, we're getting better and better at that, particularly with all these open source commit technologies. When you plan out the way you're gonna teach your students over time, do you see raising them up to more accessible tools from deep systems programming? What might that look like over time? Yeah, so we don't teach very low-level object-oriented programming right now. We do a little bit of Java for the data engineering verticals that we're launching later this summer. But for the most part, data scientists can get away with just using Python. Python is an object-oriented language, but it's not terribly low level, and that's an advantage. It means that you can learn a lot of the machine learning algorithms and all of these things a lot more effectively, and you can take advantage of already created pieces of technology that are out there. It's a shortcut, too. There's actually a building block that you can have in your tool chest, if you will, as a developer, without having to go all in on a vendor, because it's open source. So you can play around with it, you can develop some innovation, but the end game starts to be the integrated stack piece. That seems to be where the action is.
So great contribution there. I mean, do you see it the same way? Is ML, the machine learning, the building block to get started? Sure, yeah, yeah. So it's great to get your hands dirty early, to see how these algorithms work and to be able to work with them quickly. That doesn't mean that we don't want our students to all be able to come out with the chops to code up this stuff on their own, and we make sure that they're able to do that. But that's only a piece of it, and when you start by getting familiar with these algorithms and getting used to them, you're able to cut down on that cycle time much more rapidly. But I also wanted to get to this: even if you're in Python, which is, I guess, a fairly accessible language compared to systems programming, might we start seeing something even a little bit higher level? I might be dating myself, but like Visual Basic, where some amount of it is graphical drag and drop and then there's coding to extend that. Like a framework, part visual, part coding. Yeah, so there's a cottage industry out there right now trying to come up with the sort of Excel version of data science tools, right? So these sort of plug-and-play, drag-and-drop type machine learning tools. I think that that's a ripe industry, and that explains why there's so much competition in that industry right now. I've seen some that are very successful and have already gotten bought up, and some that are not as successful, but it's very exciting to see where that goes eventually. Any that we might have heard of that are either speaking at the Spark Summit or are... So one that comes to mind that is no longer around is KXEN. They were bought up a couple of years ago, and I think that they had a really quality product. Mike, thanks for spending the time here on the special event. You guys are hosting us at IBM. Appreciate it.
This is theCUBE, live in San Francisco for the IBM community event 2015. In conjunction with Spark Summit, we'll be here covering all the action all day and tonight till nine o'clock. Big event here. Again, developer goodness. Spark is creating a huge wave of innovation. So stay tuned at siliconangle.tv. We'll be right back after this short break.