From New York, it's theCUBE, covering Big Data New York City 2016. Brought to you by headline sponsors Cisco, IBM, NVIDIA, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Jeff Frick. Welcome back to New York City, everybody. This is theCUBE, the worldwide leader in live tech coverage. Armand Ruiz is here; he is the lead product manager for IBM's Data Science Experience. Welcome to theCUBE, thanks for coming on. Thank you for inviting me. You're welcome. So what is the Data Science Experience? You guys announced it a little while ago, but what's it all about? Yes, we announced Data Science Experience in June at a big event, and today we are announcing the platform. Data Science Experience is the application targeting data scientists. We are trying to make the best platform anywhere in the market for the data scientist persona. So why did you do it? Where did it come from? Give us the background. I'm a data scientist myself. I spent quite a lot of time working with different companies. And one year ago, at IBM, we were interviewing many data scientists all around the country, trying to understand how they work, how they collaborate, how they share assets. We saw that there was a big gap in the tools and in how they are connected together. That's when we started to think, okay, let's create something new from scratch, clean, with a nice UI, all based on open source. In June, we announced the Data Science Experience. On the first day of the announcement, we got 1,000 people interested in the platform, because it was in closed beta. And now we have over 10,000 users. So you were sort of the initial guinea pig, is that right? Yeah, well, we were some key people at IBM pushing for that, and I'm glad it finally happened. So what were you demanding at the time? Simplicity, power? Give us some insight. Yeah, it all came from looking at how data scientists work today. They don't work in silos.
You never start from scratch. You always go to the internet: Google, GitHub, Stack Overflow, all these data science communities out there. You try to learn what is out there, import it, and start working from that. So Data Science Experience has a big community component. Instead of making you go to all those communities on the internet, we bring the community into DSX. We're trying to make it kind of like Facebook. On Facebook, you go to see what's going on in your friends' lives. If you want to know what's going on in data science, we have a feed with new stuff every day. And it's not only about reading; in one click you can import it, start working on it, and collaborate with others. So it's community and it's also content. It's community, content, and then tools, of course, because data scientists have to build stuff. And we are spending a lot of time on collaboration: how to make the collaboration between these different data scientists and the different personas a great experience. Three Cs: community, content, collaboration. Okay, so what's been the result of the launch of the experience? So yes, on the day of the announcement, 1,000 people. Today we are still in closed beta, and we're announcing today that we are opening up, so everyone can go and create an account at datascience.ibm.com. We've been onboarding users all summer, up to 10,000. The feedback has been amazing. You know, starting a new product is what we do as product managers; we create new stuff. Sometimes we have a feeling it's right; sometimes it's like, maybe it's gonna work. We had the feeling we were doing the right thing, but we got it confirmed by the community. Every client, every data scientist we show Data Science Experience to, the first thing they ask is, when can I get started?
So every data scientist can go to datascience.ibm.com, create an account, and get started with Spark right away. Really? Which of those components is the most compelling: the tools, the content, or the community? Yeah, so open source is there, right? You can always try to set up the full environment by yourself. What is very exciting is that it's only one click away. When we go to a room with many data scientists and ask them, do you want to learn Spark? They all raise their hands; they all want to learn Spark. If you go to the Apache Spark website, you have to start dealing with, okay, how do I get started with Spark? I need to install the Docker container, I have to set up the whole thing in the cloud. Here you create an account and we spin up a Spark cluster for every data scientist. They love that; it's so easy. Then the collaboration features, and everything connected in one single environment, is something that's very compelling. And do they pay? Is it a subscription? Is it kind of like Facebook, where it's free until you freemium your way into some premium features? Yeah, we'll have different plans. We want absolutely everyone in Data Science Experience, so we'll have a very compelling free plan. And it's power on demand: the more you need, the more you will have ready, for every user. So we have a free plan, we'll have a pay-as-you-go plan, and then we'll have enterprise-scale plans as well. Talk about your experience as a data scientist. How did you get into it? What's your background? I actually didn't study data science. I did electrical engineering. But my first graduate job was at a telco company, as a data analyst. Every morning they were giving me data on the performance of the antennas in the city of Barcelona.
And then I had to check that data and try to change the orientation and the tilts of the antennas to provide a better service for the users. So that's how I got a bit addicted to data, because it was like, okay, I get data, I change the orientation of the antenna, and then I can provide better phone service to 1,000 people. It was so powerful. And then when I came into IBM, I was all over data science. You know, it begs the question, because at a lot of conferences Dave and I have done, people talk about the citizen data scientist, right? And trying to bring the tools, expertise, and insight to people who aren't necessarily formally trained as PhD data scientists. You sound like you've kind of evolved from one into the other. So with something like this, do you see more people on the fringe of data science being pulled into that capability, having the tools to start to do some of these things? Data science is becoming very trendy, but what really is data science, right? It's a concept that is evolving. We had analysts before, we had statisticians, we had people who knew programming. Now they are all kind of data scientists. We had Gartner saying that the data scientist is a person who has so many skills that they are unicorns. Everybody's talking about them, but they don't really exist, right? What is really happening is that you have teams of data scientists, they have different skills, and they all have to collaborate. Today it's super simple to get started with data science. There are so many courses online, so many resources that you can find. We are trying to bring them all into one platform to help everybody get started. So you don't need to be a PhD to do data science. Okay, so you're the unicorn data scientist. That's funny, and I've heard that too. But you like math, right? You're good at math? I like math. Computers? Yeah, I'm better at computers than at math.
You're better in CS than math, okay. That's okay. Statistics, you could do. Yeah, I like it so much. Maybe not your favorite, but if you had to do it, you could do it. You like to hack data? Yeah, that's my favorite. I think it's the favorite of all data scientists. Isn't that really the primary, the Hilary Mason prerequisite of being a data scientist? So I mean, the whole unicorn concept, you're right, some you like better than others, but generally speaking, if you're good at math, you're okay with computers, you can figure your way through statistics, and you like to play with data, that's the most important thing, the passion for it. So there are a lot of people; now maybe they're not the rock star data scientists at the top of the pyramid, but. There are a lot of people. And then IBM's in the process of training more and more people, so the IBM Data Science Experience that you created basically is a way to get more people involved and provide them services, and you would think it's going to be a big field, you know? Yeah, and also open source is becoming so strong in data science. In other fields it was already very strong; in data science it became very strong maybe in the last five years. And being a data scientist today is so hard because you have new stuff really every week. Every week you have a new company, Facebook, Airbnb, open sourcing everything. It's in the hands of every data scientist, and it's so hard to be up to date. I want to learn all this stuff, and how do I do it, right? You need to set up a new environment and everything. So that's what we're trying to do in Data Science Experience: we're putting everything in front of all the data scientists. When you say they're open sourcing everything, you mean they're launching new projects? So a good example is IBM. We've been contributing, we're very committed to Spark, and we're the number one on machine learning in Spark.
We open sourced SystemML for machine learning. We're open sourcing some other very interesting libraries for data visualization, like Brunel, and we have PixieDust, which is improving the interface on Spark for Python. These are only the IBM open source data science libraries that we are announcing, but we have Microsoft, we have everyone doing the same every week. And when, as a data scientist, you read a tweet and you want to go and try it out, that's what we do. So you're a committer to many of these projects, right? Yeah, I've been a committer on R projects. Another good example is R. R has been there for 40 years. And you know, with R you have the core and then you have the contributions from the community. Six years ago there were about 2,000 R packages on CRAN. In the last six years we went from 2,000 to 8,000. I've been a contributor to that; I've been contributing packages in R. In the last six years we've done more than in the previous 40 years. So that's the community, just doing stuff for the community. As a data scientist, what's the biggest challenge? What one or two things would make your life better? I mean, the Data Science Experience was one step. What's next, what's on the to-do list? The big item on the to-do list is adding automation. Machine learning is a very good example. You start a project, you have maybe 50 different algorithms, and you can select so many technologies. You can select Spark, you can do Python, you can do everything. So how do I automate and get recommendations on what to use at every step in the data science process, right? Automation is something that we're gonna be bringing into Data Science Experience in the coming weeks, and it's very exciting. How do you decide today? You hit the community, you talk to your colleagues, and trial and error? So how do we decide what to add to Data Science Experience?
No, sorry, as a data scientist you have choices about which tools to use, which technologies. It's based on expertise, and you never know enough, and then you make mistakes and you try to fix things based on that. So the idea would be that you'd automate the recommendation of which technology is the best fit. It's like applying machine learning to do machine learning. Machine learning. I read an article saying that data science as a profession is gonna disappear in the next 50 years because the technology is gonna be so good that we're gonna automate everything. I don't believe that's gonna happen, but we'll have automation in every step. A very good example, something we are announcing today as well, is something called model visualization. You create a machine learning model, and you want to put your business in the hands of this machine learning model, but does it really work well, right? So when you create a model you need to check the performance of that model. What the data scientist is doing today is just writing code to see visualizations. Now, in Data Science Experience, in one click you have a full dashboard on the performance of that model. That's gonna help make the decision: okay, this is a good machine learning model, or I'd better keep trying something else, right? But it's an interesting challenge, because you can try so many things, right? It's so far beyond an A/B test that you can run for a period of time and then analyze. Now you can run tens, hundreds, thousands of algorithms. So where's the trade-off? And oh, by the way, now there are 10 new tools that I didn't even know I could apply to this problem before. How do you organize it, keep it straight?
We have the machine learning process, and it's not start to end; it's an iterative process, it's never-ending. You're always learning and coming back to a previous step, changing things and maybe improving the accuracy, and then you have new data coming in. That's why we data scientists will never lose our jobs: the process is never-ending, so you never finish your work, right? What's the gap in terms of? Skills is a big gap, what do you say? Skills, yeah. Okay, so it's not, I mean, there are plenty of data sources. It's the skills to improve the quality of the data, is that right? Yeah. And the algorithms? Yeah, and for example, data scientists today spend like 80% of their time cleaning data, and that's not the fun part. I have never found a data scientist telling me, I love cleaning data; it's so boring. So we're improving that; we're trying to bring a lot of automation to that side as well, to cleaning data. So instead of 80%, maybe it's 40%. And then the gap is the skills. You know, we wanted to train one million data professionals. We have Big Data University, with like 500,000 data professionals learning Spark and data science. So skills is a big gap. What would you tell the young people in the audience who are at a university now and want to become data scientists? What should they do? We are engaged with many universities. I always tell them, go to datascience.ibm.com. It's so easy to get started. There are tutorials, and in one click they can start working on them. There are so many resources. Data science is so exciting, so I encourage everyone to start. Great. Armand, thanks very much for coming on theCUBE. Thank you for inviting me. Appreciate your time. Thank you. You're welcome. Keep right there, everybody. We'll be back with our next guest. This is theCUBE, we're live. We've got a special presentation coming up.
We've got a little town hall and a panel. We're going to talk more about data scientists with some of the expert practitioners in the industry. So keep right there. We'll see you in a moment.