From San Jose, in the heart of Silicon Valley, it's theCUBE, covering Big Data SV 2016. Now your hosts, John Furrier and Peter Burris.

Okay, welcome back. We are here live for exclusive coverage of Strata Hadoop here at Big Data SV, our event right across the street from Strata Hadoop. I'm John Furrier. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. My co-host for this segment is Jeff Frick, and our guest is Chris Devaney, VP of Operations of the company called DataRobot. Welcome to theCUBE.

Thank you.

So talk about DataRobot, because on your site you have a stat, 63 million models built, which is a grabber, certainly, on the website at datarobot.com. What do you guys do? That's a huge number. Is that the number of customers? Give us a breakdown of what you guys do.

That's the number of models that have been built by our customers utilizing our cloud platform. DataRobot believes that every company on the planet would benefit from doing predictive analytics. The problem is there's a finite number of data scientists to do that, and that's where DataRobot is changing the world. We are building an automated machine learning platform. Eleven of our data scientists are in Kaggle's top 100. We have three former number one data scientists, including the number one female and the number one in the world. All work for DataRobot. So what we've done is we've taken their collective knowledge, coupled it with world-class engineering, and built an automated platform. So what would normally take a data scientist weeks or months, the platform can do in hours, minutes, and days.

So one of the things people talk about, first of all, is machine learning. It's hot, and that is really the under-the-hood value for a lot of things we're seeing with cloud and some of the apps that are out there. And you hear about AI, all this stuff that's kind of futuristic, certainly getting the headlines.
But there's some real action going on with machine learning and these underlying technologies that's enabling a lot of opportunities. But the issue that we've heard for years and years (this is our seventh year doing Hadoop World, now called Strata Hadoop, Big Data Week, whatever you want to call it) is: how do you scale data science? It's just too hard to become a data scientist. They're coming on board, so the numbers are going up, sure, but they're not growing exponentially. So the question is, how do you scale data science? How do you put data science in the hands of analysts or business people? That's always been the problem. What do you guys do to attack that problem?

So through the automation of the platform, as you can imagine, we greatly increase the productivity of data scientists, if we can bring their normal tasks from weeks or months down to minutes and days. But more important, I think, is the autopilot feature of the platform itself. It allows non-data scientists, business analysts who don't have math, stats, or coding skills but who have great domain knowledge, to use this autopilot feature. It's kind of data science with guardrails, so that they can create models, implement the models, and make predictions with zero coding. So that's really what changes the marketplace for a business analyst doing data science.

So are they starting from scratch and answering questions? Are they starting with a model and doing variants? Are they picking from a menu of best practices within the question that you're trying to solve and then making a tweak? I'm trying to understand how the platform gets me, as a non-data scientist, to the algorithm that I need. And when I see 63 million, it's a huge number. Clearly, everyone's getting their own flavor at the end of the process; we're not really sharing a potpourri of best practices. So how does that actually work?
So one of our guiding principles is not just to find a good model to solve a problem, but to provide the best model to solve a specific problem. So what we do is: data is introduced to the platform. The platform then looks at all of the open source models written in R, Python, H2O, and Spark MLlib, and it determines what the best model is for that specific use case. And that can be done with or without data science knowledge. Obviously, coupled with a data scientist, they can tune and configure specific to the use case, but it's really not required.

So you're doing data science for data scientists, basically. You're providing a service to make them more efficient as one of the core value propositions.

It's a productivity tool for them, but it's also opening it up to what you had mentioned around letting business analysts do data science, because there is a shortage of those resources. So they can do predictive analytics using the platform with the guardrails that we have built in, so that they're not overfitting models, they're not making bad predictions; we protect them from doing that. So now we have a far broader audience that can do data science. We can now incorporate data science in any company.

So you get a flywheel going on. Let me see if I get this right. So you've got to recruit the core data scientists, guys who come in and have some serious chops. DevOps, they write some serious code. And then as they get the flywheel going, you build on that. So there's a community aspect to the code. And then as the flywheel gets going, that renders itself to the general broader market. Is that right?

That's correct. And again, if a company doesn't have a data scientist, it doesn't prevent them from doing predictive analytics. We have a lot of customers. We have customers that have 10 employees. We have customers in the Fortune 500. But that doesn't prevent those smaller companies from doing it.

This is exactly what Peter Burris was talking about.
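[Editor's note: DataRobot's internals aren't public, but the automated model search described above, where the platform tries several candidate model families and keeps the most accurate one for the specific use case, can be sketched with scikit-learn. Every name here is illustrative, not DataRobot's API.]

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy dataset standing in for the customer data introduced to the platform.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Candidate model families; a real system would span R, H2O, Spark MLlib, etc.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score every candidate with 5-fold cross-validation (the "guardrail" against
# overfitting) and keep the best performer.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

The cross-validation step is what lets a non-expert run this safely: the winning model is chosen on held-out folds, not on training accuracy.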
He now is heading up our research, where the community aspect is a critical part of the business model, because if I don't have a data scientist, I can come into the store, if you will, the community, and either engage a data scientist directly or use some of the product there. Is that how it works? Can I take advantage of some of those best practices? Can I stand on their shoulders, if you will?

So that's something that we offer also. The product enforces best practices and the knowledge that our data scientists brought to the platform. But then we also have something called DataRobot University. DataRobot University, of course, has product training, but it also has other collective knowledge and best practices, the recipes that our data scientists have used to win and be number one in the world, applied to real-world business practice. So now we can provide a recipe for machine learning success through DataRobot University. Not only do we teach our customers and prospects, but we teach our partners who are reselling. We're at the university level too, so it can be used as a learning tool for students as they come out. Maybe they don't have real-world knowledge of the product or of data science, but they can use DataRobot to find the best model, to implement the best model, and not have to do that through a coding-intensive type of application.

So one of the things that we know, first of all, is you guys are doing well. I love the name; bots are hot. You've seen the chat bot thing happen with Microsoft, I don't know if you saw that. But bots are a way to automate, which is a very DevOps concept, which data scientists love. But you guys are delivering some real hard-hitting value for customers and verticals, and it seems that's where the action is. Can you talk about the use cases specifically? We love sports.
We have Sports Data SV, an event we have here in Silicon Valley, where you're seeing sports as a big market for using data. And at Amazon re:Invent they always put the MLB example up there. You guys have a specific use case with a team using DataRobot. Can you share some color around that?

I can. So DataRobot is vertical-agnostic. When it breaks down to data science, it's a binary condition or a regression analysis or a recommendation engine; they span all industries. But we do have a scenario where the statistician from the real-life Moneyball, Paul DePodesta, worked for the New York Mets, and they implemented and did their player selection using DataRobot for two years. We were very proud that they made it to the World Series. We even had Boston fans at our headquarters rooting for a big win there. They came up second place, but we were still very proud that we were part of their getting to that stage.

Well, can I hack the algorithm? Because I'm not a big Mets fan. Obviously, Red Sox fan; '86 was just a disappointment. The Buckner thing, anyway. So have you done it for the Red Sox too? Or is it just the Mets? The Mets, obviously, are doing great.

It was just the Mets.

So you're taking credit for all the Mets' success.

We'll share in some of that.

Chris, how do people get started? I mean, do you find most of your customers are already kind of in this space, and this is really an efficiency tool so they can do more, better, faster? Or do you see this as kind of training the sort of people that know they want to get in the space and aren't sure where to start? Or, you mentioned a 10-person start-up; they don't necessarily have the expertise. From your experience with all your customers, how do people get started? What's the easiest path to success? If somebody's out there watching saying, how do I get started, what would you tell them?

So there are a number of different starting points.
I think what we're seeing is that, from an enterprise perspective, the larger customers are trying to get off older legacy technology that's not producing the most accurate models and that is very reliant on expensive infrastructure around SAS platforms or IBM SPSS platforms. They want to move off and leverage the newer technologies that produce more accurate models, more scalable models, and implement those very, very quickly. That's where we're seeing a lot of starting points.

And how are they measuring success beyond just straight-up ROI, just a hard number? What are some of the other KPIs that people are using to say, wow, this is actually working well?

Two things are really important. One is the accuracy of the model. Not only do we produce a leaderboard when we run that competition across all the outstanding models, but that leaderboard is ranked based on accuracy. It also gives you a metric on performance. So if somebody has real-time predictive analytics needs, they can find the most accurate model within the time frame they have to make predictions. We offer that component as part of the product itself also.

What are you guys doing to attract the kind of data scientists you do? Because you guys have an awesome model. I mean, Docker, by the way, had a similar flywheel, different in its own way, but the community aspect's key. But you gotta get the community. It is the chicken and the egg, so I'm sure the DataRobot folks are like, okay, how do you lure them in? Or I would say attract; lure is not the real community word. How do you attract the best talent? Is it just putting the best tools out there? Do you guys have a plan? What's the secret?

So I think the data scientists that we have, and these top data scientists, all like to work together. They like to leverage the strengths that the other data scientists have.
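[Editor's note: the leaderboard described above, ranked by accuracy but carrying a performance metric so users with real-time needs can pick the most accurate model within their latency budget, can be sketched in a few lines. The model names and numbers here are invented for illustration.]

```python
# Hypothetical leaderboard entries: (model name, accuracy, prediction latency in ms).
entries = [
    ("gradient_boosting",   0.94, 120.0),
    ("random_forest",       0.93,  35.0),
    ("logistic_regression", 0.88,   2.0),
]

# Rank by accuracy, best first, as the leaderboard does.
leaderboard = sorted(entries, key=lambda e: e[1], reverse=True)

def best_within_budget(board, max_latency_ms):
    """Most accurate model whose prediction latency fits the budget."""
    for name, accuracy, latency_ms in board:
        if latency_ms <= max_latency_ms:
            return name
    return None

# With a 50 ms budget the top model (120 ms) is too slow, so the next
# most accurate model that fits is chosen.
print(best_within_budget(leaderboard, 50.0))  # random_forest
```

The point of keeping both columns is that "best" is use-case dependent: batch scoring takes the top of the board, while a real-time caller walks down it until the latency constraint is met.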
And the fact that they can all work collaboratively to build a platform based on their knowledge. So when DataRobot is lacking in an area, they can collaborate and find a way to, again, produce the top model and the most efficient way to implement those models. So they enjoy working with each other, and we've found great success in recruiting that way.

What's the vision for DataRobot? What do you guys see? Obviously you've got some funding. NEA's involved, love that big name, and they do a lot of great deals. You guys got plenty of funding. You're Boston-based; you're not in Silicon Valley. What's the vision for the company? What's the next path for you guys? What's the next mountain you're going to climb?

So we really want to make data science and predictive analytics available to all companies. And in doing that, it means we have to have a very flexible, deployable platform. So you saw the 60 million models; that's for our cloud-based offering. As you can imagine, most companies today can't put data in the public cloud, so we support private cloud deployments. We support bare-metal Linux and Hadoop, which is kind of our connection here to Strata. We have a deep integration with Cloudera, and we've deployed the product and integrated specifically to Cloudera and specifically to Hortonworks and what they offer. Some examples of that: we can deploy via Cloudera's parcels, so as a single object not connected to the internet, we can deploy to hundreds of nodes. And we separate out the distribution from the activation, so you can deploy to the hundreds of nodes, and when you're ready to activate, you simply do that. It gives you 100% uptime, and you can do rolling upgrades. We've also integrated with Cloudera's CSDs, their custom service descriptors, so DataRobot looks within the Cloudera Manager platform just like any of their projects.
So where you see HDFS, and you're able to configure, manage, and monitor HDFS, HBase, Impala, and Sentry, DataRobot is another project within that.

What about outside of Cloudera? Because one of the things that people want to do is integrate in. We hear a lot about completeness and integration and operationalizing, and that's not necessarily a clean sheet of paper. In some cases, if you have the luxury of booting it up from scratch, you can roll out Impala or Cloudera. But in most cases, someone might have an Oracle database, or the databases might be older. How do you guys deal with that? Or is that something that the data scientists deal with?

No, that's okay. So if somebody has a bare-metal Linux platform running a database, we can ingest data from those different sources, whether it's a Hadoop-type platform or not. We also incorporate across the board the things that enterprise customers require around security: integrating with Kerberos, Active Directory, and LDAP, and then authorization using grant/revoke with things like Sentry, and then all of the projects that Hortonworks has also. So the Ambari integration, Ranger, Falcon, Atlas, we're also building in that integration. The benefit to DataRobot is we leverage all of that infrastructure. So if they're sitting on a fully encrypted platform that has all of the governance, compliance, and lineage capabilities, we leverage that and sit right on top of it.

What's your take on the show this week? Strata Hadoop, obviously Big Data Week, Big Data in Silicon Valley, or Big Data SV as we call it. What's your take on this year's show? What's the vibe so far? The show's just kicking off, but what's the smell in the air here? What's the vibe you're seeing?

We're seeing quite a bit of interest in predictive analytics, of course; that's very interesting to us. And being able to deploy on the platforms typically around Big Data, we're seeing great success there.
I think what's also unique about the integration with that platform is the merging of technologies, where you have in-memory models written in R and Python merging with the scale-out models that you see in H2O and MLlib. And what DataRobot does is build transparency around that. We'll pick the best model based on a specific use case, so that users don't have to worry about R coding, Python coding, or Scala coding; all of that is encompassed in the platform itself.

How many employees do you guys have? What's the growth plan for DataRobot? What are some of the tactical things you guys are doing?

We're seeing significant growth, and we have about 140 employees today. We were very fortunate in our early rounds of funding that our backers allowed us to fully develop the product and not be influenced externally by revenue and customers driving the product in a direction specific to their needs. So we built the platform and deployed it as a production-ready platform over a year ago. Now we're looking at the ability to scale on any platform: cloud, private cloud, bare-metal platforms, and Hadoop.

So you're vertical-agnostic on the application side and platform-agnostic on deployment.

That's correct. That gives us the goal of being able to deploy to any company. We're also seeing, though, within certain verticals, specifically finance, healthcare, and insurance, which have very high modeling and predictive needs, certain areas where we can enhance the product to be very specifically vertical for them, and then also resellers and value-added partners that are embedding DataRobot as their predictive engine. So they don't have to focus their time and energy on that component of the product; they focus on the vertical value that they're bringing to the platform itself.

I was going to say, you said that you just closed your B round, which is not easy these days in this funding environment.
Somebody tweeted this morning that there were no tech IPOs in Q1 of 2016, John. And you've had a lot of conversations about difficulty with B rounds with some of our VC community. So congratulations; you guys are obviously doing something right.

Yeah, thank you very much.

Well, appreciate you coming on theCUBE and sharing your insight. Of course, we could use your DataRobot for our Cube Madness promotion. We have a Cube Madness that's up at siliconangle.tv. It's a little takeoff on the NCAA, where you vote for your favorite Cube alumni, and we're at the final four. So get your votes in and go to Cube Madness, check it out. This is theCUBE. Congratulations on your success, and have a great show. This is theCUBE, extracting the signal from the noise. I'm John Furrier with Jeff Frick. We'll be right back with more after this short break.