I'm Peter Burris with another CUBE Conversation here in our beautiful Palo Alto studios, with Southard Jones of Domino. Now Southard, we're going to talk over the course of the next 15 minutes or so about the challenges that the data science, big data universe faces as it tries to move from a bespoke, creative approach to, I don't want to say a routine, but a reliable capability, a strategic business capability approach. Now that's going to require some pretty significant people, process, and technology changes over the next few years. Why don't you start by giving us the nature of the problem that a lot of these companies face?

Sure. So we work with a lot of really large organizations who put data science and models at the center of their business. To say it differently, they want to become model driven. However, that's a lot different than being able to make an individual data scientist more productive. One of the ironies about this space is that a lot of the conversation in the mainstream press and media is about how to have a better tool or something to make an individual data scientist more productive. Those are great, we love those, they're important. But that's different than making an organization able to embody data science, make models a core part of its business, and become something like an Amazon or a Netflix, versus, oh, I've got this data scientist, I gave them a new deep learning model, now they're 5% more productive. That's nice, but that's not, oh, this business is now getting 5x returns from its investment in data science.

So we like to say at Wikibon that the difference between a business and a digital business is the role that data and data assets play in the digital business.
And if I can kind of marry what you just said to what I just said, effectively what that means is that we're trying to generate increasing returns out of a data science function not by treating the data scientist narrowly as the asset, but by treating the data, the models, the artifacts that are created by that process as the asset, so the company can connect things together faster and have those assets appreciate in value more rapidly. Have I got that right?

Yeah, exactly. In fact, I'd say the difference between an organization that's able to scale data science and those that are not comes down to a couple of, I would say, small barriers or differences. One of those, and you said it right, is being able to take those assets, those data assets or models, and build them on each other and get exponential returns from each additional model I build. Most everything done today is an artisanal, as you said before, bespoke approach. How can I take three or four data scientists that become 10, then 20 data scientists, and get increasing returns on that? Now I've got a real business built on that. The other challenge we have, and it's somewhat of a hangover from the big data era, is that, hey, I've got a lot of infrastructure and I've got a lot of amazing tools at hand, but do I have infrastructure that enables me to do data science? Do I have an infrastructure that enables me to take data and turn it into assets the business can run on? Or do I have an infrastructure that's set up to store and process data? Those can be two different things.
Well, let's talk about that, because the big data term is kind of falling out of favor, which is I guess okay, but it's also unfortunate, because in many respects it's not that big data has failed. It's that it was for too long associated with the infrastructure activities of how we set these capabilities up, not the expertise capabilities, and especially the application or model capabilities. Talk to us a little bit about how that market seems to be bifurcating and how that's dragging people along with it, because we're seeing people end up in one kind of expertise bucket or another, depending upon the characteristics of the assets that they're building.

That's right. A lot of the folks that are invested in Hadoop, or whatever big data solution you have, have done a great job creating a platform on which they can build applications. But that platform itself is not an application, it's not a model, it's not a product. It is fantastic at processing large amounts of data, probably fantastic at creating great data pipelines, but that's not where you build a model that goes into production and recommends a better product to someone, which drives higher sales, or reduces your churn because it's finding a better way of issuing a UI and a factor to somebody. Those are different things. And you're right, people get pushed in one direction or another. They go, okay, I'm going to go build applications or business-impactful things on top of this, or I'm going to go into the world of supporting that through data infrastructure. Both of those are important, you can't have one without the other, but they're different. And many, many people have said, great, now I have my data lake, now what? What do I do? What am I going to do with that?
We believe that models and data assets, in your words, or data products, as a lot of people are going to call them, are one of those amazing things that are going to come out of this era.

So let's talk about the people. What is the difference? If I look at the resume of a real data scientist and that of someone who's going to be more in a support role, maintaining the operations of some of the pipelines we're building and ensuring that these data products, data models, whatever, remain valuable to the business in an operational sense, how do we see the difference between the data scientist and the people who are supporting that? And we'll talk a little bit about the tools that they're going to use to dramatically appreciate the value that they're capable of delivering to the business.

Yeah, so I think one of the first things we've seen is that data scientists, or even data science leaders or leads in their industry, or even data science managers, have a different perspective on what they're doing with data science. It's not, oh, I'm going to build a faster, better model, or I'm going to find a better way of building this pipeline. Those are the folks who are thinking about it and saying, hey, if I can combine eight, 10, or 15 people's knowledge, compound the knowledge of my entire team, and make data science something that is not just an individual developing something but a group of folks developing a core competency, then those will be the folks who will accelerate their business. Those will be the folks accelerating their careers. They'll be the chief data scientists of the big organizations going forward. Then there are the folks, arguably just as important, who are much happier saying, I'm going to find a tool or a way or a manner to create a faster pipeline. I'm going to find a tool or a way to process this new seven-layer neural network at 5x the speed. Hey, that's all good. That's all important.
But those are task-driven roles versus leadership roles, if that makes sense.

Or perhaps, let me ask you if this is right. There's leadership, let's call it asset formation, and then there's task-driven, or asset maintenance. Is that a decent way of thinking about this?

Sure.

So talk to us a little bit about that asset formation role. These are people, and the organization might have tens of them or hundreds of them, who are actually still going through the process of creating these crucial data assets at Amazon or Netflix or some of these other companies, and leaving them behind so they can be combined in other interesting ways. What types of people are they? What types of processes are they following? What types of tooling are they using? Because they're still using tools, right?

Yeah. In fact, there are a lot of advancements now that are allowing them to use a number of different tools. A lot of those advancements are in the open source community, which is fantastic. It gives many more people access. It advances at a much faster rate than a proprietary algorithm or proprietary software would. And they're learning it in school now. When I went to school, I learned COBOL. Now folks are learning Python, and it's just a big, big difference. That has accelerated the pace at which the folks who are creating these data assets can do it. What makes them more productive is when they start using all these tools on a single platform, a platform on which they can collaborate and find out what the guy three desks down or three countries over has done. So when they make their next data pipeline, they can actually use what the other person has done. That makes them much more effective. Or if I'm going to build this new algorithm, I can at least borrow from what someone else has done. Now I've started to see a collaboration aspect that today doesn't really exist.
Everything is like, I'm faster, better doing this on my desktop. One side fact we haven't mentioned yet, which enables folks to do this, is people's move to the cloud. As I move more and more to the cloud, by nature I start to be in a more collaborative environment versus a desktop environment. That's where I think you'll start to see a switch happen, when they aren't all individuals who are just rowing and doing things faster with a better tool, but are actually coming together as a group and building off of each other.

So this has happened in a number of other domains and has been happening in the data world as well. But one of the things that makes it so interesting in the data world is that a data asset is distinct from other kinds of assets, in that other assets are governed by the laws of scarcity. I can apply fuel to this job or I can apply it to that job. I can apply this machine to this job or I can apply it to that job. Going back to the issue of the data scientist, I can apply the data scientist to this, and I can't apply them to that at the same time, which is why you get a linear increase in productivity out of them based on their experience. But the value of data is that it can be shared, it can be combined, it can be merged, it can be reused, it can be repopulated. There are a lot of things that you can do with data that you can't do with other assets. But that requires that we think differently about how to treat data as an asset. Give us some insights. Do we think about data as almost a community or portfolio? How do we start to formulate some of these ideas of data as an asset?

Yeah, so I think you're right, and some would argue it suffers from scarcity because they have a hard time getting access to it. But in general, no, it doesn't. I think a perfect example would be something of this nature.
If I take a set of data and I develop a churn model on that, that helps the business because I've reduced my churn by a tenth of a percent. That's good. At a billion-dollar company, that's certainly a valuable impact, it's real money. How do I actually get that next bump in return? That model and the data that I used to build it, oh, it turns out I can also use them to find a way of driving a user interaction in our software application or on our website that would not only reduce churn but actually increase the amount of revenue I drive. So that's a 3 or 4x return. And that is the difference, I think, between people who think about data as, oh, I'm going to build something with this data, and people who think about data as, hey, now that I've built something around it, what can I build off of that additionally? How can others benefit from something I've already done? That's when I think data becomes this kind of exponential asset.

And does that mean that we're thinking about not only data scientists as a community that has to work together and collaborate and has common perspectives and a common language that they use within an organization to get that capability? Are we actually now starting to think about data assets as a community, or data assets as a network of things to be managed? How do businesses start to think about that? And what does that say about the characteristics of the tools and the approaches that they use to build this capability over time?

Yeah, so I think from the world of Domino, whether that's a data asset or a model, they're not owned individually. It is like a community. In fact, within Domino, people build model graphs. The models are all related to each other. There are models upon models. There are data products.

Let me explain what that means for people. So really quickly, the notion of a graph here is not like an X and Y axis. The notion of a graph is a way of describing the relationships amongst different entities.
So we're talking about this data and how it relates to that data and how it relates to that data. Have I got that right?

Exactly, exactly. So again, if I built this churn model, that churn model may be built off of other models. There are three or four different pieces of data that come together to develop a particular model. That has a big return. Now, what gets really interesting is, let's say the two or three data scientists who helped work on that really complex churn model leave. They're done. They're going to go save the world. They're going to develop models that increase crop yield somewhere. How does the company actually reproduce that model and make it better? Now I have to think about the process of using data to build models as something that has to become reproducible and reusable, as you mentioned before. That's what separates companies that can actually do that from those who are, frankly, carrying a massive key man risk. They're sitting there saying, oh my gosh, if that data scientist leaves, we're not going to know what to do. There has to be a way that people can approach this. They have to treat data and models as community assets and understand how they built that product that was delivering an extra return to the company.

So two quick questions. One is, it doesn't mean that we're diminishing the value of the data scientist. On the contrary, we're demonstrating how they can become even more valuable to the business over time by focusing on the assets that they create and not how they spend their time.

Yeah, and I think if you ask data scientists, hey, would you like to spend your time better by building on what others have done? Every single one would say, yes, I'd like that. That would be great. Right now I work in silos. That's not beneficial for any of them.
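
As an illustration of the model graph idea discussed above, here is a minimal Python sketch. It is not Domino's API or implementation. The `ModelGraph` class and every asset name in it (`usage_events`, `churn_model`, `upsell_model`, and so on) are hypothetical, invented only to show how recording what each model is built from lets a team answer two questions: what is needed to reproduce a model, and what else depends on a given piece of data.

```python
from collections import defaultdict


class ModelGraph:
    """Hypothetical registry of data assets and models and what each is built from."""

    def __init__(self):
        self._built_from = defaultdict(set)  # asset -> its direct inputs
        self._feeds = defaultdict(set)       # asset -> assets built directly on it

    def add(self, asset, built_from=()):
        """Register an asset and the assets it is derived from."""
        self._built_from[asset]  # make sure the asset exists even with no inputs
        for parent in built_from:
            self._built_from[asset].add(parent)
            self._feeds[parent].add(asset)

    def _walk(self, start, edges):
        """Collect every asset reachable from `start` by following `edges`."""
        seen, stack = set(), [start]
        while stack:
            for nxt in edges.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def lineage(self, asset):
        """Everything needed, directly or indirectly, to rebuild `asset`."""
        return self._walk(asset, self._built_from)

    def downstream(self, asset):
        """Everything that would be affected if `asset` changed."""
        return self._walk(asset, self._feeds)


# Assumed example: a churn model built on other assets, and a second model built on it.
graph = ModelGraph()
graph.add("usage_events")
graph.add("billing_records")
graph.add("engagement_score", built_from=["usage_events"])
graph.add("churn_model", built_from=["engagement_score", "billing_records"])
graph.add("upsell_model", built_from=["churn_model"])

print(sorted(graph.lineage("churn_model")))
# ['billing_records', 'engagement_score', 'usage_events']
print(sorted(graph.downstream("usage_events")))
# ['churn_model', 'engagement_score', 'upsell_model']
```

In this sketch, the lineage query speaks to the key man risk point: if the people who built the churn model leave, the recorded inputs still say what it was built from. The downstream query reflects the reuse point, since the same data feeds more than one model.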
Right. Tell us a little bit about what Domino's doing and how Domino's going to bring more of this richness to the data scientist's job, life, career aspirations, business capabilities.

Sure. The way we think about what we're doing at Domino is that we're powering model-driven businesses. We're enabling organizations that see that the world will run on models in the future, or is running on models today. Look at Amazon and Netflix and folks like that. We're enabling them to run their business that way. And that helps them overcome a couple of the challenges we've just been discussing here. One is the artisanal approach. Instead of data scientists working as individuals, they work as a team. They collaborate. They build models on each other's work. They are able to much more rapidly get access to hardware and to models and data that others have built. That gives them that return. Second, we enable them to operationalize models, so I now have a true full loop of data science. I go from the research and development phase to the production stage and back again, and I iterate really rapidly. The faster I iterate, the better my models become. And lastly, the part that I think is the hidden, or not very much spoken about, truth is that we enable them to have model integrity. The big fear a lot of companies have about models is that they open up liability. Knight Capital is a pretty famous story. Great model, implemented slightly incorrectly in production, lost $440 million in 45 minutes, and that organization wasn't in business much longer. By building models with integrity, and enabling data scientists to build models with integrity, they're protecting themselves from that risk.

Southard, this has been a great conversation about the challenges associated with building strategic business capabilities around data science and what Domino's doing to help that happen. So this is Peter Burris once again with Southard Jones of Domino.
Southard, thank you very, very much for being here on theCUBE, and we look forward to having you in another CUBE Conversation in the not too distant future.