From the SiliconANGLE Media office in Boston, Massachusetts, it's theCUBE. Now, here are your hosts, Dave Vellante and Stu Miniman.

Hi everybody, welcome to this special presentation, The Evolving Role of the Data Scientist. Earlier this decade, Hal Varian, the chief economist at Google, sort of dubbed the data scientist the rock star job of the next decade, and it seems to be actually coming true. I'm here with Stu Miniman, my longtime and frequent co-host, and the newest member of the Wikibon team, Jim Kobielus. Jim, it's great to see you. Let's start there. Let's introduce you to the team. For those of you who don't know Jim: formerly at Forrester, and even before that Network World, in our days at IDG, and just coming to us from IBM. Big data, AI, cognitive. Welcome. Tell us a little bit about your background.

My background, well, you've summarized the most recent part. I was at IBM for five years. I was their data science evangelist. Prior to that, I have a long lineage in the analyst space. I was at Forrester, where I covered all things big data, data warehousing, and business intelligence. I had been at Current Analysis and Burton Group. You mentioned Network World: for a long, long time I wrote a column for Network World as a freelance thing, starting in the late '80s. I'm sort of revealing my age here. I don't want to go too far down the road there, but I've been around the block in the industry for a long time with a lot of different focus areas. At Wikibon, my core focus is all things app dev, deep learning, AI, and data science, which very much traces a path of continuity from what I looked at at IBM and previously.

And you know, John Furrier said on theCUBE in 2010 that data is the new development kit. At the time I didn't really understand that, but it's becoming true. Now, Stu, you guys were at DockerCon last week, and you're seeing, you know, is it DevOps? Is it OpsDev?
But the role of development and operations coming together, infrastructure as code. We're going to talk about that a little bit, but give us your quick take on what's going on.

Yeah, Dave. So, Wikibon started with a strong focus on storage, and we've seen in the storage world that it's not about storing the information. It's about how we leverage that data. And with my background on the networking side, you're talking about the analytics and real-time components, understanding what's moving in pieces and how that's changing things. It reminds me of when we worked with GE when they launched the Industrial Internet. So we've watched at Wikibon, Dave, that launch of big data, and a lot of that now goes into the machine learning and AI space that Jim knows well. In this evolving era, data is at the core of everything, and that has such a huge impact on all the pillars that we cover at Wikibon. Especially in the stuff I work on, infrastructure and cloud, data is always at the core and center of the applications.

Jim, talk about how this data role is developing. Obviously, there's the data scientist, there's even the chief data officer, and we can maybe address that, but let's get closer to the application developer. What's the spectrum of development, and what does the process look like?

Well, yeah, if you look at the eras of computing, you can trace the evolution of computing by where the application logic lives, and who develops that application logic and maintains it. If you look at the very beginning of computing, starting from Babbage or beyond, it was hardware. Clearly, it was electromechanical calculators and tabulating devices and so forth. Around the time of the Second World War, we started to have stored-program computers. We had Turing and von Neumann and those architectures.
And a new cadre of professionals called programmers, otherwise called coders, developed to specify the application logic in everything from COBOL to 4GLs and so forth. Different languages. Starting around the turn of this millennium, or really the turn of this decade, we started to see a third era develop beyond the hardware era and the software era of application logic, toward what has been called the cognitive era, where there's less and less need for the application logic to be explicitly programmed. Instead, it's learned from the data. How is the logic learned from the data? Through the magic of data science: statistical algorithms and machine learning to identify predictive variables, to drive things like recommendation engines, next best action, predictive applications, and so forth. So data increasingly is the core engine, as it were, that's driving the development of the logic that's inline to so many applications now in the era of artificial intelligence. So data scientists are key developers in this new era.

Okay, so you've got the data scientists, but then there's this whole spectrum of people who touch data. There's the data quality engineer, and I guess the data engineer, which may or may not be different. Even with the application developer, you have so-called citizen developers or low-code developers. So is this one person or multiple people? Is it arms and legs? Is it one person? Is it a unicorn?

I don't think there are that many people who are such all-capable jacks of all trades, and good at all of them, that they could do every single thing you sketched out. But there are a lot of smart people in the universe, so I'm not going to diss the unicorns that actually are out there. In fact, what we see in development organizations now is clearly a specialization in the data science era.
There are data scientists proper, I'll use that term, who build and train and test statistical models against empirical data. They're the ones who build regression models, and they're the ones who build all the other algorithmic logic that goes into these applications. There are data engineers. They're the ones who build the data lakes, in Hadoop or NoSQL and so forth, that are used by teams of data scientists to build and test their regression, classification, natural language processing, machine learning, and other models. There's a strong and continuing demand for coders, for programmers, to build the business rules and more of the deterministic or declarative and procedural logic, the if-then-else statements and so forth, that you need for a fully fledged application. But there are other specialties as well. You need subject matter experts in many, if not most, data science projects. They're the ones who understand the solution domain that you're building these models and applications around, whether it be marketing or security or autonomous vehicles and whatnot. You need experts to work with the data scientists to build the feature sets and so forth. There are other specialties that are critically important: data visualization design, user interface design, UX, hardware engineering. More and more machine learning and deep learning algorithmic models are going into the edge applications that reside on IoT endpoints, to drive various automated actions based on fresh data and algorithms that are embedded in those components. What I'm getting at is that the development ecosystem of different roles continues to expand. There's a strong need for coding, but data scientists, the core statistical modelers and explorers, are absolutely essential in more and more disruptive applications: cognitive chatbots, face recognition, voice recognition, things like Siri, which we're all using. I use it all the time.
I write most of my texts now using Siri on my iPhone, because it's gotten scary good at speech recognition.

So everybody talks about the shortage of skill sets. How problematic is that in terms of growth? And Stu, we talk about the OpsDev, are they going to come from the operations world? Let's specifically focus on the enterprise. Are you seeing that with the guys you saw at DockerCon? Are they moving into the data science realm and the data engineering realm? Or are they more doing DevOps type of work?

Yeah, I think on the infrastructure side, it tends to be more of the DevOps type of folks. Definitely the coding piece gets into a lot of environments, but you don't see somebody who was running around data centers pulling cabling become a data scientist tomorrow. It reminds me, Dave, of what you and I did two years ago with the MIT Sloan folks, the Second Machine Age, talking about how the first industrial revolution replaced what we can do with muscle, where we can bring machines in, and the second machine age is what's going to either replace or greatly augment what we can do from a cognitive standpoint. So that's going to have a huge impact on jobs. It's interesting, I look at it and Amazon's a huge hirer. I think the number I heard is 100,000 jobs that Amazon's going to hire for in the next 18 months. Right now there are over 5,000 jobs open for Amazon Web Services, and a lot of them are using data, leveraging data. With every new company I see, every job description I see, it's: how much is data part of what you're doing? From a research standpoint, well, we say without data you're just some guy with another opinion, right?
So data is just so infused in everything we talk about, and how much will machines help us grapple with it? Because we know the four V's of data out there. I can't, nor can anyone, well, there are the unicorns out there who can read the matrix of information flying everywhere, but there's so much information coming in. We need the tools and the people going together. There's a dramatic impact on what's going to happen, and right, we said if you were a storage person configuring LUNs or a networking person cabling things, your job's changing. How do you get on board this wave and race with the machines, not fight against them?

Well, you brought up the Second Machine Age. We did that event with Brynjolfsson and Andy McAfee from MIT. And Jim, you mentioned the cognitive era, which is kind of an IBM term. But one of the criticisms I've always had of IBM is this: humans have always been replaced by machines, but for the first time in human history it's increasingly cognitive functions being replaced.

Yeah, knowledge work.

Knowledge work. And IBM would shy away, and other vendors as well: oh, we don't want to talk about replacing humans. But in fact that's what's happening.

That's happening.

I mean, you can argue very strongly that the middle class is kind of getting hollowed out, and the data supports it. The median income really has been flat, or down actually, and one could argue that it's due to machines replacing humans in cognitive functions. You might want to debate that. I'd be interested in your thoughts on that now that you're outside of IBM. But isn't it the data professional who is ultimately going to be allowing combinations of innovation to occur and new jobs to be created? Now, whether or not they can be created fast enough to offset the decline in existing jobs, we'll see, but what's your take on it?

That's structural unemployment, clearly. There are displacements in any turbulent and innovative industry.
Old industries die, and sometimes they die quickly, before the new ones are fully born. Understood. So, no, I'm not going to downplay the potential, the actuality, of structural unemployment right now, as whole industries get hollowed out and new ones get created, and data and algorithms and AI and so forth are a major player in the realignment of all industries. Understood. So, in terms of where the jobs are coming from, you could say that everybody on some level will need to be a data scientist, but that's overstating it. Everybody needs to get really savvy on data and on the underlying enabler for delivering value from data, which is machine learning. More machine learning models are being built into applications everywhere and delivering new value. Now, there is a growing range of what I would call closer to self-service tools that allow the rest of us, by which I mean business analysts and subject matter experts, to build more machine learning models and predictive logic on our own, without needing a PhD data scientist to help us every step of the way. That niche of tools, from any number of companies, including the one that you mentioned just a moment ago, is coming along. But the notion of a citizen data scientist, that's a term that's gained currency.
I wouldn't say that this is necessarily a space that's dominated, or will be dominated, by the do-it-yourselfers. But there is a growing range of people, especially millennials, who've taught themselves data science from the get-go and have become fairly effective at building, training, and iterating machine learning and predictive analytics with open source tools like Spark and R and now TensorFlow, for example, for AI. They've built their own data lakes on top of Hadoop and so forth. They've been able to build really great data-driven applications from open data sets, of which there are more coming along every day. And they're able to assemble data science and developer teams from open communities like Kaggle, which, by the way, Google, I believe, recently acquired. So there are more open resources to allow motivated professionals who want to get deeper into data science, and app dev in a data science world, to do their magic, and to do it without the traditional university degree, or really without the traditional background of doing this work in the corporate world. That's coming along. So the do-it-yourselfers, there are a growing number of them out there, but I'm not going to overstate that trend. There's a serious learning curve to get really competent in data science.

And the hardcore data scientists hate that term, citizen data scientist. But certainly there's low code.

I've been flamed, by the way, by many of them when I brought it up in a variety of industry forums and spoke about it.

Well, that panel that you and I did, I mean, it was with the 10 rock star data scientists. When that term came up, they just pooh-poohed it totally.

In my prior life.

So yes, in your prior life. But don't you think that it's really survival of the fittest? I mean, you're admitting that yes, there's going to be disruption in vocations. Isn't it survival of the fittest?
This is why I sometimes get concerned about some of the public comments from President Trump about, sort of, protectionism. I feel as though, not to get political here, but hey, learn, get educated. If we have to fund education, fine. But really the message to young people is you've got to go out and find those new opportunities and learn how to use data. Do you guys agree with that or?

Yeah, absolutely.

I mean, didn't we see this with cloud? All the infrastructure guys who wanted to just provision LUNs for the rest of their lives?

Dave, when we talk about cloud, a lot of it is, how do we shift things to a platform or a vendor? And a lot of it is that data has gravity and is going to live in some of these platforms. The joke we always have is, don't you think the NSA, Google, and Amazon know everything you're doing? It's not just because I have an Alexa and a Google Home in my house and they're listening all the time, but they are gathering information. Google and Amazon are both getting into the autonomous car business. Amazon just announced that. So where does the data live? Boy, you know, we talked about the disruption that Uber's going to have just because it went from kind of a full-time job of a taxi driver to a part-time driver for Uber. Well, we know the real disruption is going to be when we just have fleets of self-driving cars and then nobody needs to drive anymore.

Okay, so that brings me to another question I have for you, Jim. The data scientist is building the data model, and the data is informing that model, and every cloud vendor says, oh, the data is yours. But if the data is informing the model, and the model is being applied in different industries and different use cases, how is my data, my IP, not feeding my competitors?

Well, if they don't have access directly to your data, they can't use your data to train their models.
So if you've got the best pool of data for a particular domain, autonomous vehicles or whatever it happens to be, you have that first mover advantage in that space. The data itself is an asset that your competitors don't have direct access to. So they can build models, but their data might not be as good or as fresh or as valid as yours, and their models will be less predictively fit to the common application domain you're both targeting. So keep in mind that the first mover, like I say, Uber or whoever, who has achieved scale first in a given space, has got that massive data set. The Googles of the world, the Facebooks. That is actually a barrier to entry for their competitors. You can't match Google on videos, for example. It's got YouTube. That was a great acquisition for them.

Well, there's been a lot of discussion about first mover advantage and the merits, or not, of first mover advantage. You go back to the PC era, and there are so many examples of first movers who actually went out of business. But is first mover advantage becoming more important because of data? Now granted, Facebook wasn't the first mover. It was Friendster or MySpace. But Facebook now has a data advantage. Were they the first with a data advantage, maybe that's a better way to put it? Is that notion of first mover, if data is the lever, going to come back in vogue?

Yeah, and if I can build off that, we talked, Dave, about the customer innovation flywheel that's driven a lot of the last generation. Is data the next driver of the flywheel?

Well, so Amazon, with cloud, obviously has a first mover advantage. Now, Google wasn't the first in search, but they were the first to actually do search in a data-driven approach, as opposed to a portal as a destination. And so Uber is another really good example.
You're actually seeing some examples of first movers. Now, whether or not it's sustainable, we'll see, but at the heart of it is data, right?

Well, for example, Tesla has gone furthest in getting self-driving capabilities, to a degree, into commercial vehicles that are out on the roads, and so they've got this growing pile of great data that is their proprietary asset, for their team of data scientists to continue to tune and build and add additional self-driving features to their products. So in other words, that's an example of a clear first mover advantage that they squander at their peril, going into that hot, potentially huge area. I mean, don't they already have a larger market capitalization now than GM, which kind of blows my mind?

Yeah, I think they surpassed GM, yes.

Kind of blows my mind.

So they're Apple, but for vehicles.

Right, right.

Okay, I want to change subjects a little bit. You guys were just at DockerCon. We love the developer angle. You're going to be focused very largely on developers, obviously within AI and that data space. But Stu, you've got OpenStack Summit coming up. We've got Red Hat Summit next week. Obviously, AWS re:Invent is a huge developer show. ServiceNow, an event that I'm going to be at, is actually talking about low-code developers, and it has quite a low-code developer community. Another one is DevNet Create. Cisco, right? DevNet Create is an inaugural event coming up May 23rd and 24th. It's Cisco's angle on development. I mean, infrastructure as code, programmable infrastructure. Some of the things that we touched on earlier, Stu. What's the big trend there?

Cisco's had a strong IoT angle for a long time. We know the network's all about the data that's in it, and therefore, how can I instrument things better? How can I understand what's going on? There's so much real-time data.
Jim, you've probably seen this. Real-time, we talked about it, it was buzzy for a few years, but now there's real-time where that feedback is actually useful, not just gathering or looking at historical information. And Cisco's at the center of it. As we get orders of magnitude more surface area from things like IoT, there are still the network implications of a lot of that at its core. Networking is Cisco's background, but Cisco has lots of software assets. They're a very large company with a lot of different pieces, and they have definitely been courting the developer group for years.

And they have this theme that Cisco is putting out under DevNet, where apps meet infrastructure.

Yeah, right, right.

And so they're very much catalyzing a community of app developers around, really, building on fog computing, which is another term that they've put out there, and a good one to describe the new generation, the new era of big data that's entirely edge-oriented. So Cisco as a provider of an app dev platform is something you have to get your head around, because, you know, I'm one of the old school who thinks of them as routers and bridges and hubs and all that, but that was long ago. They're higher up the stack.

And actually, I fought against that fog designation when it first came out, like two years ago, but we've seen the discussion go from data center to cloud, and now cloud to edge. So if we're talking about cloud coming down to the edge, okay, fog's an okay analogy. We understand it. Cisco sometimes is a little bit early with some of their positioning out there, but I think we're starting to catch up with the reality of where it is.

And there's something from Google recently related to edge applications in terms of app dev. A big part of the app dev process for data-driven applications is training those algorithms with fresh data to ensure that they are predictively fit for their purposes.
Google recently came out with a very interesting capability, which is training of algorithms that are deployed on edge devices using a federated infrastructure. I did a blog last week, or I think it was last week or the week before, on the Wikibon blog about federated training of data applications and where Google's going. But that gives you a sense of the new generation of app dev and DevOps, which is that more of the training, and training is basically a maintenance activity, becomes absolutely essential for all manner of applications, using fresh data. That data will be sourced from the fog and then pushed to algorithms that are being iterated and updated in real time out at, you know, myriad zillions of edge devices. We have to get our heads around the fact that the data center is being radically decentralized. At a certain point the term data center itself will be so archaic that we'll just have to retire it.

Well, it's all server farms.

Right, plug into the API and there you go. We live in that API economy.

All right, we've got to leave it there. Guys, thanks very much for coming on and chatting about that evolving role. If you care about the changing role of the data professional, follow Jim, @jameskobielus, I-E-L-U-S, on Twitter. And go to wikibon.com and follow him there. He's a prolific writer. Great to have you on board. Thanks for participating. All right, thanks for watching, everybody. We'll see you next time. This is theCUBE, we're out.