All righty, excellent. Thank you very much. It's an honor to be here. Let me get back to the title slide. I love coming to Madrid, and especially to Big Data Spain. This talk is about AI adoption in enterprise, and if you want the slides, you can grab them online right now with your mobile phones; I'll keep that URL up for a second. Just some background on this: I work a lot with O'Reilly Media and with the Strata conference, the AI Conference, and so on. I'm one of the chairs for JupyterCon and for Rev, and I work very closely with Ben Lorica, who chairs Strata and the AI Conference. Lately we've been running surveys across the world on topics related to what we're discussing here at Big Data Spain. One of our larger surveys this year was about machine learning in enterprise. We expected to send it to a few hundred people who had attended Strata, just to get a simple poll. Unfortunately, there was a big mistake, and some of the marketing people sent it out to, I think, 15,000 people. We got over 11,000 responses back. So it was an interesting data set, and naturally, being data scientists, we wanted to analyze it and see what we could find out. Some of the insights were very surprising, and I think they help indicate what's happening in big data and machine learning in enterprise these days. The intention of this survey was to see how machine learning is moving into mainstream business. In terms of terminology, we were looking at three personas. First, companies that are very sophisticated.
They've been doing machine learning for five years or more, so they have experience in this field. We were also looking at early adopters, companies that have only been doing this for one or two years, and then other companies that are just beginning to explore: just starting to deploy machine learning, just starting to get into artificial intelligence. As for the distribution across the world, about half of the responses came from North America, just because of the shape of our audience, but we also had very significant samples from Western Europe and from Asia. So one of the questions was: how are the experienced companies handling deployment of machine learning models? How you deploy your models in production has become a very good question; we see a lot of conference talks about this, and a lot of discussion among the vendors. And this is interesting: North America and Western Europe lead in terms of sophisticated use, whereas East Asia and South Asia are coming up in the world; they're really gaining as early adopters. You can see the segments here. The shape of what's going on in North America and Western Europe is very similar, but in Asia they're pushing more on exploration. Another question is: what kind of impact has this had on organizations? One way we tried to measure that was through job titles. Are these changing? Are new kinds of job titles being introduced? Of course we were very interested in data scientist and data engineer, but we expected to see other things as well. As it turns out, more than half of the people involved in deploying machine learning models into production are data scientists. There is still a significant number with the title of business analyst.
That's a bit more historical. We also see data analyst, data engineer, research scientist, machine learning engineer, and so on. You can see the distribution: data analyst, business analyst, data engineer, and data scientist are the top ones. Then we looked at how this breaks down between the companies that are really sophisticated and the companies that are just getting started. Some of the takeaways are that newer job titles are moving in: machine learning engineer is one, along with data engineer and, of course, data scientist. Deep learning engineer is another title that's being used. One interesting thing: organizations that are very sophisticated about developing algorithms tend to use the title research scientist for people who are really dedicated to algorithms work. There aren't many of those, but it's interesting that the title is developing. A question that came out of this analysis: are we seeing a convergence of titles, where people who used to be business analysts are now called data scientists, or are we seeing the other direction, hyper-specialization? For instance, I've seen people with the title "deep machine learning engineer." I'm not exactly sure what that means, but there are definitely people in Silicon Valley with that title. Now, something to support this: there was a really great talk from Maryam Jahanshahi. She's at TapRecruit, and they're doing data science analysis of how data scientists get hired. Something they reported was that hyper-specialization actually gets in the way of building data science teams. They do see a lot of title inflation, particularly among people who are not as experienced in the field. But they also found that if you filter for more experienced people, like senior data scientists.
You actually end up with weaker candidates. So in general, try to avoid hyper-specialization; hopefully we'll see more consolidation of these titles. There's also an interesting report that came out from Deloitte a year ago. The idea is that we're now seeing teams of people and machines, a kind of mixture of roles. Deloitte had gone out and interviewed enterprise HR departments to see whether they're taking this into account: are they planning for job roles moving from people to machines, or perhaps even the other direction? Are they planning on having machines as part of the workforce? They're calling it the "no-collar workforce." They found that only about 17% of the HR departments had considered this yet. So I think you'll see a lot more in terms of hybrid teams. All right, another question: who builds machine learning models? As it turns out, over half use internal data science teams. The companies with more experience rely more on internal teams; the companies with less experience rely more on external consultants. One thing that was really kind of surprising: although AutoML is being pushed, and there are a lot of different cloud services for it, the buy-in is still pretty low, a single-digit percentage. So we're not seeing cloud services leveraged much yet for actually building the models, though perhaps that will change. Another thing that's been a debate, with a lot of arguments on different sides: what methodology is needed for working with machine learning? Because when I go out and talk with people in enterprise, if you say "data," they say "agile." If you say "engineering," they say "agile." Anything you say, they say "agile" afterwards. So are we actually doing agile machine learning? As it turns out, no.
We really wanted to look at the companies that were more mature in this kind of practice and see what they report about process. As it turns out, you do see a lot of companies saying "agile," but it's not necessarily the default. At this stage we can say that about a third of respondents used no methodology, and in particular the sophisticated organizations were leading in answering "other." We didn't have a name for it yet, so we don't necessarily see a named alternative to agile; there's something out there that doesn't have a name yet. We're going to try to probe more for that, but I'll talk about this a little later in the talk. Something interesting I want to point out: who here has ever heard of Ron Jeffries? Anybody? A couple of hands, okay. Ron Jeffries came up with a lot of what we now consider agile methodology, things like Scrum; he invented that, and he was one of the signatories of the Agile Manifesto. And now he's come out this year saying that what they were trying to push toward with the Agile Manifesto is nothing akin to what's being practiced now in enterprise. His suggestion is to move away from named methods; there are really just three points that developers need to focus on to adopt the spirit of what they were trying to do. Stated another way: agile comes from about 20 years ago, and at the time, the way you created value was to have a software code base and iterate on it rapidly, to get features in front of customers. You treated the software developers almost as interchangeable parts, because, well, it was a lot more fluid; it was a lot better than what was being practiced before. But when it came to data in that kind of world, when you were building a software application, you would have a data model.
You might be using a database with a schema; there might be some unit tests related to it. But data wasn't as much of a first-class citizen, and now, 20 years later, things have changed. Most companies are using open source. Most of the critical software you rely on for your business is managed by somebody else, owned by somebody else; you may not even pay them money for it. Instead, the value creation now is not so much about iterating on the software as it is about maintaining your data, cleaning your data, getting labeled data, getting large enough data sets in place to train machine learning models. The area where value creation happens has shifted from iterating on code to how we work with data, and that brings in a lot of other factors that really weren't accounted for 20 years ago by agile. I'd like to point out a really great talk from David Talby of Pacific AI. He does a lot of work with machine learning in healthcare, and he's given some excellent talks at the O'Reilly conferences about what goes wrong when you deploy machine learning models in production. It's really fascinating, because he has a lot of excellent insights on that, including some pretty amazing failures that have happened in hospitals and elsewhere. Right after one of David's talks, I was giving a talk, and I took about five minutes to put together this slide showing the contrast between what we think of as software development, say web development, versus what we're doing with machine learning in production. There are a few points from David Talby and also from Pete Warden at Google included here, but the gist is this: when you're building software, like mobile or web dev, you put your most experienced people up front, your architects, your team leads.
They're involved with requirements and specifications, and then as a project matures, you get more and more people involved who don't have as much experience. With machine learning it's exactly the opposite. If you have a data set and you want to train a machine learning model, that's a homework exercise; your most junior people on the team can do it as homework. The problems come once you've deployed models in production and those models are interacting with the customers. That's when things happen that are unforeseen, that no product manager could ever specify in advance. That's when the problems of security, ethics, and bias, and all the issues related to compliance, come into play. So what happens, and we see it over and over, is that organizations put their most senior, most experienced people after deployment, as opposed to earlier in the process. I think a lot of the notions of methodology are actually inverting, and we'll have to take that into account, especially when you're talking about concerns with bias, fairness, ethics, privacy, security, and other areas of compliance. These are really accentuated.
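One concrete example of a post-deployment problem that no spec anticipates is data drift: the inputs a model sees in production gradually stop resembling its training data. As a hedged sketch of how teams catch this (the bucketing scheme and the thresholds in the comment are illustrative assumptions, not a standard from the talk), a simple Population Stability Index check can flag when a feature's distribution has shifted:

```python
import math

def psi(expected, actual, buckets=10):
    """Population Stability Index between a training-time sample
    (expected) and a production sample (actual) of one numeric feature."""
    lo, hi = min(expected), max(expected)
    step = (hi - lo) / buckets or 1.0  # guard against constant input
    def hist(values):
        counts = [0] * buckets
        for v in values:
            i = min(max(int((v - lo) / step), 0), buckets - 1)
            counts[i] += 1
        # Smooth zero counts so the log term stays defined.
        return [max(c, 1) / max(len(values), 1) for c in counts]
    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Illustrative rule of thumb (an assumption; tune for your domain):
# PSI < 0.1 stable, 0.1-0.25 worth investigating, > 0.25 significant drift.
train = [0.1 * i for i in range(100)]          # training distribution
stable = [0.1 * i + 0.05 for i in range(100)]  # similar in production
shifted = [0.1 * i + 6.0 for i in range(100)]  # drifted in production

print(round(psi(train, stable), 3))   # small value, no alarm
print(round(psi(train, shifted), 3))  # large value, raise an alert
```

The point is that this kind of monitoring only matters after deployment, which is exactly where the senior people end up being needed.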
There's a difference between the practice of coding and the practice of learning Okay, so who makes decisions about machine learning within organizations We just want we wanted to measure this one to see if we could find out anything about how Corporate structures are our handling machine learning and we got some surprises Typically, you would expect to have product managers determining things like key metrics What are the value of the measurements that are used to estimate success of a project and As it turned out, yes, there are product managers, but only about a third and In fact, when you look at the difference between the sophisticated organizations and the less sophisticated organizations It turns out that the data science team leads are getting a push there They're doing a larger share relatively and You see this especially in contrast with the organizations that aren't as experienced so that kind of decision-making is being pushed away from product managers toward data science team leads and And You know this this would be a challenge This was something that we saw back in about 2008 2009 when there was first this notion of data science being Brought forth at places like LinkedIn and Facebook and others There was a lot of immediate attention between product management and data science One being a little bit more intuitive one being more quantitative and it took a couple of years But eventually that worked out and that led to stronger notions of data-driven organizations There are wonderful examples Some of the early people who were identified as data scientists later became product executives at companies Monica Regatti from LinkedIn for instance So we think that this is an area that's evolving Also, who sets the priorities for the team in general and again You see a lot from product managers But you see a growing amount from data science leads particularly with the sophisticated organizations So I think that this is interesting it indicates that there is 
knowledge about data science and machine learning that lives within the data science teams and is not necessarily out in product management or the executive teams yet. At least, that's part of the analysis we get from this. Now, what kinds of metrics are used for success? If you've heard of the Lean Startup: I was the director of data analytics at that company, IMVU, and we had various sophisticated means of doing A/B testing, a lot of customer experiments, a lot of KPIs we were monitoring. Sometimes, though, we ran into failure modes, because we would have a product that was being measured on just one KPI instead of several. As it turns out, this was an interesting area. A lot of organizations are definitely looking at business metrics, the kind a lean methodology would point you toward, but more and more, almost half, are using statistical measures, machine learning evaluation metrics, which again require a lot of that domain knowledge. The surprise was that quite a lot are already measuring for bias and fairness. Overall it was 17%, but among the sophisticated organizations it came out to 19%. This is interesting; we didn't anticipate that so many corporate organizations would already be taking bias and fairness into account in their pipelines for evaluating machine learning models. We thought that would come later. Now, it's very hard to measure fairness in machine learning models. Part of the problem is that there are multiple metrics for fairness, and if you try to use just one metric, you get in trouble. There's a great interview here with Sharad Goel and Sam Corbett-Davies.
They have a research group at Stanford that is looking into the math of fairness and bias in machine learning, and it really comes down to optimization problems. The long and short of it, without going into the math, is that it's impossible to get this 100% right. So when you're in business, you have to own up to the fact that your machine learning models will not be entirely fair, and you want to make the errors in the right direction. You want to do the right thing: own it, and make the decision that won't hurt your customers and won't hurt the public, because there's no perfect answer. This is a message that needs to be elevated up to executive levels now, to have them understand what this risk is about, how it works, what the dynamics are. And along with this, if you haven't heard it, I think it needs to be talked about more: Goodhart's law. When a measure becomes a target, it ceases to be a good measure. That was something we definitely learned back when we were working on lean. So don't just focus on one KPI. If you go to a bank, they'll have a range of different KRIs and KPIs for any product line; they're looking at risk as well as performance. Okay: what's within the model-building process? Essentially, what checklists do organizations go through before they deploy a machine learning model into production? As it turned out, about 54% had some experience with checking for fairness and bias, and 40% overall were checking for fairness and bias. Here are the details on that. Explanation, transparency, interpretation, whatever you want to call it: there are a lot of great tools coming out for this.
I really like an open-source project called Skater, which is kind of an umbrella for tools such as LIME, SHAP, Anchors, and others, offering a variety of different types of explainability for machine learning models. It's interesting how much of this is already out in the field in production, and it's especially interesting how much more of a push there is on this from the sophisticated organizations, and much less from the organizations that are just getting started; they haven't found out yet how important this is. Okay, looking into how the sophisticated companies differ, there were four takeaways. The more sophisticated companies tend to use specialized roles: data scientist, data engineer, and so on. They rely more on internal data science teams, as opposed to outsourcing or necessarily using the cloud. They employ more robust checklists, if you will, before they deploy models; they've learned about the risks and how to try to prevent them. And they increasingly have their data science team leads set the priorities and the evaluation criteria for success when they're putting products out into the field. This is important. I want to show here a segmentation of business, looking across industries. It comes from an MIT Sloan report from 2017; there are also related reports out of McKinsey Global Institute and some other research that we put together. Essentially, looking through a lens of AI adoption across enterprise, there's one segment of unicorns: maybe a dozen companies like Google, Amazon, Microsoft, IBM, Facebook, and so on, who have been leaders in AI, deploying it into production for a long time. The characteristic thing about them is that they have a cloud, and they have strong AI teams.
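The tools just mentioned each take their own approach, and this sketch uses none of their actual APIs; it's only a toy illustration of the model-agnostic idea behind several of them. Permutation importance, for instance, estimates how much a black-box model leans on each feature by shuffling that feature's column and watching accuracy drop:

```python
import random

def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, feature_idx,
                           trials=20, seed=0):
    """Average drop in accuracy when one feature column is shuffled:
    a bigger drop means the model relies more on that feature."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    drops = []
    for _ in range(trials):
        col = [r[feature_idx] for r in rows]
        rng.shuffle(col)
        shuffled = [r[:feature_idx] + (v,) + r[feature_idx + 1:]
                    for r, v in zip(rows, col)]
        drops.append(base - accuracy(model, shuffled, labels))
    return sum(drops) / trials

# Toy black-box model: predicts from feature 0 only, ignores feature 1.
model = lambda row: int(row[0] > 0.5)
rows = [(random.Random(i).random(), random.Random(i + 999).random())
        for i in range(200)]
labels = [int(r[0] > 0.5) for r in rows]

print(permutation_importance(model, rows, labels, 0))  # large drop
print(permutation_importance(model, rows, labels, 1))  # no drop at all
```

This is the flavor of explanation these tools automate and extend: you don't need to open the model up, only to probe it, which is why the same techniques work across model types.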
They've paid a lot to build those teams, and they also have business lines that develop data sources: data that can be labeled, used, and monetized in other business lines. So they're very savvy about balancing those three priorities and leveraging them. Now, if we look at the other segments: again, there are maybe a dozen companies with first-mover advantage, but among the other two segments, the adopters of machine learning versus the legacy companies, the laggards in the field, it's about a 50-50 split. The adopters tend to have three things in common. They're facing problems with the talent gap; everybody reports that. They have competing priorities: do they invest in AI, or do they invest in better security, things like that? And yes, as I mentioned, security breaches put a lot of pressure on how you leverage your data sources. A fourth thing coming up for the adopters is that they're seeing the first movers gain advantage by leveraging their horizontal business units, such as Amazon's, to erode the verticals that other businesses have. So the adopters of AI are now facing a lot of challenges from, again, Amazon, Google, and the others who have that kind of first-mover advantage and are moving out into other areas. Obviously Amazon versus retail is a pretty good example of this. But the adopters of AI have an advantage in that they tend to have a lot of people who know what's going on in their domain.
They're very good with domain expertise. So if they can leverage mechanisms such as human-in-the-loop to combine teams of people and machines, bringing that domain expertise into the loop with automation, they may have some advantage over the first movers, over the unicorns. Typically, companies like Google and Amazon like to go after products that have more automation rather than less, so human-in-the-loop is an area where the early adopters can really gain some advantage and put their people to good work. In the bottom row, though, the legacy companies share a few things. Typically they have trouble recognizing business use cases for AI; even when one is right in front of them, they would not necessarily recognize it. They also have the problem that most of them are buried in tech debt. The estimates are that for a lot of enterprises, you have to get your data silos knocked down, you have to get data-cleaning pipelines in place, you have to get sophisticated data science practices established; those are table stakes before you can get into AI. For a lot of companies, even if they started today, it would be another two or three years before they had that fundamental infrastructure in place, before they could start doing machine learning. So that's a problem: being buried in tech debt is very much a corporate risk right now. At the same time, the legacy group also has assets: they tend to have a lot of data exhaust that could be monetized, if they could understand where to use it. Google Maps is a great example of that; there was a telecom company that had data they weren't monetizing, and Google Maps was able to use it to produce better estimates of how to get from point A to point B. All right, so, getting into a bit more of a summary.
What kinds of changes in company culture would be most needed to allow for better adoption of AI? We wanted to understand some of this, as maybe the more positive takeaways after the analysis of the survey. There's a great talk from Jacob Ward. He used to be the senior editor for Popular Science, and he's also done television documentary work, I believe with Al Jazeera and others. The thing is, he has spent decades talking with a lot of scientists, and I'm linking here to an interview he did with Ben Lorica recently. It's a podcast, really fascinating, and it also links to some video of keynotes he's given at the AI Conference. The point Jacob Ward made was that, on the one hand, he was talking with scientists who said they were working on biologically inspired models of machine learning: having automation make decisions based on how people make decisions. There was fascinating work going on in that field. But at the same time, he was talking with a lot of scientists looking into prospect theory, behavioral economics, and cognitive bias, who were saying: you know what, as a whole, humans make terrible decisions, and here are a lot of cases where we see failures over and over; it really wouldn't be a good idea to make machines patterned entirely after people. And so Jacob was sort of like: wait, wait, we have to stop here. There's definitely a conflict between what the different areas of science are saying, and in a lot of ways, social science is now well positioned to inform what's going on in computer science. Maybe as a flip side of this, I highly recommend a fantastic brief video.
It's a keynote at the Velocity conference by Omoju Miller. She's from GitHub, one of the machine learning leaders there, and she posed a kind of thought exercise: if you take the org chart for a large organization and rotate it 90 degrees, you get something that looks very much like the architecture for deep learning. It looks very much like how neural networks work, with a lot of deep layers, and you see things like connection pooling, you see activation layers, you even see human analogs of backprop. In successful organizations, you see that kind of information flow coming from the bottom up and from the top down, with feedback loops in between. So she was exploring what enterprise organizations can learn from neural networks to make their teams work better together. A really great talk. Another fascinating talk is from Cassie Kozyrkov at Google; she's the chief decision scientist in Google Cloud. She's been giving some fantastic lectures lately about how businesses fail at machine learning, even though there's tremendous potential in deploying machine learning models. Categorically, how is it that they see businesses failing at this, and what are the lessons Google has learned over the years? What Cassie is pointing toward in her talks, if I could summarize it, is that it's not as much about the machine learning as it is about the decision: how do organizations make decisions together, collaboratively?
It's going to be partly through humans and partly through machines, and when you can start to bring that into an organization, you can really make progress, as opposed to running into the kind of warnings Jacob Ward was talking about. If you haven't read this book, I highly recommend it; I think for the near-term future of AI it's probably one of the best books available. It's by Daniel Kahneman, and it also covers work he did with Amos Tversky, who unfortunately passed away; this was the 2002 Nobel Prize in economics. It's called Thinking, Fast and Slow, and it's about the different modes of cognition that people have. The book talks about System 1 versus System 2. There's a kind of immediate response people have, much like my response when I was getting near that scorpion a bit earlier in the talk. That has to do with fight-or-flight; it has to do with the more autonomic responses people have, the gut feelings. That's System 1. There's also System 2, where people take time to think about things, but that requires a lot of energy; we burn a lot of carbs to run our brains, and System 2 is not what we reach for first. Because, you know, if you walk into a room and there's a bunch of snakes, you don't want to take three or four minutes to say: well, what kind of snake is that? That's interesting.
You just want to run the other way, same as if there's a bunch of scorpions. It's a really fantastic book, because it breaks down some of the problems people have with cognition, especially in organizations, and these are some of the very same problems we're encountering now as enterprise organizations roll out machine learning. Recently there was an effort by the World Economic Forum to prepare the AI agenda for the Davos conference coming up next year, and I'm grateful to have been invited to participate in a workshop to help put part of it together. We came up with an AI ethics toolkit, which is being targeted at the board level. The point is, I think we now know how to train people to work with machine learning at the individual-contributor level; we have some really great understanding of how to move forward on that. We're beginning to learn how to talk about leading teams that do AI, so at a manager level, okay, we're getting some understanding. But at the executive level, and more importantly at the board level, and in the interface between board and executives, that kind of tension there, there is not a lot of understanding of how to grapple with the hard problems of AI. And of course we've seen this in industry, with a lot of companies struggling with data breaches and ethics problems. This is really where we have to target the next phase of adoption of AI in enterprise. So at Davos there will be a launch of a toolkit targeting board members. The thing is, I've been on the board of directors of two publicly traded technology firms; at this point my beard is getting a bit more white. Your typical board members are probably going to be somebody more my age, so they will have grown up on things like Six Sigma. They will have learned to say the word "agile" over and over; they will have been taught that uncertainty is a very bad thing, and that automation is about creating a deterministic process to solve something
that's very well known and specified. But as we introduce AI, all of those assumptions have changed. Now we're talking about systems that are inherently probabilistic, stochastic in nature, and they're actually doing part of the judgment that the chief executives and the board members would have been doing before. So board members are at this really weird, perilous point, where what they knew and assumed is wrong, and meanwhile the competition is coming after them with AI, something they don't quite understand, and they know there's a lot of risk. So again, Davos is going to try to push this out, and others as well are looking at how top-level executives handle that kind of cognition, because it has to come both bottom-up and top-down. I think a big summary for this is that at this point, almost every company is a technology company, and with that, almost every company is a data company. If yours is not, I guarantee that somewhere out there your competition is, so you have to be thinking this way. From an executive level, when people look at problems and how to assign teams to confront challenges in business, we have to think of a baseline of having teams of people plus machines. In some use cases it'll be more people, in some it'll be more machines, but we really have to think about always applying that as our baseline. And if there's any indication of where this is heading: I'm co-chair for JupyterCon, and we had a really great JupyterCon this year. We had a lot more enterprise participation in Project Jupyter than we had expected, and one of the things that came out of that was that it's within the highly regulated environments where there's a whole lot of rapid evolution going on in open source. As a case in point, nbgallery is a really great way of providing search and discovery for large enterprise teams using Jupyter notebooks, to be able to share insights but with strong privacy
guarantees. This was created inside the United States intelligence community, and then they pushed it public on GitHub. I talked with some folks about this; they were interested in whether other people would be able to use the software, and frankly, just about any bank needs this software, and it's free, courtesy of US tax dollars. We see this in banking, we see it in healthcare, we see it in other areas where the enterprise organizations facing so many risks and challenges, because of compliance and security and ethics and the rest, are the ones starting to come out with solutions. It's happening in open source. If you'd asked me two years ago, I would have said that would be impossible, but it is happening. So if we're not talking about agile for machine learning, what are we talking about? What could we call it? I don't know that there's a software process name yet, but the one word that keeps coming up has to do with reproducibility. The closest thing to an agile manifesto that I've seen so far, with respect to data analytics, is a report called "Ten Simple Rules for Reproducible Research in Jupyter Notebooks." The authors include my friend Fernando Pérez, who as you know is the co-founder of Jupyter, also Peter Rose out of UCSD, and a number of other folks who've been involved in setting up large bioinformatics pipelines in universities, where they had to have a lot of reproducibility. The takeaway is that there's a lot that science at this point needs to learn from big data, open source, and machine learning, and yet there's a lot that data analytics needs to learn from science.
So really, it's a two-way street. For example, in enterprise data science teams, you need that kind of scientific reproducibility and rigor, so that different teams can come up with the same results. That's one failure case I've seen repeatedly in my career: a company was struggling because every time they had a 4% lift, they'd have five different product managers each claiming to own half of it, and the math just didn't work. The thing is, reproducibility, pipelines, and versioning are key components of the larger scope of deploying machine learning, and this paper goes through ten points describing the right processes for teams to use when doing that. And with that, I'm eager to hear some questions. If you want to get hold of me, here's my contact info, and I look forward to talking with lots of people at Big Data Spain. Thank you.
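As a small illustration of what treating reproducibility as a first-class concern can mean in practice (the fields recorded below are my own assumption of a reasonable minimum, not a prescription from the paper), a team can stamp every training run with enough metadata to rerun it exactly: a fixed random seed, a content hash of the training data, and the runtime version.

```python
import hashlib
import json
import platform
import random

def fingerprint_run(data_rows, seed):
    """Record enough metadata that another team can reproduce a run:
    the seed, a content hash of the exact data, and the runtime version."""
    blob = json.dumps(data_rows, sort_keys=True).encode("utf-8")
    return {
        "seed": seed,
        "data_sha256": hashlib.sha256(blob).hexdigest(),
        "python_version": platform.python_version(),
        "n_rows": len(data_rows),
    }

def train_toy_model(data_rows, seed):
    """Stand-in for real training: a seeded shuffle makes the
    'model' fully deterministic given (data, seed)."""
    rng = random.Random(seed)
    rows = list(data_rows)
    rng.shuffle(rows)
    return rows[0]  # pretend this is the fitted model

data = [[1, "a"], [2, "b"], [3, "c"]]
run_a = fingerprint_run(data, seed=42)
run_b = fingerprint_run(data, seed=42)

# Same data + same seed -> identical fingerprint and identical "model",
# so two teams reporting on the same run must agree on the numbers.
print(run_a == run_b)                                          # True
print(train_toy_model(data, 42) == train_toy_model(data, 42))  # True
```

If two teams produce different results from the same fingerprint, the discrepancy is in their analysis, not the data, which is exactly the kind of dispute the 4% lift story needed to settle.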