Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Stephanie McReynolds. She is the Vice President of Marketing at Alation. Thanks so much for returning to theCUBE, Stephanie. Thank you for having me again. So before the cameras were rolling, we were talking about Kevin Slavin's talk on the main stage this morning. The background is this concern about AI and automation coming to take people's jobs. But really his overarching point was that we shouldn't let the algorithms take over, and that humans are actually an integral piece of this loop. So riff on that a little bit. Yeah, what I found fascinating about what he presented were actual examples where having a human in the loop of AI decision-making had a more positive impact than just letting the algorithms decide for you and turning it into kind of a black box. And the issue is not so much that the algorithms make the wrong decision; there are very few cases of that. What happens the majority of the time is that the algorithms actually can't be understood by humans. So what if you have to roll back the decision-making, or uncover how it was made? I mean, who can grok what a convolutional neural network does, layer by layer? Nobody can. Right, right. And so his point was, if we want not just to avoid poor outcomes but also to make sure that the robots don't take over the world, right? Which is where every media person goes first, right? Then you really need a human in the loop of this process. And a really interesting example he gave was what happened with the 2015 storm. He talked about 16 different algorithms that do weather predictions.
And only one algorithm mispredicted that there would be a huge storm on the East Coast. So if there had been a human in the loop, we wouldn't have caused all this crisis, right? A human could easily have seen that. And this is the storm that shut down the subway system and really canceled New York City for a few days there. That's right. So I find this pretty meaningful, because Alation is in the data cataloging space, and we have a lot of opportunity to take technical metadata, automate the collection of technical and business metadata, and do all this stuff behind the scenes. I mean, the discovery of it and the analysis. Leading to the discovery of it, and to actual recommendations to users of data that you could turn into automated analysis or automated recommendations. Algorithmically augmented human judgment is what it's all about, really, as I see it. What do you think? Yeah, but I think there's a deeper insight that he was sharing: it's not just human judgment that is required, but for humans to actually be in the loop of the analysis as it moves from stage to stage, so that we can try to influence, or at least understand, what's happening with that algorithm. And I think that's a really interesting point. There are a number of data cataloging vendors. Some analysts will say there are anywhere from 10 to 30 different vendors in the data cataloging space. And as vendors, we kind of have this debate. Some vendors have more advanced AI and machine learning capabilities, and other vendors haven't automated at all. And I think the answer is that if you really want humans to adopt analytics and to be comfortable with the decision-making of those algorithms, you need to have a human in the loop, in the middle of that process, not only making the decision but actually managing the data that flows through these systems. Well, algorithmic transparency and accountability is an increasing requirement. It's a requirement for GDPR compliance, for example.
That's right. But we at Wikibon don't yet see a lot of solution providers offering solutions that enable more of an automated rollup of the narrative of an algorithmic decision path. But that clearly is a capability that will come along, and it will absolutely depend on a big data catalog managing the data and the metadata, but also helping to manage the tracking of which models were used to drive which decision in which scenario. So that plays into what Alation and others in your space do. We call it the data catalog, almost as if the data is the only thing that we're tracking, but in addition to the metadata or the data itself, you also need to track the business semantics, meaning how the business is using or applying that data, and the algorithmic logic. That might be logic that's just being used to transform that data, or it might be logic to actually make an automated decision, like what they're talking about in GDPR. It's a data artifact catalog. These are all artifacts that are derived from, supplement, or complement the data. It's all the logic. And what we talk about is, how do you create transparency into all those artifacts, right? A catalog starts with this inventory that creates a foundation for transparency. But what if you can't make those artifacts accessible to a business person who might not understand what metadata is, or what a transformation script is? What if you can't make those artifacts accessible to what I consider a real or normal human being, right? I love to geek out, but at some point not everyone can understand. I'm a normal human being. I'm the abnormal human being among the questioners here. Most people in the business are just getting their arms around how we trust the output of analytics. How do we understand enough statistics to know what to apply to solve a business problem or not? And then we give them this hairball of technical artifacts and say, oh, go at it, you know, here's your transparency.
Well, I want to ask about that human we're talking about who needs to be in the loop at every stage. Surely we can make the data more accessible, but it also requires a specialized skill set. And I want to ask you about talent, because I noticed on your LinkedIn you said, hey, we're hiring, so let me know. That's right, we're always hiring; we're a startup. Growing well. So I want to know from you: are you having difficulty filling roles? What is the pipeline here? Are people getting the skills they need? Yeah, I mean, what I think is the misnomer is that it's actually a wide variety of skills. And I think we're adding new positions to this pool of skills. What we're starting to see is an expectation that true business people, if you're in a finance organization or a marketing organization or a sales organization, will have a higher level of data literacy. And that doesn't mean that they have to go take a Python course and learn how to be a data scientist. It means that they have to understand statistics well enough to realize what the output of an algorithm is and how they should apply it. So we have some great customers who have formally kicked off internal data literacy training programs. Munich Reinsurance is a good example; they spoke with James a couple of months ago in Berlin. Yeah, at the conference in Berlin, yeah. That's right, that's right. And their chief data officer has kicked off a formal data literacy training program for their employees so that they can get business people comfortable enough to trust the data. It's a business culture transformation initiative that's very impressive. Yeah. How serious they are and how comprehensive they are. But I think we're going to see that become much more common. Pfizer, who's another customer of ours, has taken on a similar initiative.
How do they give all of their employees access to data, but then also help them know when to apply it to particular decision-making use cases? And so we're seeing this need for business people to get a little bit of training, and then for new roles like information stewards or data stewards to come online: folks who can curate the data and the data assets and act as the kind of translators in the organization. Stephanie, would there be a need for an algorithm curator or a model curator, you know, like a model whisperer, to explain what all these AI neural networks, convolutional, recurrent, whatever, actually do? Would there be a need for that going forward? Another normal human being who can somehow be bilingual in neural net and in standard language? I think so. I mean, I think we put this pressure on data scientists to be that person. Oh my gosh, they're so busy doing their job. How can we expect them to explain it? I mean, to spend a hundred percent of their time explaining it to the rest of us? And this is the challenge with some of the regulations like GDPR. We aren't yet set up as organizations to accommodate this complexity of understanding. And I think that this part of the market is going to move very quickly. So as vendors, one of the things that we can do is continue to help by building out applications that make information stewardship easy. How do you lower the barrier for these specialist roles and make it easy for them to do their job? By using AI and machine learning where appropriate to help scale the manual work, but keeping a human in the loop to certify that data asset or to add additional explanation, and then taking their work and using AI, machine learning, and automation to propagate that work out throughout the organization, so that everyone then has access to those explanations.
So you're no longer requiring the data scientists to hold office hours. I know other organizations where the data scientists sit at a desk like you did in college, and people can come in and ask them questions about neural nets, and that's just not going to scale at today's pace of business. Going back to the term I introduced just now, the algorithm or model whisperer: the recommender function that is built into your environment, and into similar data catalogs, is a key piece of infrastructure to relevance-rank the outputs of the catalog, or the responses to queries that human beings might make. The recommendation ranking is critically important to help human beings assess what's going on in the system and to give them some advice about what avenues to explore, I think. Yeah, and that's part of our definition of a data catalog. It's not just this inventory of technical metadata. That would be boring and dry and useless for most human beings. That's where a lot of vendor solutions start, right? And that's an important foundation. Yeah, for people who don't live 100% of their workday inside the big data catalog, I hear what you're saying. Yeah, so for people who want a data catalog, how you make it relevant to the business is you connect those technical assets, that technical metadata, with how the business is actually using them in practice, with proactive recommendations from recommendation engines, with certifications, and with this information steward then communicating through the platform to others in the organization about how to interpret this data and how to use it to actually make business decisions. And I think that's how we're going to close the gap between technology adoption and actual data-driven decision-making, which we're not quite seeing yet.
We're only seeing, when they survey, that only about 36% of companies are actually confident they're making data-driven decisions, even though millions, if not billions, of dollars have gone into the data analytics market and investments. And it's because, as a manager, I don't quite have the data literacy yet, and I don't quite have the transparency across the rest of the organization to close that trust gap on analytics. Here's my feeling in terms of cultural transformations across businesses in general. I think the legal staff of every company is going to need to get real savvy at using those kinds of tools, like your catalog with recommendation engines, to support e-discovery, or discovery of the algorithmic decision paths that were taken by their company's products, because they're going to be called by judges and juries, under subpoena and so forth, to explain all this. And they're human beings who've got law degrees but who don't know data. They need the data environment to help them frame up a case for what we did, we meaning the company that's involved, you know? Politicians, too. I mean, anyone who's read Cathy O'Neil's book, Weapons of Math Destruction, there are some great use cases of where- Math, M-A-T-H. Yes, M-A-T-H. But there are some great examples of where algorithms can go wrong, and many of our politicians and our representatives in government aren't quite ready to have that conversation. I think anyone who watched the Zuckerberg hearings in Congress saw the gap of knowledge that exists between the legal community and the tech community today. So there's a lot of work to be done to get ready for this new future.
But just getting back to the cultural transformation needed to make data-driven decisions, one of the things you were talking about is getting managers to trust the data, and we're hearing about best practices to make that happen: starting small, being willing to experiment, getting out of the lab, trying to get to insight right away. What would your best advice be to gain trust in the data? Yeah, I think the biggest gap is this issue of transparency. How do you make sure that everyone understands each step of the process and has access to be able to dig into it? If you have a foundation of transparency, it's a lot easier to trust, rather than what we have right now, which is kind of a high priesthood of analytics going on, right? Some believers will believe, but a lot of folks won't. And you know, the origin story of Alation is really about taking these concepts of the scientific revolution and the scientific process and asking how we can support those same steps of scientific evaluation of a finding for data analysis. That means that you need to publish your data set. You need to allow others to rework that data and come up with their own findings. And you have to be open and foster conversations around data in your organization. One other customer of ours is Meijer, which is a grocery chain in the Midwest. If you're West Coast or East Coast based, you might not have heard of them. But Meijer is 50 acres. I'm from Michigan. I know them. Yeah, there you go. Gigantic grocery chain in the Midwest. And Joe Oppenheimer there actually introduced a program that he calls the social contract for analytics. Before anyone gets their license to use Tableau or MicroStrategy or SAS or any of the tools internally, he asks those individuals to sign a social contract, which basically says: I'll make my work transparent. I will document what I'm doing so that it's shareable.
I'll use certain standards in how I format the data, so that if I come up with a really insightful finding, it can easily be put into production throughout the rest of the organization. So this is a really simple example. His inspiration for that social contract was his high school freshman, who was entering high school and had to sign a social contract saying he wouldn't make fun of the teachers or the students. So, you know, very simple basics. Yeah, I agree. I wouldn't make fun of the teachers. We all make fun of the contract. Oh my gosh, you have to make fun of the teacher. I think he was a little more formal than that in the language, but that was the concept. That's violating your civil rights as a student, I'm sorry. Stephanie, it's always so much fun to have you here. Thank you so much for coming on. Thank you, it's a pleasure to be here. I'm Rebecca Knight, for James Kobielus. We will have more of theCUBE's live coverage of DataWorks just after this.