from San Jose. It's theCUBE, presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners.

Welcome back to theCUBE. We are live in San Jose at our event, Big Data SV. I'm Lisa Martin, and we are down the street from the Strata Data Conference. We've had a great day so far talking with a lot of folks from different companies that are all involved in the Big Data unraveling process. I'm excited to welcome back to theCUBE one of our distinguished alumni, Maribel Lopez, the founder and principal analyst at Lopez Research. Welcome back to theCUBE.

Thank you, I'm excited to be here.

Yeah, so you've been up, the Strata Conference started a couple of days ago. What are some of the trends and things that you're hearing that are really kind of top of mind, not just for the customers that are attending, but for the companies that are creating or trying to create solutions around this Big Data challenge and opportunity?

Yeah, absolutely. I mean, I think we talked a lot about data in the years past. How do you gather the data? How do you store the data? How might you want to process the data? This year seems to be all about, how do I make something interesting happen with the data? How do I make an intelligent insight? How do I cure prostate cancer? How do I make sure I can classify images? It's a really different show, and we've also changed some of the terminology. A lot more on machine learning now and artificial intelligence, and frankly a lot of discussion around ethics. So it's been very interesting. Data ethics, how do we do privacy? How do we maintain the right level of data so that we don't have bias in our data? How do we get diversity and inclusion going? Lots of really interesting, powerful human topics, not just about the data.

I love that, the human topics, especially where AI and ML come into play. You talked about data diversity or bias in there.
We were just at the Women in Data Science Conference a couple of days ago, talking to a lot of female leaders in data science and computer science, both in academia and in industry. One of the interesting topics about the gender disparity is the fact that it is limiting the analyses on data, in that there may be only a few perspectives looking at it, so there's inherent bias there. So that's one issue, and I'd love to get your thoughts on that. Another is that with that lack of thought diversity, I guess I would say, going into analyzing the data, companies might be potentially limiting themselves on the types of products that they can create, how to monetize the data, and actually drive new revenue streams. On the kind of thought diversity, we'll start there. What are some of the things that you're hearing, and what are some of your recommendations for your clients on how to get some of that bias out of data analysis?

Yes, it's interesting. One is trying to find multiple sources of data. So there's data that you have and that you own, but there's a wide range of openly available data now. There are some challenges around making sure that that data is clean before you integrate it with your data, but basically diversifying your data sources with third-party data is one big thing that we're talking about. In previous analytical generations, I think we talked a lot about how you had to have a hypothesis and you were trying to prove a hypothesis. Now I think we're trying to be a little more open and looser and not really lead the data anywhere per se, but try to find the right patterns and correlations in the data. And then just awareness in general. Like, we don't believe we're biased, but if we have data that's biased, it gets put into the system, so we have to really be thoughtful about what we put into the system.
So I think that those three things combined have really changed the way people are looking at it. And there's a lot of awareness now around that, because we assume at some point the machines might be making certain decisions for us, and we want to make sure that they have the best information to do that and that they don't limit our opportunities as a society.

Where are companies, in terms of the clients that you see, culturally in terms of embracing the openness? Because you're right, from a scientific method perspective, people go in thinking, I'm going to hypothesize this because I think I'm going to find this, and maybe wanting the data to say this. Where are companies, we'll say enterprises, in becoming culturally more open to not leading the data somewhere and bringing that bias in?

Well, there are two interesting things here, right? I think there are people that have gone down the data route for a while now, sort of the industry-leading companies. They're in this mindset now of trying to make sure they don't lead the data and they don't create biases in the data. They have ways to explain how the data and the analysis and the learning came about, not just for regulation, but so that they can make sure they've ethically done the right thing. But then I think there's the other 95% of companies, and they're not even there yet. They don't know that this is a problem yet. So they're still dealing with the, I've got to pull in the data, I've got to do something with it. They don't even know what they want to do with it, let alone if it's biased or not. So we're not quite at the leading-the-witness point there with a lot of organizations.

That's something that you expect to see maybe down the road?
I'm hoping we'll get ahead of it. I'm really hoping that we'll get ahead of it.

It's a good, positive outlook on it.

I think that because the real analysis of the data problem in a big machine learning, deep learning way is so new, and because people are actually out seeking guidance, there's an opportunity to get ahead of it. The second thing that's happening is people don't have data scientists, right? So they don't necessarily have the people that can code this. So what they're doing now is they're depending on the vendor landscape to provide them with an entry-level set of tools. So if you're Microsoft, if you're Google, if you're Amazon, you're trying very hard to make sure that you're giving tools that have the right ethics in them and that can help kickstart people's machine learning efforts. So I think that's going to be a real win for us. And we talked a lot today at the Strata conference about how, oh, you don't have enough images, you can't do that, or, oh, you don't have enough data, you can't do that, or you don't have enough data scientists. And some of what came back is that some of the best and the brightest have coded some things that you can start to use to kickstart, that will get you to a better place than you ever could have started with yourself. So that was pretty exciting. Transfer learning is an example: taking ImageNet from Google and some algorithms and using those to take your images and try to figure out if somebody has Alzheimer's or not, and code things as Alzheimer's or not-Alzheimer's characteristics. So very cool stuff, very exciting, and nice to see that we've got some minds working on this for us.

Yeah, definitely. Where you're meeting with clients that don't have a data scientist or chief analytics officer, it sounds like a lot of the technologies need to have, or some have, built-in sort of enablement for a different data citizen within a company. If you're talking to clients that don't have a data scientist or data science team, who are your constituents there?
Where are companies that maybe have that skill gap? Who do they go to in their organization to start evaluating the data that they have, to get to know it and start to understand what their potential is?

Yeah, there are a couple of places people go. They go to their business decision analytics people, so the people that were working with their BI dashboards, for example. The second place they go is to the cloud computing guys, because they're hearing a lot about cloud computing, and maybe I could buy some of this stuff in the cloud, I'm just going to roll up and get all my machine learning in the cloud. So we're not there yet. So the biggest thing that I talk to people about right now is, what are the realities around machine learning and AI? We've made tremendous progress, but you read the newspaper and something's going to get rid of your job and AIs are going to take over the world, and we're kind of far from that reality. First of all, it's very dystopian and negative, but even if it weren't, what you can do today is not that. So there are a lot of stages in between. So the first thing is just trying to get people comfortable with, no, you can't just buy one product and throw in some data and you've got everything you need, right? We're not there yet, but we're getting closer. You can add some components, you can get some new information, you could do some new correlations. So just getting a reality and grounding of where we are, and that we have a lot of opportunity, and that it's moving very fast. That's the other thing. IT leaders are used to, I'll evaluate it once a year, I'll evaluate it once every couple of years. These things are moving in monthly increments, really huge changes in product categories. So you kind of have to keep on top of it to make sure you know what's available to you.

Right, and if they don't, they miss out on not only the ability to monetize data streams, but they're potentially going out of business, because somebody will come in, maybe more nimble and agile, and be able to do it
faster.

Yeah, and we already saw this with the digital native companies, the born-in-the-cloud companies, we used to call them. Well, now everybody can be using the cloud, so the question then is, what's the next wave of that? The next wave of that is around understanding how to use your data, understanding how to get third-party data in, and being able to rapidly make decisions and change models based on that.

One of the things that's interesting about big data is, you know, it was a big buzzword, and it seems to be becoming less of a buzzword now. I mean, Gartner even was saying, I think the number was 85% of big data projects, and I think that's more in test and dev environments, fail. And I often say failure in a lot of cases is not a bad F-word, because it spawns the genesis of new products, new ideas, et cetera. But when you're talking with clients who go, all right, we've embraced Hadoop, we've got this big data lake, now it's turning really swampy, we don't know all this, we've got lakes, we've got oceans, we've got ponds, right? What's the conversation there, where you're helping a customer clean that swamp up, get broader visibility across their data, and enable different lines of business, not just the BI folks or the cloud folks or IT, but marketing, logistics, sales? What's that conversation like, to clean up the swamp and do more enablement for visibility?

I think one of the things that we got really hung up on was, you know, creating a data ocean. We're going to bring everything all in one place, it's going to be this one massive big store, it's going to be awesome. And that's just not the reality of the world. So I think the first thing in the cleaning up that we have to do is being able to figure out what's the source of truth for any given data set that somebody needs. So you see 15 salespeople walk in and they all have different versions of the data. That shouldn't happen. So we need to get to the point where they know where the source of truth is for that data. The second is sort
of governance around the data. We spend a lot of time dumping the data, but not a lot of time in terms of getting governance around who can access it, what they can do with it, for how long they could have access to it, is it just internal, is it internal and external? So I think that's the second thing around, like, harnessing and wrangling the swamps and the lakes and the ponds, right? And then assuming that you do that, I think the other thing is, if you have a hammer, everything looks like a nail. Well, in reality, when you construct things, you have nails, you have screws, you have bolts, and picking the right tool for the job is something that the IT leadership has to work with. And the only way that they get that right is to work very closely with the different lines of business so they can understand the problem, because the business leader knows the problem; they don't know the solution. If you put them together, which we've talked about forever, frankly, but now I think we're seeing more imperatives for those two to work closely together. And sometimes it's even driven by security, just to make sure that the data isn't leaking into other places, or that it's secure and that they've met regulatory compliance. So we're in a much better space than we were two, three, five years ago, because we're thinking about the real problems now, not just how do you collect it and how do you store it, but how do we actually make it an actionable, manageable set of solutions?

Exactly, and make it work for the business. Well, Maribel, I wish we had more time, but thank you so much for stopping by theCUBE and sharing the insights that you've seen, not just at Strata Conference but also with your clients.

Thank you.

We want to thank you for watching theCUBE. Again, I'm Lisa Martin, live from Big Data SV in downtown San Jose. Get involved in the conversation, hashtag Big Data SV. Come see us at the Forager eatery and tasting room. And I'll be right back with our next guest.