Hey everyone, welcome back to theCUBE's live coverage of the Women in Data Science Worldwide Conference, WiDS 2022. I'm Lisa Martin, coming to you from Stanford University at the Arrillaga Alumni Center, and I'm pleased to welcome my next guest. Hannah Sperling joins me, from Business Process Intelligence, or BPI, Academic and Research Alliances at SAP. Hannah, welcome to the program.

Hi, thank you so much for having me.

So you just flew in from Germany.

I did last week, yeah; a long way away. Very excited to be here. But before we get started, I would like to say that I feel very fortunate to be able to be here, and that my heart and wishes still go out to people who might be in more difficult situations right now.

I agree. It's one of my favorite things about WiDS, the community that it's grown into. There are gonna be about 100,000 people involved annually in WiDS, but you walk into the Arrillaga Alumni Center and you feel this energy from all the women here, from what Margot and team started seven years ago to what it has become. I happened to be listening to one of the panels this morning. They were talking about something that's just so important for everyone to hear, not just women: the importance of mentors and sponsors, and being able to build your own personal board of directors. Talk to me about some of the mentors that you've had in the past and some of the ones that you have at SAP now.

Yeah, thank you. That's actually a great starting point to talk a bit about how I got involved in tech. So SAP is a global software company, but I actually studied business, and I was hired directly from university around four years ago to join SAP's analytics department.
And I've always had a weird thing for databases; even in my undergrad, I enjoyed working with data. So, working in analytics with those teams, and with some people mentoring me, I got into database modeling and eventually ventured even further into development. I was working in analytics development for a couple of years, and yeah, I'm still with a global software provider now, which brought me to Women in Data Science, because now I'm also involved in research again.

Okay.

Yeah, for some reason I couldn't get enough of that. So now I'm learning about the stuff that I didn't do in my undergrad and postgrad, researching at university, and one big part of European data science efforts, at least, is the topic of sensitive data and data privacy considerations. And this is also a topic very close to my heart, because you can only manage what you measure, right? But if everybody is afraid to touch certain pieces of sensitive data, I think we might not get to where we wanna be as fast as we possibly could. And so I've been really getting into data anonymization procedures, because I think we could render more workforce data usable. Especially when it comes to increasing diversity in STEM or in technology jobs, we should really be letting the data speak.

Yeah, letting the data speak, I like that. One of the things they were talking about this morning was the bias in data and the challenges that presents, and I've had some interesting conversations on theCUBE today about data in healthcare and data in transportation equity.

Yeah.

If we think of International Women's Day, which is tomorrow, Break the Bias is the theme. Where do you think we are, from your perspective, on breaking the bias that's across all these different data sets?

Right. So I guess, as somebody working with data on a daily basis, I'm sometimes amazed at how many people still seem to think that data can be unbiased.
And this was actually touched upon in the first keynote, which I very much enjoyed, on human-centered data science. People who believe that you can take the human factor out of any effort related to analysis are definitely on the wrong path. So I feel like the sooner we realize that we need to take into account certain biases that will definitely be there, because data is humanly generated, the closer we're gonna get to something that represents reality better, and that might help us change reality for the better as well, because we don't wanna stick with the status quo. And any time you look at data, it's definitely gonna be a backward-looking effort. So I think the first step is to be aware of that, and not to strive for complete objectivity, but to understand and come to terms with the fact, just as it was mentioned in the equity panel, that that is logically impossible.

Right, you bring up a really important point. It's important to understand that that is not possible, but what can we work with? What is possible? What can we get to? Where do you think we are on the journey of being able to get there?

I think that initiatives like WiDS are playing an important role in making that better and increasing that awareness. There is a big trend around explainability and interpretability in AI, not just in Europe but worldwide, because the awareness around those topics is increasing, and that will then also show you the blind spots that you may still have, no matter how much you think about the context. One thing that we still need to get a lot better at, though, is including everybody in these types of projects, because otherwise you're always going to have a certain selection bias in terms of the perspectives that you're getting.
Right, thought diversity. There's so much value in thought diversity; that's something I think I first started talking about at a WiDS conference a few years ago, really understanding the impact it can make on every industry.

Totally. And I love the example of, I think it was a soap dispenser, one of these really early examples of how technology, if you don't watch out for these human-centered considerations, can go wrong and just perpetuate bias: a soap dispenser that would only recognize a hand of a certain light skin type placed underneath it. It's simple examples like that that I think beautifully illustrate what we need to watch out for when we design automatic decision aids, for example, because anywhere you don't have a human checking what's ultimately decided upon, you might end up with much graver examples.

Right. No, I agree. Cecilia Aragon gave the talk this morning on human-centered AI.

Exactly.

I was able to interview her a couple of weeks ago for WiDS; a very inspiring woman in and of herself. But she brought up a great point about the humans and the AI working together. You can't ditch the humans completely, to your point. There are things that will go wrong. I think that sends a good message that it's not gonna be AI taking jobs; we have to have those two components working better together.

Yeah, and maybe to also refer to the panel discussion we heard on equity, I very much liked Professor Bull's point and how she emphasized that we're never gonna get to this perfectly objective state.
And then also during that panel, a data scientist said that 80% of her work is still cleaning the data, most likely because, I feel, sometimes there is this almost mysticism around the role of a data scientist. It sounds really catchy and cool, but there are so many different aspects of working in data science that I feel it's hard to put it all in a nutshell, narrowed down to one role. I think, in the end, if you enjoy working with data, and maybe you can even combine that with a certain domain that you're particularly interested in, be it sustainability or urban planning, whatever, that is the perfect match.

It is, and having that passion that goes along with it can also be very impactful. So you love data; you talked about that, you said you had a strange love for databases. Where do you wanna go from where you are now? How much more deeply are you gonna dive into the world of data?

That's a good question, 'cause at this point I would definitely not consider myself a data scientist, but I feel like, taking baby steps, I'm maybe on a path to becoming one in the future. Being at university again gives me the opportunity to dive back into certain courses, and I've done smaller data science projects, and I was actually amazed, and this was touched on in a panel earlier as well, at how outdated so many frequently used data sets are in the realm of research, AI and machine learning research: all these models that you feed with these super outdated data sets. That's something I can relate to. And then when you go down that path, you come back to the sort of data engineering path that I really enjoy. So I could see myself keeping on working on that: the whole data privacy and analytics space, both topics that are very close to my heart and that I think can be combined. They're not opposites. That is something I would definitely stay true to.

Data privacy is a really interesting topic.
We're seeing so many regulations; GDPR is a few years old now, and we've got other countries, and states within the United States. For example, California has the CCPA, which will become the CPRA next year, and it's expanding the definition of what private, sensitive data is. So companies have to be sensitive to that, but it's a huge challenge to do so, because there's so much potential that can come from the data. Yet we've got that personal aspect, that sensitive aspect, that has to be considered; otherwise there are huge fines.

Totally.

Where do you think we are with that in terms of compliance?

So I think in the past years we've seen quite a few rather shocking examples, in the United States, for instance, where personal data, or proxies for it, was used in ways that led to detrimental outcomes. In Europe, thanks to the strong data regulations, I think we haven't had as many problems. But here the question remains: where do you draw the line? And how do you design this trade-off between increasing efficiency and making business applications better, for example in the case of SAP, while protecting the individual privacy rights of people? So I guess, in one way, SAP has an easier position, because we deal with business data. So anybody who doesn't wanna deal with the human element might like to try building models on machine-generated data first. I mean, at least I would feel much more comfortable, because as soon as you look at personally identifiable data, you really need to watch out. There are, however, ways to make that happen, and I was touching upon these anonymization techniques that I think are going to be more and more important in the coming years. There is a proposal on the way from the European Commission, and I was actually impressed by the sophistication of the legislation in that area. And the plan for the future is to tie the rules around the use of data science to the specific objectives of the project.
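Sperling doesn't name specific anonymization procedures here, but one common building block behind such techniques is k-anonymity-style generalization of quasi-identifiers. The sketch below is illustrative only; the records, column names, and thresholds are invented for the example, not taken from SAP or the interview:

```python
from collections import Counter

# Hypothetical records: age and ZIP code are quasi-identifiers that,
# combined, could re-identify a person even with the name removed.
records = [
    {"age": 34, "zip": "94305", "role": "engineer"},
    {"age": 36, "zip": "94309", "role": "analyst"},
    {"age": 35, "zip": "94301", "role": "engineer"},
    {"age": 52, "zip": "10027", "role": "manager"},
    {"age": 57, "zip": "10025", "role": "engineer"},
    {"age": 55, "zip": "10029", "role": "analyst"},
]

def generalize(rec):
    """Coarsen quasi-identifiers: 10-year age bands, 3-digit ZIP prefix."""
    lo = rec["age"] // 10 * 10
    return {"age_band": f"{lo}-{lo + 9}",
            "zip_prefix": rec["zip"][:3],
            "role": rec["role"]}

def k_anonymity(recs, quasi_ids):
    """Smallest group size over the quasi-identifier combination.
    k=1 means at least one record is unique, i.e. re-identifiable."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in recs)
    return min(groups.values())

raw_k = k_anonymity(records, ["age", "zip"])             # every record unique: k=1
anon = [generalize(r) for r in records]
anon_k = k_anonymity(anon, ["age_band", "zip_prefix"])   # groups of three: k=3
print(raw_k, anon_k)
```

Coarsening buys anonymity at the cost of precision, which is the cost-benefit trade-off the conversation turns to next; and even a high k does not rule out linkage with outside data sets, which is exactly how "supposedly anonymized" releases have been broken in practice.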
And I think that's the only way to go, because if the data's out there, it's gonna be used. We've sort of learned that. And true anonymization might not even be possible, because of the amount of data that's out there. So I think this approach of limiting projects in terms of what they wanna achieve, not just for an individual company, but also for us as a society, needs to play a much bigger role in any data-related project.

You said true anonymization isn't really feasible. Where are we, though, on the anonymization pathway, if you will?

I mean, it's always a cost-benefit trade-off, right? If the question is not interesting enough, if nobody is gonna allocate enough resources to trying to reverse engineer, I don't know, the tie to an individual, for example, sticking with this anonymization example, then nobody's gonna do it, right? We live in a world where there's data everywhere, so I feel like that's not gonna be our problem. And that is why this approach of looking at the objectives of a project comes in. Because sometimes maybe we're just lucky that it's not valuable enough to figure out certain details about our personal lives, so nobody will try; because I am sure that if people, data scientists, tried hard enough, I wonder if there are challenges they wouldn't be able to solve. And there have been companies that have put out data sets that were supposedly anonymized, and then it wasn't actually that hard to make inferences. And in the panel on equity, one last thought about that: we heard Jessica speak about construction, and how she was trying to use synthetic data because it's so hard to get the real data, and the challenge of getting the synthetic data to mimic the true data. And the question came up of sensors in the household, and so on.
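The challenge of getting synthetic data to mimic the true data can be seen in even the simplest generator. A hedged sketch, with invented numbers: sampling each column independently from the real data's marginal distributions preserves per-column statistics but destroys the correlations between columns, which is often what made the data worth analyzing:

```python
import random

random.seed(0)

# Hypothetical "real" data: site size perfectly correlates with crew count.
real = [(100, 5), (200, 10), (300, 15), (400, 20)]
sizes = [s for s, _ in real]
crews = [c for _, c in real]

# Naive synthesis: sample each column independently from its marginal.
synthetic = [(random.choice(sizes), random.choice(crews)) for _ in range(1000)]

def corr(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    n = len(pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

print(round(corr(real), 2))       # 1.0: perfectly correlated
print(round(corr(synthetic), 2))  # near zero: the joint structure is gone
```

Real synthetic-data generators try to model the joint distribution rather than the marginals, which is precisely where the difficulty lies.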
That is obviously a huge opportunity, but for me, as somebody who is very sensitive when it comes to privacy considerations, straight away I'm like: what if we generate all this data and then somebody uses it for the wrong reasons, which might not be better urban planning for all different communities, but simple profit maximization? So this is something that's also very dear to my heart, and I'm definitely going to go down that path further.

Well, Hannah, it's been great having you on the program. Congratulations on being a WiDS Ambassador. I'm sure there are going to be a lot of great lessons and experiences that you'll take back to Germany from here. Thank you so much; we appreciate your time.

For Hannah Sperling, I'm Lisa Martin. You're watching theCUBE's live coverage of the Women in Data Science Conference 2022. Stick around, I'll be right back with my next guest.