Good morning. Just a very quick two-minute call: two minutes, and we'll start with the opening video. So please find yourself a seat. There's still room here up front. There are some lonely men here up front. They need company. So come and join them. Any data-inspired and data-driven science is so critical right now. More and more decisions are made based on data. The amount of data that we gather every day, and the insights that the data can provide us, is just growing exponentially, and that is no exaggeration. The market for data science and related areas like AI is booming. It is so important to have women in artificial intelligence, in the area of data science, and also in leadership roles. It's being able to use data to solve issues and understand bigger problems. It's critical. And we need women in these roles. Every individual brings their own perspective, and so we need to make sure the entire workforce is represented. And the good news is there are so many jobs and many different ways to combine their passion area and their skills in data science and get involved. I would like you to say: what are the problems in the world that absolutely have to be changed? And can you individually, given all the amazing background that you've had so far and all the education that you've got so far, what are the unique things that you can do to change the world towards that mission? And then think of the technology. If that is going to become completely data driven over time, then you can't miss that opportunity. You've got to join in and have your say. If you're not looking at the data from all sorts of different angles, then you could introduce a lot of bias. So it's really, really important that we have around the table all genders, all races, all backgrounds. We can't ignore social structure problems. We can't just go in a corner and write some code and read math and then that's a solution, right? Can't do that.
So we have to think about who is being affected by these algorithms. Welcome to WiDS! When we first started this conference, we never would have imagined that we'd be sitting here today with over 200 regional events. We've got over 500 WiDS ambassadors worldwide. Many of them are women, but we've also got a lot of men. And these are people who are just passionate about inspiring others within their community. We're in over 60 countries, and year after year we're blown away. Let's make this next decade the women in data science decade. What I love about it is that the growth is viral: people will attend one event in one city and then they'll want to bring it to their cohort or colleagues the next year. This type of industry can be done everywhere, so it should be accessible to everybody. And this is one of the reasons why I love that we are global with WiDS. So we wanted to create opportunities for women to inspire, educate and support women at many different times throughout the year. And one way that we decided we could do that was through a datathon every year, which is a predictive analytics challenge using real-world data. We have over 900 teams from 85 countries, and that's on every continent except Antarctica. When we started WiDS in 2015 we had no idea this was going to be a global movement with tons of international events and a datathon and a podcast show, and now outreach to middle school and high school. It has just been such a ride. Our latest endeavor is to work on some materials that we can hand off to teachers in schools around the world. This has provided a platform for literally hundreds of women, if not thousands of women, to have an opportunity to be heard. But the truth is these are really simple experiments, but they have profound impact because they empower someone else to be able to do their job better or to be able to take that message.
Five years ago, when we were sitting around a coffee table thinking about what WiDS could be, I never in my wildest dreams thought it would grow so far and so wide around the world in just five years. What I'm most excited about is the next five years, because I think this is really just the start. Welcome! We're so happy to be here today at WiDS around the world, and a lot of people are watching the live stream as we speak. We're not just talking to you here in the audience, we're talking to a global audience, and that's amazing. I'm Margot Gerritsen and I'm here with Karen Matthys and Judy Logan. We are the co-directors of Global WiDS, and this is a conference where we promote and we look at and we admire outstanding women doing outstanding work in the broad area of data science, and we're here to support, inspire and educate everybody regardless of where you're from and regardless of gender, for sure. It just so happens all the speakers today are women, and not just here but everywhere in the world. And if you were here last year, we tried this last year, but we're going to try it again this year, so here it goes. Welcome to the Global Women in Data Science 2020! And as you heard in the video, we are so excited this year because we have more events than ever. We have 200 regional events worldwide being planned in 60 countries, and these events are being planned by over 500 WiDS ambassadors worldwide: people who are so passionate about data science and about sharing their knowledge with others that they host these regional events. Some of these events have actually already taken place and other events are happening in the coming months, and we're also really delighted because we're welcoming several of our WiDS ambassadors here today at Stanford in the audience. Yes, we'd actually like to celebrate all of you here today. We have an amazingly diverse group in the room.
30 universities are represented amongst all these tables, and over 90 organizations, including corporations, national laboratories, NGOs and beyond. We know a lot of people came a long way, and thank you for making the time to come here. We always like to give a shout-out to the person who comes the farthest. I think this year it's Sulekshwana Sriram from Hyderabad, India, so thank you. And then we also have a shout-out to our youngest members today. We have Sam Deener in high school and John V. Subramanian in high school, so we have a wide range of ages in the room as well. In terms of speakers, our speakers are a very diverse, wonderful group as well. We have speakers on stage today from 11 different countries. Yeah, and today is really quite special because, as you heard in the opening video, this is our fifth anniversary. So we're five years older and we have a lot more wrinkles than when we started. But it is unbelievable to think that we've been doing this for five years, and now we're so worldwide and so many people are joining us, and I know there are many of you who've been here since the very first WiDS conference. But you know, a five-year anniversary, do you know how you celebrate that? You know what the gift is for five years? Wood. So I thought I'd give my wonderful fellow co-directors a wooden present. We get presents? So yeah, you get presents. Now, I love these women to death, but there's one thing about them. See, they're a little short. They're a little shorter than I am, so I thought, what can I do with wood, specifically for them? So this one is for Judy. My calculation is she is about four inches shorter than I am today. You can stand on this one. And you know, Karen needs a little bit more help, so you can stand on this one here, Karen. So there's a good one. All right. This is much better. I love it. Thank you, Margot. I feel very empowered. Well, but there is a more modern gift that is typically given for five years, and that's silverware.
Yes, we thought, Margot, we'd give you a present as well, and since the future is so bright for women in data science, we thought, how about a custom first-edition pair of WiDS. Oh yes, look at this. There you go. Oh wow, look at this. We'll put them on too. This actually helps a lot because we have these big lights here on stage. This is really... but I can't read anything now, so that's the only problem that we have. So we are really excited to tell you today about many of the things that we do. We don't just have the conference anymore. We have a podcast series as well, which I'm really fortunate to host, and we also have a global datathon for WiDS as well as a new high school outreach program, and you'll hear more about both of those later on today. But before we get started, we definitely want to give a big round of applause and thank-you to our wonderful sponsors: our global visionaries and our innovation leaders, and then finally our Stanford sponsors as well. We are incredibly grateful to all of them for the support, encouragement, ideas and everything they do for us to make this all happen. Yeah, so thank you. Thank you all. All right, so I'm looking at the time and, you know, I'm going to be your emcee for today, and one of my greatest joys is to kick people off the podium. So I'm going to start with Judy and Karen, but if any of the speakers want a little step up today, then you can borrow those from these women. Hey, a few quick housekeeping items before we start with the program. First of all, you hopefully have the program, the booklet, in front of you. Something really important in this program is the Twitter handle for WiDS. So one thing I would like you to do is take out your smartphones and tweet something right now. Take a picture of your table and put it on Instagram, and make us trend; that's going to be the goal again for this year.
The other thing: I just wanted to quickly let you know that if there is an emergency and we do have to leave, you just go out of these doors here and you're going to be fine, so these doors in the back, go out of those. I also want to ask that if you do leave through those doors halfway through a talk, you close the door really, really quietly, because if you just let it go then it makes a big bang and it can disturb the speakers a little bit. One of the things you'll see here is that we have many women in the audience, so we have taken over the gents' restrooms, apart from one. So nearest us it is all women's restrooms, and then towards the end of the corridor, for the gentlemen amongst you, you will have to go there. Now, all around the conference we also have hand sanitizers, and we just want to ask you, considering the coronavirus right now, that you wash your hands frequently as well. But I'm really happy that all of you are here and that nobody had to cancel at this conference. By the way, this is a real advantage of having a distributed conference, where everybody can just go to their own regional event and we don't have that much international traffic moving through; we just connect with each other through the livestream. Now, I mentioned men with the restrooms. I do want to call out the men in this audience. Please all stand up, all the men, come on, stand up. Hi, great. And for some of them this may be the first time in their lives that they're really different.
We are so used to this: you go to a conference and you're surrounded by men. In fact, we started this conference because so many conferences just had male speakers. At some point I was asked, and I tell the story all the time, to give a talk at a conference and I couldn't make it, and then a few weeks later I see the program for the conference and they're all male speakers. So I talk to the organizer and say, "What happened?" "Well, Margot, you couldn't make it." And I say, "What about some other women?" "Oh, we really looked, but we couldn't find any." And so when people ask me, and some of you have heard this before, "Why do you have only female speakers?" I say, "Well, Bob couldn't make it, right? And we really tried to find some male speakers, but we just can't find any." So one of the great things about WiDS is that over the last five years we've put on stage probably 2,500 female speakers around the world, so nobody has an excuse anymore. But to start today's conference, it's my great pleasure to introduce Persis Drell, one of my very favorite people on campus, favorite women on campus, and what is so fun for me also is that she actually opened our very first Women in Data Science conference. At that time she was not yet the provost of Stanford University; she was only the first female dean of the School of Engineering here at Stanford University. She has a fantastic career. You can read about this in the program. I'm not going to talk about everybody's amazing bios because it's all there for you to read and peruse at your leisure, but I do want to mention one thing. Persis started as an undergrad a long time ago wanting to do mathematics, and then she changed to physics. She could have been a real data scientist if she had just stuck with math, but I think she turned out okay. So please welcome Persis Drell. Thank you, Margot. Okay, so, hello everybody, and welcome to Stanford. We are, as Margot said, really thrilled to have you here, and I have to say it is really just fantastic to look out in the room and
see a room full of women. I also have to think back to five years ago, and Margot, one of the differences, I want to know if you agree: there are actually more men in the audience now. I think they were a little afraid five years ago, maybe. But anyway, I want to welcome the men in the audience too. Allies are great. This conference is, I think, a great indication, as Margot has shared, of how the interest in the field of data science has grown and grown among women. When I think back to that first conference in 2015, when I made those opening remarks, I had no idea that Margot and her colleagues were starting a movement, but clearly there was a hunger for it, and it is just awesome to see how it has grown. In just five years WiDS has expanded from what was a local Stanford conference that was live streamed to a really global event with 200 conferences around the world. And of course the field of data science has also expanded in the last five years. More and more decisions, good and bad, are made based on data analysis, analysis that is good and bad, and one of the challenges is separating those. But with things like precision medicine, understanding and targeting retail customers much more successfully than I wish they did, monitoring financial markets, predicting the weather, and elections of course, the new insights available from big data have made data science an exciting and important field to be in right now. And I'd say that's another difference from five years ago. Five years ago it was "yeah, data science"; now it's "wow, data science". Now, all of this means that diversity in data science is actually more important now than it was even five years ago, and I'd like to spend a few minutes talking about what I mean by that. Why is diversity important? If you step back and try to address that issue, it's really important to be able to say why, in research and education in any field, diversity is important. And we've spent a lot of time thinking about
that at Stanford and discussing it. I think we've made some strides at Stanford to become a bit more diverse, inclusive and accessible, but we need to do more, and that's where answering that why question can be very powerful. It is just very, very important, and I'm going to stress this, for each of you to be able to clearly articulate why diversity and inclusion are critical to our research and education mission. So I'm going to give you my version of this; you will each have separate versions. The first reason for me is that at Stanford, promoting diversity ensures intellectual strength. To solve complex problems, discover scientific breakthroughs, achievements in the arts, whatever it is, you have to bring a broad range of ideas and approaches. And to advance our educational mission, it is essential to be exposed to views or cultures that are different from your own and have your opinions and assumptions challenged. An institution or enterprise that reflects broad diversity will inspire new angles of inquiry, new modes of analysis, new discoveries, new solutions. So that's one reason diversity is important in my articulation. The other is that the future is diverse. Our world is becoming increasingly complex, more interconnected. To be fully engaged citizens in the 21st century, we need to embrace diversity in all aspects of life, not just in the workplace, not just on a campus. We need to be able to navigate difference, develop empathy, value our engagement with diverse backgrounds and perspectives. The challenges we face now and in the future transcend all borders, all genders. We must be sure that the solutions we develop address the needs of all people and incorporate input from all communities. Diverse teams are a critical path to embracing that diverse future, and the time is right now for the field of data science. Because data analysis can be applied in so many different areas and in so many different ways, diversity is critical. And you're the experts; you know how easy it is to
take data and use it incorrectly because you started with a bias or you started with an adversely selected sample. As the world becomes more data driven, there will be increased opportunities and demand for data scientists, and we have to remove the barriers that exist to participation. We can't afford to exclude talent. So as you all know, women can and do excel in data science, despite what are all too common and very unfortunate beliefs by some that women and girls do not have what it takes in STEM subjects. That of course has been debunked in many studies, but those prejudices still exist. We need to expose and counteract those biases about girls and women and their abilities. They are especially pernicious for young women growing up, girls, and we need to get them interested in science at an early age. They need to see role models. You've all heard "you can't be what you can't see". The overall number of women undergraduates in STEM subjects is increasing. That's very encouraging, but there are still large disparities for women entering these fields professionally, and women leave their STEM-based careers at a much higher rate than men. They are often in male-dominated workplaces; they find themselves not fully integrated, accepted, promoted, included. That means we need to make improvements to the workplace, all workplaces, and I'm going to include here at Stanford as well, to create more welcoming and equitable environments for women. And we all need to continue to seek support from and provide support for each other. So this conference showcases outstanding work by some remarkable women. There are many role models to be found here today and at WiDS nationwide and worldwide. If I had a hope for this conference, it would be that it continues to inspire more women to get involved in data science and that it continues to create networks of support to keep more women in the field. So before I wrap up, I want to thank the organizers of Women in Data Science, Margot, Karen and
Judy. Please join me in thanking them. And I'd like to thank the sponsors: Stanford Data Science Institute, ICME, the Stanford president's office, and the Women in Data Science industrial partners. So make the most of your time here today. Come away informed, connected and inspired. Thank you. Thank you, Persis. Don't go just yet, okay? Because I want to ask you a question. Sure. You know, you have been, as the dean of the School of Engineering and as provost, for quite some time now right at the forefront of data science here in the Valley. So before you go, tell me: what is your deepest fear and what is your fondest hope when it comes to data science? It's one of the themes we have at this conference. So deepest fear is unfortunately easy to call up, and that is... so I'm old, right? You're all mostly younger than I am. I was a graduate student in the 80s, and I remember that world, and I remember what people could say. They weren't bad people, they didn't know what they were saying, but there was a discourse and a narrative and a conversation that was really not very supportive of women and did create what we would now call a hostile environment. I mean, I didn't particularly think it was hostile at the time, because they didn't know any better. But we've learned a lot since then, and I've watched progress where you can't control what somebody thinks, but you can make it unacceptable to say certain things. And that worked pretty well, and it's made the environment more inclusive and more welcoming, and I've seen more women coming into the field. And in the last decade, maybe even the last five years, it has become acceptable to say things that I thought had been agreed upon as unacceptable decades ago, and that I find deeply, deeply disturbing. So that's the greatest fear: I want to put that genie back in the bottle where it belongs, and I'm nervous, I'm worried about what's happened in the dialogue in our country. So that's the greatest fear.
Greatest hope for data science? Greatest hope, wow. Apart from, of course, that 50% of the leaders... Yeah, absolutely, but that will come, and it'll probably solve the problems. But I think that my greatest hope for data science is to develop the framework and a culture to keep data from being misused. I mean, I got up on the wrong side of the bed this morning, because both of those have a slight negative tinge to them, but it's that right now data and the use of data is so powerful, and we learn so much from it, but as I said earlier, each of you knows how easy it is to misuse data. So how do we develop the cultures and the frameworks so that data can be shared and used appropriately and positively, and mitigate the negatives? Every new invention, every new technology has good sides and bad sides. It's the culture of the field that helps us use it for good and contain its use for bad, and that's something that this field... that's a challenge you need to take on, and my hope is you get it right. Fantastic. Thanks so much, Persis, for coming and opening for us. My pleasure, and I'll see you in five years. In five years, you bet. I'll be here. Right, thanks again. Persis Drell, Provost of Stanford University. Yeah, great. Now, before we continue, I'd like to call out somebody in this audience who's already drawing feverishly, and that's Liza Donnelly. Here, Liza, get up, stand up. Liza Donnelly has been with us for four years. She'll be live drawing the conference, and she'll be Facebooking that and tweeting it, and later on today we'll show some of her work. Thanks so much again, Liza, for joining us. And I also wanted to just welcome some of the people on the other side of the camera, because right now I know for a fact that Austin is watching us, that UC Berkeley is watching us, and in fact there are several of them here from Berkeley, but also further afield. So Mexico City is watching us. Bienvenidos, Mexico City, or I should say Ciudad
de Mexico, and then Saskatoon is watching us as well, so we have truly a global audience right now. So it's now my pleasure to introduce to you our first keynote speaker for the day, and she's standing there getting ready, definitely. Please join me here on stage, Daphne Koller. She doesn't need that much introduction. She has a PhD from Stanford. I know her as a Stanford prof, because when I was a student in computer science, you started just before I left in 1995. We had her with us for a long time, and then she did something really remarkable and amazing. She started Coursera, which I see as the first huge step to democratization of education, and I really feel so incredibly obliged to you for that. I'm personally very impressed with that. Recently she started on something else: she founded a company called insitro, and she always says in interviews that she started insitro to address the crisis of increasing drug development costs. So thanks for that work too, and I'm so much looking forward to your keynote. Thank you, Margot. Daphne Koller. Thank you. It's a real pleasure for me to be here and see so many wonderful and talented women, here and on the other side of the camera, entering a field that I entered almost 30 years ago now, in the early 90s. It makes me feel really old, but there weren't a lot of women in this field back then, and it's wonderful to see so many wonderful, talented young women entering this space, and I think it'll make a huge difference. So I'll tell you today about what I'm doing in this field right now. I spent my career at Stanford actually working at what I think of as the boundary between two disciplines. One of those is machine learning and data science, and the other one is biology and health. And I use the word boundary rather than intersection deliberately, because there really wasn't that much of an intersection between those two fields at the time, because the data that was available on the biology and health side was relatively limited and
so you could apply fairly simple statistical tools, but you couldn't really apply very sophisticated methods. It was a real challenge, and I think that's changing, and I think this is the right time to come back into the space and do amazing things, and I hope that the talk today will be able to convince you of that. So if you think about drug discovery and development, it's kind of a really interesting glass-half-full, glass-half-empty perspective. On the one hand, the glass half full is that in the last 50 years discovery has enabled drugs that have taken diseases that used to be a death sentence, or a sentence to a lifetime of pain and agony, and turned them into something that is now manageable or sometimes even a cure, and cure is a very rare word in medicine today. That includes vaccines for infectious diseases, cancer immunotherapies, autoimmune system modulators. One of my favorite stories is the cystic fibrosis drugs by Vertex, which took something that used to be a death sentence at the age of 20 and now give 90% of those patients effectively an almost normal life. It is a remarkable tale of achievement. The glass half empty is this other side, which has come to be known as Eroom's law. Now, those of you here in the audience, I'm sure, are very familiar with Moore's law. Eroom is the inverse of Moore, because it is an exponential decrease in the productivity of pharmaceutical R&D, consistently for the last 70 years. This is a graph on a logarithmic scale, where this is the number of drugs approved per one billion US dollars. The price tag of approval for a new drug is now 2.5 billion dollars. Now, that is not because that single drug incurs 2.5 billion dollars on its journey from idea to approval; it is because that one drug carries on its back the many, many thousands or tens of thousands of drugs that didn't quite make it. So really, what's going on in drug discovery and development? This is a 15-year journey, often maybe longer, from idea to approval, with many, many forks in the road, where one fork,
if we're lucky, takes us to success; 99 don't. And we have only very limited tools to make a decision on which of the paths in front of us are going to lead us to success, and taking the wrong one can be a matter of years and tens if not hundreds of millions of dollars before we figure out that we took the wrong turn. So how can we make better predictions of downstream outcomes, hopefully helping bend that curve on the last slide? This is where I think machine learning and data science could potentially play a role. I'm going to give a very brief introduction to some of the developments in machine learning, maybe not as useful in this audience, but I think interesting nonetheless. I've been amazed at how much progress has been made even in the last five years. So back in 2005, I used to do a lot of computer vision, and you would give this image to a computer and ask, "Is there a bear in this image?", a simple binary classification problem, and the answer would be slightly better than random. 2012: this is the era of ImageNet, which was created by my former colleague here at Stanford and friend Fei-Fei Li, and this gave machine learning enough training data to the point that you could actually start answering a much harder question, "What is in this image?", a multi-class classification problem, and the label for this image would probably be wolf. This is actually a bear, but it's a wolfy-looking bear, so, you know, not so bad. 2014, an actual label from the state-of-the-art algorithm: "a brown bear is swimming in the water". The bear isn't in the water, but there is a brown bear and there is water, so this is actually pretty good. In 2017: "two bears sitting on top of rocks". Today the performance of machine learning for this problem is demonstrably beyond human-level performance, for tasks that each of us has been trained to perform since we were an infant. So the fact that you could actually achieve beyond-human-level performance is to me remarkable.
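The two questions described above, "is there a bear in this image?" (binary classification) and "what is in this image?" (multi-class classification), can be illustrated with a minimal Python sketch. The scores, the class names, and the 0.5 threshold are made-up assumptions for illustration only; they are not outputs of any model mentioned in the talk.

```python
import math


def sigmoid(score):
    """Squash one raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Binary question: one hypothetical score, one yes/no answer.
bear_score = 0.2  # assumed value, barely on the "yes" side
has_bear = sigmoid(bear_score) > 0.5

# Multi-class question: one hypothetical score per candidate label.
class_names = ["bear", "wolf", "dog", "cat"]
class_scores = [1.9, 2.3, 0.4, -1.0]  # assumed values
probs = softmax(class_scores)
predicted = class_names[probs.index(max(probs))]

print(has_bear, predicted)
```

With these assumed scores the multi-class answer comes out as "wolf" even though "bear" is a close second, which mirrors the "wolfy-looking bear" anecdote: the harder question has many more ways to be nearly right.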
What made this possible? If you look at the models that we used to employ way back when, in the early days of machine learning, these were usually simple models constructed on top of human-constructed features, so things like logistic regression or random forests, and people had to do a lot of what's called feature engineering to feed those algorithms. And it turns out that those algorithms started out pretty good, because they introduced a lot of human bias, knowledge, into this, but they asymptoted at a relatively low level. Today we basically don't do that anymore. The computer starts from scratch, with just the raw features, and it takes longer; in some ways you require more data to reach a high level of performance. But it turns out that the computers actually end up constructing better features than people do, which is why the performance keeps going up and up. So specifically, if you look at what we do today, this is what's come to be known as deep learning. The computer, for instance in the case of images, starts with raw features, basic pixels, and it constructs an increasingly complex hierarchy of features built on top of other features, so that in the end it's able to construct features that are very subtle and can make distinctions like between an Arctic fox and an Eskimo dog, where a person would be really hard put to define a feature that would actually make that distinction. The other aspect of this, which is important and comes back on the next slide, is that there is also a new representation learning aspect here. So if the original images, say, are a thousand by a thousand, and let's think grayscale for simplicity, then they sit in a million-dimensional space. At the end of the day, just before the final prediction is made, that vector is usually about a hundred dimensions large. So we have taken a million dimensions and compressed it into a hundred dimensions, and those are meaningful features that the computer constructed.
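The dimension counting above (a 1000 by 1000 grayscale image is a point in a million-dimensional space, reduced to roughly a hundred numbers before the final prediction) can be made concrete with a small Python sketch. The `block_average_embed` function is a hypothetical stand-in, not a learned encoder: a deep network learns its hundred features from labeled data, whereas this sketch simply averages pixel blocks to show the change in size.

```python
import random


def flatten(image):
    """Turn a height x width grayscale image into one long pixel vector."""
    return [pixel for row in image for pixel in row]


def block_average_embed(image, grid=10):
    """Compress an image to grid * grid numbers by averaging square blocks.

    Hypothetical stand-in for a learned encoder: it only illustrates the
    million -> hundred size reduction, not learned features.
    """
    height, width = len(image), len(image[0])
    block_h, block_w = height // grid, width // grid
    embedding = []
    for gy in range(grid):
        for gx in range(grid):
            total = 0.0
            for y in range(gy * block_h, (gy + 1) * block_h):
                total += sum(image[y][gx * block_w:(gx + 1) * block_w])
            embedding.append(total / (block_h * block_w))
    return embedding


rng = random.Random(0)
# A synthetic 1000 x 1000 grayscale image with pixel values in [0, 1).
image = [[rng.random() for _ in range(1000)] for _ in range(1000)]

raw = flatten(image)
embedding = block_average_embed(image)
print(len(raw), len(embedding))  # 1000000 100
```

The point of the sketch is only the bookkeeping: the input vector has 1,000,000 entries and the compressed vector has 100. What makes the learned version interesting is that its hundred numbers are chosen by training so that they carry the information needed for the prediction.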
From a mathematical perspective, what that actually means is that the machine has created a hundred-dimensional manifold in a million-dimensional space and has embedded all those million-dimensional images in that hundred-dimensional representation. And that space actually has interesting structure, because there is an infinite number, a continuum, of manifolds that one could construct, but the machine has to actually put next to each other images that are labeled the same way. So these two trucks, even though in the original space they would be as far from each other as any other two random images, here they have to be next to each other. And because the hierarchy of features is learned jointly, other classes that are not labeled the same way but share some features, like windshields and tires, are going to be next to each other in this manifold, whereas other classes like cats and dogs are in a very different part of the space. Okay, so this is the glass-half-full side of machine learning. In light of the earlier comments, let me actually talk a little bit about the glass-half-empty side of machine learning and why quality of data is of paramount importance. It turns out that these really powerful machine learning algorithms that are really good at teasing apart subtle signal are equally good at latching onto subtle artifacts. So this is one of my favorite examples. It's from an arXiv paper in 2018. This is a paper that looked at how bias can creep into algorithms in the context of a radiology problem, where you're trying to diagnose fractures from an X-ray image. This is the input, and you can see they got really good ROC curves; let's say, you know, it's not perfect, but really compelling. And then they looked at the embedding, that manifold that I talked about, and they visualized it, and they noticed the following unfortunate fact, which is that if you look at that embedding and you look
at the distribution of fracture versus non-fracture, it's actually kind of nicely distributed across the clusters. So the machine hasn't really learned to distinguish fracture from non-fracture; what it learned to do really well is distinguish the X-ray machine that took the image. And it turns out that, in fact, when you correct for that, you basically get close to random performance. What turns out to be the case is just that certain hospitals have a larger predominance of fractures and a certain X-ray machine, and that's what it latched onto, which highlights the importance of having really high-quality data as well as a very rigorous testing regime. We'll come back to that later. Okay, so why am I back in this space after so many years? It's because I think we now have the opportunity to actually create and utilize massive data sets in biomedicine that didn't exist before. In some ways there have been two revolutions going on in parallel, in two different fields that haven't really spoken to each other. On the left-hand side are advances in cell biology and bioengineering, each of which has been transformative on its own, but together they've enabled a perfect storm of data creation. I'm going to talk a little bit about some of these advancements. The first of those is what's called induced pluripotent stem cells. It is a discovery that earned Yamanaka a Nobel Prize a few years ago, and what it is is the ability to take, say, skin cells or white blood cells from any one of us, revert them to stem cell status, which is a pluripotent type of cell that can create any cell lineage in our body, and then differentiate them into neurons or hepatocytes or cardiomyocytes that carry our genetics. So it enables us to create cellular models that capture the genetic diversity of a normal human population in the cell background that is relevant to disease. We have, using technologies like CRISPR, the ability to furthermore perturb these cells
to probe at causality: what would happen if I changed this gene from this form to that form? How does that cell behave differently? We have the ability to gauge behaviors using a range of technologies for what's called high-content phenotyping: measuring those cells using imaging, using super-resolution microscopy, using single-cell RNA sequencing that measures the transcriptional profile, and many more that are currently emerging. And we can do it all at scale using automation and microfluidics. This creates an incredible amount of data that looks like this image on the left, and people are unable to make sense of that amount of data. That's where the machine learning revolution that we're all familiar with can come together with the revolution on the left and really make sense of these data, help us understand biology, and find new treatments for disease. So what we're doing is trying to put this together, coming back to the Eroom's law slide from before, to come up with explicitly trained models that can perform predictions on which fork in the road we want to take. We're looking for places where such a prediction will make a big difference to the probability of success, where machine learning is the right tool for the job, and where we can produce the right data at the right quality and the right scale, so that we don't have the problem that I showed before with the radiology images. I could tell you about a number of efforts that we're making along these lines. Since I don't have as much time today, I'm only going to tell you about one, but it's the one that I think is the most core, which is: how do we predict human clinical outcome? If I make this intervention in a human, what will it do? Now you're sitting here thinking, well, how can you create a data set where you measure human clinical outcome? We don't get to do experiments in people until the very end of the process, which is what's called a randomized clinical trial. And that is absolutely right, and so
part of the idea here is: how do we use other forms of data to answer this question? Currently the state of the art, if you will, the normal standard operating procedure for answering the question of what an intervention will do when administered to a human, is: well, we can't administer it to a human, so we're going to administer it to a mouse. Now the problem, unfortunately, is that mice are just not people, and mice don't get most of the diseases that are the unmet need today. Mice don't get atherosclerosis, they don't get Alzheimer's, they don't get Parkinson's, they don't get autism or schizophrenia or type 2 diabetes. They don't get any of those things. So we create artificial diseases in these animals that are very different from the human diseases, and then we cure these artificial diseases, and then we're surprised when the results don't translate into human clinical outcome. So the question is, how can we use humans as a model system for humans? As I said, you don't get to perform experiments in humans, but here is something that we can do. Each of us is an experiment of nature, an experiment where Mother Nature has perturbed thousands of genes in our genome, and we can measure the clinical outcome of different people to see how that connection between genotype and phenotype manifests. And this is a place where you actually do have a Moore's law slide. This is a graph, again on a logarithmic scale, of the number of human genomes sequenced, starting with the very first human genome in 2001, and you can see that not only is this growing exponentially, it's growing exponentially twice as fast as Moore's law. So depending on whether you believe this trend line will continue, the number of human genomes sequenced by 2025 or 2027 will be somewhere around 100 million to 2 billion. That's a lot of genomes. Genomes on their own are great; when combined with phenotypes they are better. There's less human clinical outcome data available, but there's more and more coming. One of my favorite
resources in this regard is the UK Biobank. The UK government was very forward-thinking in creating a cohort of 500,000 just normal individuals, with whatever diseases they may or may not end up developing, and they collected thousands and thousands of phenotypic outcomes on them, including multiple covariates like diet and environment, urinary and blood biomarkers, and, for 100,000 of them, full-body and brain imaging, and many, many other phenotypes that are incredibly valuable. There are more of these cohorts emerging. Here in the United States we have the All of Us cohort, which is supposed to be about a million people and also more genetically diverse than the UK cohort, which is predominantly European, so it's going to be a great resource, and there are others emerging over time. If you take genotype and phenotype and put them together, you basically have a suggestion of causality, because to a reasonable first-cut approximation a genetic variant that associates with disease is roughly causal of that disease, if you can actually figure out what that variant is. So that's great, and sure enough, in the last 10 or so years since these efforts first started, there have been thousands of traits that have been associated, each with hundreds or thousands of different genetic loci that associate, presumably causally, with the trait. Some of these are diseases; others are normal traits like height, or even educational attainment, or bone mineral density. All of these have genetic loci associated with them. Now the thing is, for each of those traits there are usually hundreds of different loci in the genome that have an association with that trait, often with a very small effect size, and how do we figure out which is which? It turns out that, indeed, when you have evidence from human genetics for a disease, if you have a drug against that target, it increases your probability of approval. So if you look at the
graph on the right-hand side here, you can see that the odds ratio of approval for a drug that has human genetic support is about 2x, which is really an incredible improvement in probability of approval. So that's great. The problem is it's not that simple. Coming back to what I mentioned before, not all targets in this case are created equal. If you look at targets that are Mendelian, in the sense that it's a one-gene, one-disease association, that odds ratio creeps closer to three. For the ones that come from genome-wide association studies, where, as I said, there are hundreds of loci and you don't know which one matters, that odds ratio is actually less than 1.5. So it's actually difficult to figure out which of those makes a difference. What we're going to do here is come back to this other revolution that I mentioned earlier, where we have high-content biological data, and see if we can get that to bring us closer to the causal biology. Let me give you an example of how that might work. This is a really beautiful case study of psychiatric disease. This is a region on chromosome 16 called 16p11.2 that, for whatever reason, is subject to copy number alterations in the population. Relative to wild type, it's deleted in some people and duplicated in others. When it's deleted there's a 75% probability of autism, and when it's duplicated a 40% probability of schizophrenia. There are 25 genes in the region, and no one knows which of them has that effect. That was where things stood up until a paper about three years ago from UCSF, which took iPS lines from the deletion patients, the duplication patients, and normal controls, reverted them, differentiated them into neurons, and looked at them under a high-content microscope. What you see is this, and even to the untrained eye you can see that the neurons on the left look a lot bushier than the neurons in the middle, and the ones in the middle are a lot bushier than the really naked neurons on the right. There's
a significant increase of synaptic arborization on the deletion and a significant deficit on the duplication. So we still don't know which of the genes in the region causes this, but this gives you a phenotype, and you can now try various interventions in this iPS culture to see which of those reverts the phenotype from the unhealthy to the healthy state. That gives rise, basically, to what we're doing at insitro: can we do this at scale, using machine learning rather than manually looking at things under the microscope, to get at what distinguishes unhealthy from healthy cells, cells of people with a lower or higher genetic disease burden? If you imagine that these cells are embedded in a manifold, like we talked about on the images side, you can imagine that, just like in the image that I showed you before, you have clusters that are healthy and look the same, and you have different types of unhealthy clusters, each of which looks different. Those clusters will emerge by looking at the cluster diagram in the manifold, and we can now ask ourselves which intervention might revert an unhealthy to a healthy cluster. Now, what's important about this is that this manifold gives you two things. It gives you, first and foremost, a stratification, a segmentation of the patients into subtypes that might not be visible at the clinical level. The big transformation that happened in precision oncology was when we realized that breast cancer was not one disease. There we were able to do that because we had a lot of molecular data from enough patients, obtained from tumor biopsy samples of those patients, and we now know that HER2-positive cancer is very different from a BRCA1 cancer, which is very different from a triple-negative cancer, and each of those is treated differently today, which is what's given rise to the tremendous advancements in treating cancer, not just for breast
cancer but for other cancers as well. Here, because we will have enough molecular data from enough different genetic backgrounds, these different clusters will emerge, and because we have a cell-based system that is very scalable and intervenable, we can ask what intervention, what drug, might allow us to move from, say, the yellow cluster to the blue cluster. Let me give you just a couple of examples to show you that this actually works. This is work that came from our own lab. We've created massive data sets that look at different cells under different conditions, with different genetics and so on, and these are just a few examples. This is a bunch of cells, each of which was treated with a CRISPR intervention targeting a different gene, and the question is, can you, just by looking at the cells, figure out the different genetics of those cells? And the answer is you can. This is what the manifold looks like, and you can clearly see clusters that emerge, and this is a comparison to the state-of-the-art method that used an engineered set of features. This is a fairly subtle perturbation, because it only touches one gene and it only reduces activity by 20 to 40 percent, so the fact that you can get this level of performance is quite striking. I'm going to show you one last one, which is very recent work. This is in a disease that we're working on called NASH, non-alcoholic steatohepatitis. It is a fatty liver disease whose prevalence in the population is becoming increasingly large because of the obesity epidemic and the type 2 diabetes epidemic, and it's going to become the largest cause of liver cancer and liver transplants in the next decade. So the question is, can you create a cellular phenotype for NASH that you can then screen for interventions that revert the phenotype? This is a bunch of cells from NASH patients and controls, and you ask yourself which are the ones that are NASH and which are the ones that are
different. I'm sure you can't tell the difference, but the important thing is neither can a trained cell biologist who studies NASH. Even when we highlight the ones that are NASH versus non-NASH, you kind of look at this and you say, I don't know, and that is exactly what they said. So we've plugged that into a machine learning algorithm to distinguish NASH from non-NASH, and you can see a nice separation in the training set. What's more important is that you can see a similar separation in the test set. And, for rigor of machine learning, this was not only done on a different set of patients, it was done on a different set of patients whose cells were obtained from a different vendor, just to make sure that we're not overfitting. So this is what you get, and again you can start to look at the embedding and see what insight that gives you into the biology. You can't tell that much from a picture of this size, but when you look at the highest-ranked tiles for NASH versus the highest-ranked tiles for non-NASH, you can clearly see that it's latching onto a phenotype that corresponds to these green circles, which are lipid droplets (the big pink circle is the nucleus): lipid droplets that are at the nuclear membrane. And you can use attention models from machine learning to look at what exactly the network was looking at, by backpropagating through the network, and you can see that in fact in the NASH case it was looking exactly for those lipid droplets at the nuclear membrane, whereas in the non-NASH case it was looking at a diffuse signal. So, wrapping up: what we're really building is an incredible data factory that uses all of those tools that I mentioned before, iPS cells, CRISPR, high-content phenotyping, with automation at scale, to create massive amounts of biological data. On the other side, we're using techniques from statistical genetics, from data science, and from machine learning to interpret those data. But these are not two separate things;
these are two loops that feed into each other. All of the work that's done is done in interdisciplinary teams that work together like this to figure out: what is the problem that we can solve together, what is the experiment that we need to perform, what assay do we need to build that really captures the biology, what model do we build, and what insight can we extract from the results? This type of interdisciplinary collaboration I think is absolutely critical in this space, and I think in other areas as well. The separation of data scientists into one side of the organization, where data are thrown over the wall and results are thrown back over the wall, is the wrong way to do data science. So I'm just highlighting that for those of you here; I think this is absolutely critical to success. I'm going to take a big step back now for the last minute or so of this talk, just to mention why I think this is a really wonderful space to be in, this intersection of machine learning and data science with biology and health. If you look back at the history of science, there is, at different periods in history, one discipline that takes on an incredible rate of progress because of a new insight or a new invention or a new way of measuring things. In the 1870s that discipline was chemistry, where we understood the periodic table and moved away from alchemy and trying to turn lead into gold. In the 1900s that discipline was physics, understanding the connection between matter and energy and between space and time. In the 1950s that discipline was computing, where we were able to use silicon chips to actually do calculations that up until that point only a human, and sometimes not even a human, had been able to do. And then in the 1990s there was an interesting bifurcation, because two disciplines suddenly took on that incredible progress. One was the era of data, which is related to but different from computing. It includes elements of computing but also of optimization
statistics and neuroscience. The other era was that of biology, quantitative biology, where we moved away from a purely descriptive science of cataloging phenomena to really understanding the principles of what drives biological systems. This was enabled by the tools that measure biology in a very quantitative way: microarrays, the human genome, super-resolution microscopy and so on all allowed us to really measure biology in unprecedented ways. But those two disciplines didn't talk much to each other. I think the next era that's coming upon us now is what I call digital biology. It's the ability to measure biology at unprecedented scale, detail and fidelity, the ability to interpret what we measure using the tools of machine learning and data science, and the ability to take that insight and rewrite biology to do things that it wasn't meant to do. That will have an impact in biomaterials, in agriculture, in energy, in the environment, and in human health, and I think that's a really exciting place to be, because I think that is the next big epoch of science. Thank you very much.
Thank you so much, Daphne, for an amazing talk. Thank you very much. Are you going to be around for a little bit? Just a little bit, yeah. So in the break, if you have any questions for Daphne, the break is coming up in about 40 minutes, so please find her there. Thanks very much again, Daphne Koller. Here's Liza's drawing. So I'd like to welcome on stage now the panelists for the ethics panel: Lucy, Lynn, and Aslıhan. And while we're setting up, I want to do a shout-out to another WiDS conference that's going on at Cal State LA. So shout-out to them; I know they're still watching. Please come and find a seat somewhere on this stage. We have very comfy chairs, so hopefully you had some coffee. Yeah, anywhere here, let's take those. So you don't fall asleep. Please find yourself a seat. It's a little bit strange to introduce myself, saying I have the
pleasure of moderating this panel. This year we're playing with two different panels. If you've been here before with WiDS, you know that we always have a panel where we discuss careers. That will still take place this afternoon. But with this incredible surge in discussions around fairness and accountability and transparency, and the incredible importance of these topics in data science, we thought it would be really great to start this day and the technical talks with a discussion around that, and hence this panel here. As I said, as engineers and scientists, and certainly data scientists and leaders, and we're all here in the audience in one of these capacities, we always have the responsibility to deliver high-quality, reliable and trustworthy work. We have the responsibility to really understand what we're doing, to understand the consequences of our labor, and to really think very carefully about fairness, accountability and transparency. By the way, the acronym for that for a long time was FAT, and I was always so surprised at this. We have these conferences, "come to FAT 2020". I always think about it as faith, spelled the old-fashioned way, F-A-I-T-H-E: fairness, accountability, integrity, transparency, honesty and equity. That's a whole mouthful, but I think it captures everything. So we need to pay attention to this, but as in everything, it doesn't always happen. There are often many pressures. There's competition, there's power, there's money, there's greed, and sloppiness also gets in the way. So we have to start paying attention. There's a recent urgency, I think, in this field because of the growing hype around artificial intelligence and the surge in data science in every particular field, and, for me personally, also this growth in black box solutions that are out there, which makes people just grab a black box and use it without really understanding what's in that black box. And then the other thing is also,
as data is growing, as Daphne pointed out, it provides tremendous opportunity, but data in itself of course always is biased. Humans are biased, and if you combine that with black boxes, then the outcome can be a little dodgy, to say the least. People are going nuts about this area right now, which is in a way great for us, but we really need to think about this. So we must pay attention, and this is not virtue signaling. We cannot do that. Sometimes we feel that way when companies talk about ethics, and we think it's an afterthought, but we really need to start with this; hence today, after this fantastic keynote by Daphne, we're starting with this. We have three women here with us who think about these issues a lot, and I asked them to describe themselves in just a few words as an introduction, a little different from what's in your book. The first one I want to introduce is Aslıhan Demirkaya here. She's originally from Turkey, and she says: I'm a mathematician, that came first; second was a mom, third was a teacher, fourth was a researcher, and fifth was a traveler. And I don't know if your husband is watching, but wife is not in this list. Because we are all women. But thanks so much for coming. And I know her boss: she works for a company called Vianai, and her boss is Vishal Sikka, and we are very old friends, so a shout-out to Vishal. He's watching, and that makes her a little nervous, but don't worry about it. The second person I want to introduce to you is Lynn Kirabo here. She describes herself as a quirky and passionate learner who constantly engages the potential of technology, both in East Africa and the United States. So thanks so much; you flew in from the East, or somewhere around the East, and so it's wonderful that you're here today as well. And then we have Lucy Bernholz here with us. She's a colleague here at Stanford University, and her description is also really interesting. She says: I'm a philanthropy wonk. We have to
talk about this, what you mean with this. She says: my core professional question is what's public, what's private, and who decides. She says big data has become a source of power inequity in business and in government; what do you do about that? And I love this description, Lucy, because it's the perfect opener for this panel. So what do you think? What do you do about that?
Well, first of all, thank you for having me. Secondly, for a conference for women in data science, I think there's an inherent assumption in calling out the need for gatherings like this, which is that women have experienced what it's like to be on the receiving end of a power imbalance. I assume that's part of the thinking in this gathering. So it starts with understanding that by gathering these enormous data sets of information, and here I'm talking particularly about information on people, you've got a bunch of information, but the people you have information on don't know you have it. There's an automatic asymmetry. What we're seeing in the world around us right now is what happens when a few corporations create a sort of corpus of a resource that serves as a moat against any competition, and certainly we're seeing what governments can do with this enormous resource. So I think the very first thing to do, for people who are actually working with this resource all the time, is to recognize that it's inherently about power.
Now, you're very active in this field and you've worked in this for a while. What got you started in this?
What's public, what's private, and who decides. You can think about data in a lot of ways. You hear it talked about all the time as an asset or a resource or whatever you want to call it, but it's actually our identity. When it's data about people, what you're talking about is my identity in a data set that I have no control over. That's a question of what's public, what's private, and who decides. When I came to Stanford in the 1990s to think
about that question, and I'm trained as a historian, I had a great experience working with historians and political scientists. When I came back to Stanford in 2011, suddenly all the engineers were interested, and all the computer scientists: oh, you want to talk about privacy and public and decision making? Let's have coffee. Which is a good thing; I'm glad they're interested. We had coffee, and here I am.
Yeah, here she is. This is what happens when you start talking to people: your coffee comes in. Lynn, I have to call her out, she's very courageous. She's finishing up her PhD, and this is a scary thing, to be at the conference, so this is wonderful. You started your PhD working in this area of what I call faith: fairness, equity and so on. What drove you to choose this, and also tell me, what's your biggest passion right now? I mean, it must be your PhD work, we hope.
So I am actually doing a PhD in human-computer interaction, and that's basically at the intersection of three different fields: behavioral sciences, design, and computer science. And when we're talking about faith, the acronym, humans have to be involved, right? Because if you're building solutions for people, you need to include the people that you're building for, because if you don't do that, then we end up having the problems that we've been having. One of my greatest passions is working on solutions for where I come from. I come from East Africa; I'm from Kampala, Uganda, and I hope they're awake, watching. So right now we see a lot of solutions, a lot of systems that are trained on people who don't look like us, people who don't have our experiences, people who do not have an understanding of our context. And that's a problem, because the world is getting smaller and solutions are crossing the globe faster than we can think. And so when you have solutions that are not designed for you being forced
upon you, or you're just using them, there's an inherent problem there, at least in my opinion. And so, in answer to the question that you asked, what do we do about it: we need to first acknowledge that there is a problem. Yes, we have algorithms that are working now, we have solutions that we've built, but there is a section of the population that we did not consider when we were building. So I think the first step is saying yes, we have a problem, and then working on a solution together.
Yeah, I'd like to hear more about your experience in Africa as well, because of course one of the interesting things for me about data science is that it doesn't really know that many borders, right? Everybody in the world can participate in this. You don't need mineral resources, for example, as in other areas. You need a computer and you need to have brain power, and brain power is everywhere. Although at the same time we still have monopolies, because of data. So we'll come back to that; I'd like to ask you more about this.
To build on that: the people who aren't included are the majority of the people in the world, right?
Right, they represent the majority of the people.
So, Aslıhan, you come in from a really interesting point of view as well, because you work for this company Vianai, and Vianai was really started with accountability in mind, with robustness and reliability. We talked a little bit earlier about black boxes, and I think with Vianai the whole idea is actually to open up the black box and to understand a bit better what's going on. What drove you to join that company?
So my background is mathematics, as you introduced me. I worked for 12 years, and I'm still affiliated, with the University of Hartford; I'm an associate professor of mathematics. So I see things from a math approach. During my data science certificate last year at Berkeley, we learned about all
these black box models, but I wanted to dig in and understand what is going on in terms of math. Models are working great: we were assigned projects, I do projects, 95% accuracy, it's working great. But how is it working? Okay, I'm training on my data set, but if I pick something not from the test data, something not familiar from the training data set, do I really get a good result? And I tried; I made experiments. Also at Vianai we did experiments, and we saw that this robustness is so important. Because what is robustness? If the system is not stable, if you change your data a little bit and your system blows up, gives something nonsense, then it is not robust. And this is very important. So I want to be a part of a group who can fix this by using models that are explainable to everyone, not just to the experts in black box models like NLP, not only the 20,000 who are experts in this high tech, but any data scientist. I wanted to be one of them, someone who wants to take part in what will be delivered to all data scientists coming from different backgrounds, from statistics, from physics, from biology, because we are all dealing with data. So instead of accepting what the black box models do, like, here, 95% accuracy, the model is working: no, we won't accept it. We want to play with these, with different data sets. Are we really going to get the correct results? Because it may ruin people's lives. There are models, NLP models, for which I wanted to give an example. I don't know if there is time.
Yeah, sure.
I want to share, again from a math perspective. Most of you know Google Translate, right? Because I'm a native Turkish speaker, I use Google Translate from time to time, and there has been this standard example which shows Google Translate's gender bias towards male. For example, in Turkish we don't have he and she. When you say in Turkish "O bir doktor", where "O" is like it, he and she, it
translates to "he is a doctor". When you say "O bir hemşire", it translates to "she is a nurse". I don't know if you heard about this example. There have been lots of discussions and criticism, so the Google Translate developers fixed this problem recently. When you write it, you get two outputs: he's a nurse, she's a nurse; he's a doctor, she's a doctor. I don't know how they fixed the problem, maybe they fixed the model or the data set, but we have a solution now. It doesn't make us sad to see only "he's a doctor". But there is also BERT, which was published in 2018. I don't know if you had any chance to play with BERT, but I did. I wanted to see if it is also gender biased. I'm a mathematician, I studied engineering in my undergrad, so I picked some sentences. Here is how BERT works: you mask a word, and then you are going to see the most likely words that will fit in that sentence. So I made an experiment, and I wrote my predictions on a piece of paper, and then I looked at BERT's prediction results, and when I compared, I was amazed by these results. I want to do the experiment here with you, if you give me 10 seconds. Am I talking too much?
Yeah, yeah, okay.
I want you to close your eyes, but be honest with me. I'm going to tell you a sentence, and then you are going to picture someone. Okay, close your eyes. Okay: picture a mathematician. Open your eyes. Okay, gender: what's the gender of the mathematician? Who pictured a female? Okay, female. What about males? Be honest. Yes, so kind of 80 percent males. This is what I did too; I pictured a male. And when I looked at BERT's results, I have the results here, for mathematician it's 83 percent for he and 17 percent for she. And then, actually what is worse, I added "great", and then it went up to 93 percent for he, 7 percent for she.
Well, this is exactly what we want to change at WiDS, right? A great mathematician should be a woman.
What is the solution? How can we change it? Are we going to ask the builders of BERT to change it, or what are we going to do?
I want
to ask Lucy about this also, because when we were preparing for this ethics panel you talked about this, about this idea that, look, there's bias everywhere, and one of the big challenges for us is to recognize it and then to know what to do about it. As engineers, we're usually not trained in that area. So my question to you, Lucy: you talk to a lot of engineers and a lot of people working in this field. How do you even recognize it, and how do you train people, or yourself, to become better at recognizing bias so that you can actually address it? So, you know, in the engineering and data science and mathematics and STEM fields in general, over the last decade and a half, maybe two decades or four, there's been a big movement to do, like, data science for good, right? First of all, personal statement: if anybody says to me "blank for good," I'm out. But on these issues, and precisely to the example that was just given: if of all of the women in this room, 80 percent see a man when you say the word mathematician, that's a social bias. We've been habituated and acculturated, literature, movies, taught, parented, whatever you want. That's where that comes from. That's not coming from a data set; that's coming from the world around us. So rather than putting data science out into the "blank for good," what we need actually are those social values very explicitly part of the education and training of folks working in this field. But it's got to go much beyond that. I mean, those biases get baked in really early. So I'm not going to make you all responsible for changing society, but what you need to do in order to improve the data science training, whether that's as a mathematician, in an engineering school, wherever, is really engage deeply and meaningfully. Daphne talked about having interdisciplinary teams. We need these processes to be designed to help you learn to look for those biases. Assume they're there. They're there. Start from that assumption and go looking for them.
And I love the idea of actually trying to make the black box explicable. What about actually trying to put all the brain power in this room to understanding what data science shouldn't do, where it's going to absolutely accelerate and exacerbate the existing systems? And if you know that's what's going to happen, then don't put it out there. Don't just hope for the best, because what we're seeing is what happens when you just assume the best and not the worst, right? Or when these sort of unintended consequences that you can have become the afterthought instead of the thought. You're more generous than I am. I'm not sure that some of what's going on is unintended. No, I'm sorry. But I also think there's an optimism to engineering and science, there's a wonderful enthusiasm for looking for solutions, but you have to do that in the context of recognizing that the data will be biased. Your algorithms, particularly if they're designed to accelerate learning, guess what they're going to do: accelerate bias. I mean, there's nothing unintended in that, actually. It's working as designed. Yeah, no, you're absolutely right. I'm just trying to be a little bit more political than... Yeah, well, this is why I have your other panel, because you're the host. Yeah, that's right, I've got to stay that way. Lynn, I want to go to you, because you mentioned that bias already in your introduction today, and we've talked a little bit about that too. And you talked about wanting people to be proactive rather than reactive. So talk more about that, and what do you do yourself in your work? Yeah, so I really agree with what you say, because one of the things that I was thinking about is: what if we turn the design process on its head? What if we started actually asking these questions: yes, I built the system, but what if it fails? What if it misses? Let's say you have a facial recognition system, and you're trying to do
affect recognition, right? But does that work on people with disabilities? Because I work with people with disabilities as well. Is it trained on what their facial movements are? And if not, why? Why didn't you think about that? And so, instead of just, you know, "let's put it out in the wild and then see what happens," turning that on its head and thinking about it from the get-go, from the beginning of the design process. And yes, it's going to take time, but is it worth it to save money at the beginning and then have this epic fallout when you've released whatever it is that you're releasing? Because, I guess, people will say, you know, we don't have the budget for that, we don't have the manpower to think about that right now, and we can't cater (I've heard this one a lot) to everyone in the world. But it's technology. Yeah, it's always surprising to me when they say they can't cater to everybody and what they mean is women, which are just about 50% of the world population, even though they're going to try to sell whatever it is to everybody in the world. Well, in this case they're actually talking about... because a lot of the time I ask people, "So, have you thought about how the solution will be used in developing countries, like in the global south?" And they're like, "No." "Yeah, but why not?" "Well, you know, we can't really cater to everyone's needs, and we can't really build context-specific solutions." But I'm like, the majority of your population might be in the global south. There are some applications that are more used in the global south. So, yeah. One of the really interesting things for me, and also really difficult, I don't know what to do about this, is that we are developing, some of us anyway, here also in Silicon Valley, ideas about fairness and equity that are of course based on some metric that we design, but the metric is designed from our own perspective,
the moral framework that we have here in the West, right here in California. And the concept of fairness is extremely difficult to develop a metric for, period, but it is also very difficult to develop metrics that are applicable to other areas in the world where a different morality, a different moral framework, a different concept of ethics may actually apply. So there's something really wrong about us here in Silicon Valley, for example, designing tools that are supposed to work everywhere in the world, because it's almost impossible. Yeah, but then that brings back the topic, one of the things that has been said already: you have to have these interdisciplinary teams, but you should also think about diversity in terms of geolocation, right? So do you have people on your team who are from far places? Do you have consultants that you can reach out to? Sometimes it's just putting a survey on Twitter, you know. So I understand, yeah. And then when we talk to companies as well, and especially startups, and we ask them these questions, I understand the pressures of the marketplace, and that people want to bring out products really fast. We're also now in a time where folks get really interested in applying data science to areas where it hasn't been used before. They want to be the first, and there is money to be made in that. So that brings me to accountability. One of you said, isn't it bad, if you don't bear the cost upfront, really thinking about it, you're going to pay for it later. Sometimes they don't. But if they do, you know, how does that come up? How do we keep people accountable? So my question to you is, how do we do that? Is there room for government regulation? Is this something that can be self-regulated by the market? No. I knew you were going to say this. You get a one-word answer very infrequently. But, you know, in Europe they've put
some extra rules on this. People here are getting a little bit nervous. We have seen some bad publicity around decisions made by big companies, and people get a little bit nervous, yet these companies are still doing well. How do we hold people accountable for this? Do you want to start, jump in with that? Yeah. The power is in this room, because the truth is the data scientists know the algorithms; they know the limits of what they're building. And I like how in the recent past we've seen a lot of sit-ins, we've seen a lot of walkouts, we've seen a lot of employees take a stand against what they believe is unfair use of the product that they're building. And so I think sometimes it may not be that we can approach the government. I don't know how it happens here, but, you know, back home it's not popular that you can just try to inform policy. But I think as people who are building, you have the voice to say, "This will not work like this because of x, y, and z." And we as data scientists, we as women in this field, can take a stand and say we will not build x because of this. So yeah, I believe that the power is in this room. There's a really interesting point that you're making, and sometimes we're asked this question: suppose that you're working as a data scientist in a firm and you see your work being used in ways that you don't like. What do you do about that? Well, the recent past has shown that some companies are actually listening to their data scientists. You don't want to be that person who could have said something and didn't, because it in turn affects you, your own mental health and stuff like that. So I would just say speak up. There is a consequence: what if you get fired? But if you get fired, maybe you don't want to be working for that company anyway. I also want to build on this, because it's not any individual's problem to solve. It's a classic collective action problem. For the actual trained
data scientists in the room, there are going to be all the levels of the company or the government in which they're doing this work. For civil society, around the corporate structures, we're seeing an explosion in organizations devoted to fairness, accountability, and AI and machine learning. In civil society, these are organizations of advocates around it. It is a collective action problem. So maybe one thing you can really hope to get out of today is some of those connections that will give you the support throughout your professional career, that will also then enable you, if you do find yourself in that situation, to actually stand up to stop a product from being shipped. Or if an algorithm and a set of analyses can't be explained, maybe there's an advocacy strategy that the people in this room and the people watching could be the leaders on, on understanding what to do when we shouldn't go forward with certain things. And that leadership can come from this room, but you have to think about it as something that's going to take the collective. Its effects are going to be collective, and it's going to take the work of people working together, both to make a change within a company and to make a change with or without a government. Your East African governments and the United States government: not that different right now in terms of reliability and doing the right thing. So that's the power of the people, that's the classic work of civil society, and that's why people want to come together and find others who share their interests. You've got a whole room full of them here, and on the... I was going to say on the phone. On the camera. How old you are. Yeah. So, actually, you work for a company where this is actually the prime target: to really understand what is inside these black boxes, to build software tools that are explainable in that sense. So you must have worked with quite a few companies. You see
companies working with the software that you're producing. What has been the most exciting connection that you've had with a company? Okay, so I'm coming from academia, so this is my first company. And yeah, as I told you, at this company we are building a platform; we are digging into these black-box models and trying to make them explainable to data scientists who are working for enterprises, especially those who are switching from different backgrounds and want to learn data science but who maybe didn't do a PhD in machine learning. So what is the biggest barrier that you see for people to really understand? So during my... at the company? Yeah, at the company. So what is the difficulty? What do you see when people come in and you try to give them an understanding of what's happening in the software? So our platform is not ready yet. It will be ready. Well, I don't know, I mean, I don't know how much I can tell, but the intersection of all these different backgrounds is math, you know. So the idea is that, using math-based language, we can present a platform where data scientists who are coming from these different backgrounds can easily play with the initial conditions, like the training: they can perturb the data set and see how the results change, so they can check the robustness. And then the model, the platform, will have uncertainty, which will let data scientists see how reliable their result is. So it will be easy. Instead of working on, like, 2,000 lines of code in TensorFlow or PyTorch or, you know, all these languages, the platform will be easy to use, just using math. And it will not be very advanced math; it's just if you know how to differentiate. I mean, you won't even need to differentiate if you don't want to, but everything will be visual. But math, you know, you're really advocating: look, if you want to understand, you need to have a basic understanding of mathematics. So that is one thing.
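The robustness check described in this answer (perturb the data and see how the results change) can be illustrated with a minimal sketch. This is not the speaker's platform, which the transcript does not describe in detail; the nearest-centroid classifier, the toy data, and the names `nearest_centroid_fit`, `predict`, and `flip_rate` are all hypothetical stand-ins chosen so the example runs with the Python standard library only.

```python
import random


def nearest_centroid_fit(points, labels):
    """Return {label: (cx, cy)}, the mean of the 2-D points for each label."""
    sums = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Predict the label whose centroid is closest to the point."""
    x, y = point
    return min(
        centroids,
        key=lambda lab: (x - centroids[lab][0]) ** 2 + (y - centroids[lab][1]) ** 2,
    )


def flip_rate(centroids, points, noise_scale, trials=200, seed=0):
    """Fraction of noisy copies whose prediction differs from the clean one.

    Each point is perturbed `trials` times with independent Gaussian noise of
    standard deviation `noise_scale` on both coordinates.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same rate
    flips = 0
    for point in points:
        base = predict(centroids, point)
        for _ in range(trials):
            noisy = (
                point[0] + rng.gauss(0.0, noise_scale),
                point[1] + rng.gauss(0.0, noise_scale),
            )
            if predict(centroids, noisy) != base:
                flips += 1
    return flips / (len(points) * trials)


# Hypothetical toy data: two well-separated clusters.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (2.0, 2.0), (2.1, 1.8), (1.9, 2.2)]
labels = ["a", "a", "a", "b", "b", "b"]
centroids = nearest_centroid_fit(points, labels)

for scale in (0.0, 0.1, 0.5, 1.5):
    print(f"noise scale {scale}: flip rate {flip_rate(centroids, points, scale):.3f}")
```

A model whose flip rate stays low under small perturbations is stable in the sense used above; a flip rate that jumps under tiny noise is the "system blows up" case. Accuracy alone would not reveal the difference.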
What would you say, Lucy, is an essential thing to learn if you really want to understand this field better? I'd get past the ethics word. Ethics has become a greenwashing word, and if I asked everybody in this room what it means, we'd get everybody in this room times two definitions, right? Because it's about values, and you've got to articulate those values. So if in your academic training or your professional training you steered away from all those humanities kinds of things because it's all about soft, smushy value stuff: that's actually what you are encoding into your algorithms. And I'd back right up and go get some training in that. And I'd really actually seek opportunities within my workplace, whether that's in a company or the government or working in civil society organizations, for a very clear explication of the values that the group thinks it's working on. Because there are two values that we as a society, as societies, are being subjected to now on a daily basis that, if you actually rank them for yourself, you'll probably find are not your number one or two values. They may be your own personal number 10 or 11. And that's efficiency and scale. And the last time I went out in nature, I looked at art, or I met with my family, I talked to my friends, I thought about the society I want to live in, efficiency and scale were really low on the list. Really low. We're talking justice, love, friendship, beauty, truth. So if the conversation stays at the ethics language, you're not talking about anything. You're actually talking about an empty phrase. You've got to get to the values. Flame, what's your call to action, your advice? Yeah, so I would say get out of your silos, because I think you've just expressed it very well. I think when we silo ourselves, or when we are just surrounded by people who are like us, like, we're all data scientists and we do things like this and we use these
algorithms and we, you know, have these specific outputs, we end up with systems that are inherently built just for that silo. And that is not the case anymore when it comes to technology, right? Your work will be used by someone so far away from you, and you'll be amazed at how they use it. And so I would say, you know, PhD students, if there are any students in the room: go out and look at people in other departments, talk to them. At CMU I've seen students come together, form reading groups, and actually have discussions about what does ethics mean for you, what should ethics look like, what should fairness and accountability look like. I'm going to ask you all to tell us what your most hopeful thought is in this space, okay? Because we've spent quite a bit of time now talking about all our worries and so on. And before we do that, I just wanted to tell you we have a little bit of room for questions. So here is a mic, and please, if you have a question for the panel, signal to the mic carriers and they will get ready. And in the meantime we'll just quickly go and hear your hopes, because I want to end this part of the panel with something positive. So who wants to go first? You're ready? So my hope is we are going to understand the models, all these black-box models. So when we get good results, like 95 percent accuracy, we won't say, okay, it's a great model. We won't be satisfied with this result. We are going to see if the model hurts any group. We have to be sure that it's fair to any protected group. So my hope is it will be less biased, more fair, more trustworthy, and more robust. So my hope is we're going to build some amazing systems, because right now AI, machine learning, data science have this amazing, incredible potential for impact. But imagine the potential if we added more diversity, if we got more voices involved, and we could build, you know, things that are amazing. So that's my
naive hope. Yeah, nothing naive about that future. There's a lot of talk about humans in the loop in a lot of algorithmic processes and data-driven solutions. I'd encourage you all to think really about society in the loop, and that that's a recursive relationship: it starts at the beginning and goes all the way through the process. And yes, there has to be some way, when something is released, that there are appeals processes, due diligence, and recourse for action if things go wrong. But I think that the people in this room and the people on the live stream and participating in this community around the world are precisely the people to help build that. Right, thank you. So, a question from the audience. There was a mic going around. Yes. Okay, so thank you very much for sharing your thoughts about bias and ethics and everything. As a data scientist, I very much agree with a lot of things you said, and as a man I agree and also disagree about different things. My first question is... You get only one. Yeah, okay, my only question is: I think the problem is not the data science or the model; the problem is the data sets. And you mentioned that you want data scientists to be more aware of what they are doing, but I think the main problem is that the data science and the models and the data scientists are slaves of the data sets. So what do you think about the need for creating a united nations of data sets, to be sure that the data science is not carrying the bias inside? Because the model is just a consequence of the data set. You want to take it, Lucy? There's no one data set for every possible algorithm. What was the suggestion, that there be a donation of data to an unbiased data set? I don't think there's a single answer to every algorithm. Data sets that represent humans are going to be biased in some way or another, and if you're addressing them for different purposes, then you've got the model. I don't
think there's a one-off solution. You know, a comment, of course, that is often made is that as a data scientist, whatever work we do can only be as good as the data. I think it's really, really important that we look at that data very carefully before we start using it. Yeah, I think also we're not saying that the problem is the data scientists, right? We're saying that there's an inherent challenge with some of the applications that are being built right now, if I can speak, being built. And we need to, as data scientists (well, I'm not; I'm a cousin of the data scientists), but we need to as a field think proactively about the ways in which our applications can be misused. Let's say facial recognition works great here, but do you want that being applied in a nation where the government is, you know, doing some really sketchy stuff somewhere else, right? You may not have power over that, but you can think about how your application might have an impact in that situation. Well, thank you so much for sharing your thoughts with us today. I'm really looking forward to the rest of the day, because I know many of the speakers are going to be addressing these same sorts of issues. So thanks again, Lucy and Lynn and Ashley Hanne, for joining me on stage here for this. Right, and believe it or not, this is already the end of the first part of the day, so we're 25 through. I know several people on the other side are signing off now, and they're doing their own thing later on with local speakers. So thank you for joining us on the live stream, to all of you who joined us for these hours. We have a break coming up. During the break we'll have a live stream of interviews by theCUBE. Please connect during the break. There is a job board out: if you have any openings in your company or your organization, please post these job openings on the board. If you're looking for internships or jobs, please post your name there too, and hopefully you will be
able to connect today. Don't forget to tweet, don't forget to Instagram, don't forget to join us on LinkedIn as well, and we'll see you here exactly at, she says, because I'm so on top of everything, 11. Okay, see you then. Live from Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020, brought to you by SiliconANGLE Media. Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University covering the WiDS Women in Data Science Conference 2020, and this is the fifth annual one. Joining us today is John Hoger, who is the principal data scientist manager at Microsoft. John, welcome to theCUBE. Thank you. So tell us a little bit about your role at Microsoft. I manage a central data science team for Microsoft 365. And tell us more about what you do on a daily basis. Yeah, so we look across all the different Microsoft 365 products: Office, Windows, security products. It's really trying to drive growth, whether it's trying to provide recommendations to customers, to end users, to drive more engagement with the products that they use every day. Okay, and you're also on the WiDS conference planning committee, so tell us about how you joined and what that experience has been like. Yeah, actually I was at Stanford about a week after the very first conference, and I got talking to Karen, one of the co-organizers of that conference, and I found out there was only one sponsor the very first year, which was Walmart Labs. And the more that she talked about it, the more I wanted to be involved, and I thought that Microsoft really should be a sponsor of this initiative. And so I got details, I went back, and Microsoft's been a sponsor ever since, and I've been on the committee, you know, trying to help with identifying speakers and reviewing the different speakers that we have each year. And it's amazing just to see how this event has grown over the four years. Yeah, that's awesome. So when you first started, how many people
attended in the beginning? So it started out as being, you know, this conference with 400 or so people and just a few other regional events, and so it was live streamed, but just really to a few universities. And ever since then it's grown, with the WiDS ambassadors and people around the world. Yes, so now WiDS is in over 60 countries, on every continent except Antarctica, as told in the keynote, as well as having 400-plus attendees here, and it is live streamed. So how do you think WiDS has evolved over the years? It's turned from just a conference into a movement. You know, there are all these new regional events that have been set up every year, and just people coming together and working together. So we at Microsoft are hosting different events. We've had events in Redmond, at head office, and also in New York and Boston and other places as well. So as a data scientist manager for many years at Microsoft, I'm sure you've seen an increase in women taking technical roles. Tell us a little bit about that. Yeah, and for any sort of company you have to try and provide that environment, and part of that is even from recruiting, and showing that you've got a diverse interview loop. And so we make sure that we have women on every set of interviews, to be able to really answer the question, you know, what's it like to be a woman on this team? If it's all men, you can't answer that question. And so that helps as far as really trying to encourage more women to come in to some of these STEM roles. And I've now got a team of 30 data scientists, and half of them are women, which is great. That's awesome. So what advice would you give to young professional women who are just coming out of college, or who are just starting college, or are interested in the STEM field but maybe think, oh, I don't know if there'll be anyone like me in the room? You ask the questions when you interview them. Like, go for those interviews
and ask, like, what's it like to be a woman on the team? And you're really ensuring that the teams that you join and the companies you join are inclusive and really value diversity in the workforce. And talking about that, as we heard in the opening address, diversity brings more perspectives; it also helps take away bias from data science. How have you noticed that bias becoming more fair, especially in your time at Microsoft? Yeah, and that's what diversity is about: just having that diverse set of perspectives and opinions, and having more people looking at the data and thinking through, you know, how the data can be used, and ensuring it's being used in the right way. Right. And so, going forward, do you plan to still be on the WiDS committee? How do you see WiDS in five years? Yeah, I've loved being part of this conference and being on the committee, and I just expect it to continue to grow. I think it's going beyond a conference to also being the podcast and all the other initiatives that are coming from that. Great. John, thank you so much for being on theCUBE. It was great having you here. Thank you. Thanks for watching theCUBE. I'm your host, Sonia Tagare, and stay tuned for more. I know there are still people coming in, but please seat yourself as fast as possible, because we're ready for the technical vision talks. So I know 20 minutes goes by really fast. I didn't have time to get coffee, so hopefully I won't fall asleep during these vision talks. Maybe I need to ask somebody to get me some espresso. But I'm just absolutely delighted that we have our next speaker with us today. We had a wonderful time yesterday, Ya Xu and I, because we did a podcast together, and I'm very excited to tell you that that will be up and running in a couple of weeks, so keep an eye on that. Ya Xu has had an unbelievable career, and she is now at
LinkedIn, with a very large data science group. And yeah, I'm just so excited that you're here today. Ya Xu, everybody, from LinkedIn. Thank you, Margot, for such a warm welcome and introduction. I'm super excited to be here today. What an honor for me to be back here after I graduated from Stanford 10 years ago, and what's more important is to be among the WiDS community. Thank you all for having me. So I am super excited to share with you some of the early efforts that we have at LinkedIn regarding creating global economic opportunity with responsible data science. First and foremost, I want to say that the work I'm sharing today would be impossible without many bright individuals from LinkedIn, and I cannot give them enough credit. And of course, if you don't like the material, it's all them as well. At LinkedIn we have this vision of creating economic opportunity for every member of the global workforce, and the data on the LinkedIn platform constitutes what we call the Economic Graph. The Economic Graph is really a digital representation of the global economy, with data generated from interactions and engagements from millions of members, millions of companies, thousands of skills and schools, and so on and so forth. And with all this data and the insights, we can ask many important questions, important to understanding the economy and understanding the future of the workforce. So what we do is we actually collaborate with many global institutions across the world, such as the G20, the World Bank, and the World Economic Forum, to tackle those questions, really helping them design economic interventions that are able to help us be better prepared for the future. As an example, our data is able to help us understand what kinds of skills are trending over time, right? What is the kind of imbalance that we see between the supply and demand of various different skills across different geographic locations? I'm sure everyone here is excited to see
that for data science as a skill the gap is really widening over time. There's much, much more demand over time, as we see regarding this kind of skill, and it's really across many different geographic locations as well. And we can also split the data by gender. It's probably not surprising to people in this room, but it's certainly concerning that only about 20 percent of AI professionals are female, right? And this is really across different countries, everywhere. And what's even more concerning is if I take a cut of this data by industry: the gender gap among AI professionals is way wider, and it's even the case among industries such as education and healthcare, which historically have been very popular among female professionals. And we saw earlier, right, that data science and AI skills have really been trending; there's more and more demand for skills like that. And this is really putting us at risk of widening the gender and equity gap, and it's really going to take everyone here in this room and way beyond for us to figure out how we can make sure that the trend does not continue. So now let's come back to this little blue button on your phone. Hopefully every single person in this room now feels as much as I do both the opportunity and also the responsibilities that this little button has, as we try to create economic opportunity for every person in the global workforce. So what role does data science play in this? Let's first take a look at what's underneath that little blue button. There's certainly a lot that goes on behind the blue button, not least the massive data infrastructure that we are able to use and build to process over eight terabytes of events that happen every single day. And obviously, to process all this and compute all these data sets, we have to have nearly half a million offline jobs that run every single day. And all that data and its potential really comes with a set
of responsibilities. And by responsibilities I don't mean just regulations such as GDPR and CCPA and beyond; these are table stakes. I really mean what you heard Lucy saying earlier in the panel: it's really the values. It's really about what is the right thing to do if we indeed really, truly believe in the value of members first, if we truly believe in the value of creating economic opportunity for everyone. And it all has to start with how we are preserving the privacy of our members as we are leveraging the data that the members entrust to us. So I think everyone here has probably been trained over the years not to give out sensitive information such as your name, your address, your social security number. But did you know that for 87 percent of people in the United States, a hacker can still reconstruct your identity based purely on the attributes of your date of birth, your gender, and your zip code? That's exactly why traditional techniques such as obfuscation or k-anonymity are no longer sufficient to protect and defend against attacks such as differencing attacks or reconstruction attacks. And this is exactly why we are investing in differential privacy. Differential privacy has really become the new standard when it comes to data privacy protection, and at a very high level the concept is actually very simple, right? You have a set of data; what you can learn from the data should be the same with or without a single individual's data being part of it. So here in this graph that I'm showing you, essentially the distribution here is what we can learn from this set of users, and if I actually remove one user's data from this set, the difference between the new distribution that I learn and the old distribution that I had should be very small. This is a concept that was introduced by Cynthia Dwork et al. almost 15 years ago, and essentially the mathematical definition says that differential privacy is able to guarantee that the
privacy loss that we have is bounded by this epsilon, with very high probability. So at LinkedIn, what we're working towards is using differential privacy as the default way that we share data externally. The way we share data externally can be through various data applications, including analytics dashboards, data APIs, or even our ML models. It's a very challenging problem, and we are still very early in our progress towards where we want to be, but we have made some good progress, in particular on the global differential privacy model. I'm going to share a little bit of it, both from the algorithm standpoint and from the standpoint of the systems that we are building. Recently, folks on my team shared some of their new algorithmic work, which they call the top-k algorithm, at the NeurIPS conference. For those of you who are interested in more detail, certainly take a look at their paper. At a high level, the paper tackles putting differential privacy on a set of queries that is extremely common in how companies share data externally: queries such as, "Can you give me the top 10 articles on LinkedIn that have the most comments?" When we are building a differential privacy algorithm for production, there are certainly a lot of practical constraints on those algorithms, not least that they have to be extremely performant, low latency and everything. But I think the challenge with this set of queries in particular, relative to some other queries, is the subtlety that a single user can actually impact the ranking of multiple different articles. So think about how you can meet the differential privacy guarantee while at the same time still returning useful information. How can we do that? Please take a look at the paper in
detail. But to actually achieve differential privacy the way that we're using it in practice, we cannot just stop at designing new algorithms. More importantly, we also need to make sure that we are able to build systems that interface between the data storage that we have and the applications that we have. That is the way we can actually scale the adoption of differential privacy and make it much easier for all the applications that are leveraging data to leverage it in a differentially private way. So we have also built a system, which is really a differential privacy mediator that speaks between the data stores and the data applications. It not only has a suite of DP algorithms to choose from, but it also has a very important component that we call a privacy budget management system, which keeps track of how much privacy loss is happening over time. That way we can make sure the privacy guarantee is not just a one-time thing, but that over time we are able to guarantee what we set out to with our algorithms. Again, we have another paper that is on arXiv right now, and I highly recommend that everyone who is interested read it as well. Now, coming to the second part of my topic: being responsible with data doesn't just mean how we preserve privacy. As we leverage the data that our members entrust to us, we also want to make sure that we are creating opportunities in a fair way. In the earlier panel there was a lot of discussion of algorithmic bias, so I'm sure everyone here is very familiar with this. And really, kudos to Joy Buolamwini, who popularized this, or really put it in the spotlight, almost four years ago with her project Coded Gaze, and highlighted the fact that the top facial recognition algorithms really did not have the right accuracy when it came to detecting light-skinned
men versus dark-skinned women. But fairness is not just about algorithms. Earlier, Lucy mentioned how it's really about the various values that we are creating for society. And if you think about it, there are various values that LinkedIn as a platform provides to society, and there are many different ways: how we're building the product, the features that we are iterating on, and so on. It's a very challenging problem; I believe everyone here would agree with me. There is a multitude of views, and the world outside LinkedIn may not be fair. Even thinking about it: should we aim for equal treatment or equal outcome? Judging by the fact that Margot earlier gave out wooden stools of different heights, I think she's a believer in equal outcome. But there is way more. Just to give an example of why this is such a challenging problem: suppose I ask everyone here, "If you had all the power to design a fair jobs product, how would you design it?" We're talking about fairness by design, and it's super challenging. You can think, "I want to design a jobs product that makes sure men and women are getting equally good job recommendations." But we definitely can't end there, because we also have to make sure that men and women are applying to jobs equally, that they're getting hired equally, and more importantly that they're getting hired equally into equally good jobs with equally good pay. And now think about the many different ways we deliver value, not just jobs; there are many other areas we have to think through. It's certainly a very challenging problem overall. So I'm going to share, at a very high level, the three dimensions along which we are trying to approach fairness using data. First of all, we adopted the equal opportunity framework that was introduced by Hardt et al. in their 2016
NeurIPS paper, which is really saying that the opportunities individuals can get on LinkedIn, given their talent and effort, should be independent of other attributes such as gender, ethnicity, and so forth. To put it in plain English, our fairness mission is to ensure that two people with equal talent have an equal shot at opportunities. And these opportunities are not just jobs: they are opportunities to engage with various content, to build a network, to find mentors, get jobs, learn, get endorsements, and so forth. The second dimension that we're looking at is about what really helps individuals get opportunities. Historically there has been so much social research that has helped us understand this much better. For example, we know that what helps people get opportunities is trust, status, and access to information. So what are the specific assets that we have at LinkedIn that can help individuals get those? One piece of research that we did, as an example: not surprisingly, social capital matters to careers. But in the LinkedIn setting, social capital is not just how many connections you have; it's really about how diverse your connections are. Individuals who have a much more structurally diverse network are actually more likely to be mobile in the labor market. Last but not least, we've got to take action; we've got to make changes. A lot of the changes we make start with measurement, and we measure from multiple different aspects. I'm going to go through some of the examples in a bit more detail. As an example, we track and understand the distributions of the way that people are getting value based on their particular attributes, right?
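[Editor's illustration, not part of the talk.] One common way to turn the equal opportunity idea from Hardt et al. into a measurement is to compare, across groups, the rate at which qualified individuals are actually selected (the true positive rate). The Python sketch below is a minimal toy under stated assumptions: the group labels, the records, and the function names are all hypothetical, and nothing here is claimed to be how LinkedIn measures fairness.

```python
from collections import defaultdict


def true_positive_rates(records):
    """Per-group share of qualified people who were selected.

    records: iterable of (group, is_qualified, is_selected) tuples.
    All field names and the data layout are illustrative assumptions.
    """
    qualified = defaultdict(int)
    selected = defaultdict(int)
    for group, is_qualified, is_selected in records:
        if is_qualified:
            qualified[group] += 1
            if is_selected:
                selected[group] += 1
    # Only groups with at least one qualified member appear in the result.
    return {g: selected[g] / qualified[g] for g in qualified}


def equal_opportunity_gap(records):
    """Largest difference in true positive rate between any two groups.

    A gap of 0 means qualified members of every group are selected
    at the same rate, which is the equal opportunity criterion.
    """
    rates = true_positive_rates(records)
    return max(rates.values()) - min(rates.values())


# Made-up toy data: (group, qualified, selected)
toy = [
    ("A", True, True), ("A", True, True), ("A", True, False), ("A", False, False),
    ("B", True, True), ("B", True, False), ("B", True, False), ("B", False, True),
]

rates = true_positive_rates(toy)
gap = equal_opportunity_gap(toy)
print(rates, gap)
```

In this toy data, two of three qualified members of group A are selected but only one of three in group B, so the gap is one third. Only qualified individuals enter the calculation, which is what distinguishes equal opportunity from simply comparing overall selection rates.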
Students versus non-students: as you can tell, students, relative to non-students, actually have a very large network, but if you look at the structural diversity of that network, it is not very good. So as we are building products, we are constantly thinking about how we can help students, how we can bridge them into those clusters of opportunity that can help them really land those opportunities in their career. Another example is thinking about how we can ensure that everyone is visible to recruiters. We have a metric called skewness at k, which tries to measure how representative our top search results are relative to the whole set of qualified candidates. One thing I didn't mention earlier is that at LinkedIn we are very experimentation driven, so for every single thing that we change in our products, we really want to understand whether it brings benefits to members or not. So we go through an experiment process. Since I'm running out of time, I'm going to jump really quickly through this one: we are able to see whether the value that we're bringing to our members is concentrated on a small set, or whether we have a more equal distribution across the board. We borrow some old economics concepts, such as the Atkinson index, to achieve that, so that we can detect unintended consequences in every product launch. Thank you.
Thank you, thank you. Thanks for a wonderful presentation, and sorry to do this to you, to step up. But I feel the power: I come on the podium and people actually stop talking. It's amazing how that works. I want to do a quick shout-out to two other WiDS groups out there, Leha, hi, and Roli. So Leha and Roli, a quick shout-out to you guys, girls, watching. Our next speaker is Fanny Chevalier. She is
from Toronto by way of Bordeaux in France, so very international. She studies the imperfect, subjective nature of human perception. I just wanted to tell you, you can see her bio of course in the program, but check out her website. She has a list of fascinating, really fun projects on the website, like DataInk and eco networks, which I was looking at at 4 a.m. this morning when I couldn't sleep because I was so excited about today, and I played with that. So Fanny, thanks so much for coming here and speaking to us. Fanny Chevalier. Oops, sorry.
Thank you very much, Margot, for the very kind introduction, and thank you for the warm welcome. I'm very excited to be here. It's very refreshing to see so many women in a single room. I have been outnumbered for most of my life: I have four brothers, I'm the only girl, and I work with a lot of men because I'm in computer science. So thank you for being here; I very much appreciate it, and I'm excited to be part of this community. Thanks for the opportunity, and hopefully you will find my talk interesting. Before you came here, you made a lot of decisions: what to wear, what to have for breakfast, where to sit in the room. And earlier in your life you also had to make bigger decisions, such as what to study, where to study, whether or not to move with your partner, which city to live in. So what goes into making decisions? And are we good at making the right decisions? Are you good at making the right decisions? That is something I want to talk about today. I live in Toronto and, believe it or not, it still snows in Toronto, so I'm excited to be here also for the weather. The big decision that I'm thinking about right now is where to go next on vacation. I can't wait. And how about that? Of course we all understand the benefits of going on vacation in places like the Caribbean or something like that. I can't wait to have some relaxing time where it's warm and where
there are beaches, having a cocktail. So we understand the benefits, but what are the downsides? Setting time and money aside, are there any downsides to going on vacation in one of those places? Well, I have heard about accidents involving sharks, so should I be concerned for my safety? Let's look at the data. Actually, I should not: the likelihood that I'm attacked by a shark or a croc is very, very low, let alone fatally attacked. So I should not be concerned. In contrast, vending machines. Oh my god. Vending machines are responsible for more than 1,700 injuries a year in the US, and vending machines alone are responsible for more than twice as many deaths as sharks and crocs combined. So why is it that we are still afraid of sharks and crocodiles, and yet we can pass by vending machines like nothing happened? This is because of the way we reason. We rely on a lot of mental shortcuts, or heuristics, and these shortcuts guide a lot of our decisions. One very powerful such shortcut is affect. Fear and love are very powerful emotions that influence, if not dictate, a lot of our decisions. Assumptions are also a mental shortcut that affects decisions. I have in my head this stereotype of being in the Caribbean as an ideal situation for me, but perhaps it is not going to be so relaxing. Maybe the music is going to be super loud, with a lot of spring breakers around. That might not be the real picture, but my assumption is it's going to be great. And then we're also very good at drawing inferences, or making generalizations, based on our own personal experience of things. The last time I went to the beach with my partner, he got badly stung by a stingray, and this is very painful. I can't tell myself, but it looked like it. And the one thing that is weird is that for all of my life until now I didn't think about stingrays, but now, if you get to see me at the beach,
you would find me behaving weirdly, moving my feet like this so that I don't risk stepping on a stingray. It's not reasonable; it probably was an experience of a lifetime. Now, the fact that I act weird on the beach doesn't have serious consequences, but poor individual decisions may have serious consequences. One case where individual decisions put us under threat is the incredible and unreasonable pressure that some people have put on the face mask industry, referring to the coronavirus outbreak. We know that wearing a mask on the street if you're not infected is not going to help you avoid catching the virus. Yet because people have this assumption that a mask is going to protect them, and because they fear the coronavirus, their own personal decision was to go and get as many masks as possible. But this is not going to help downstream, because the people who really need the masks are now not going to have as easy access to them as they should. So how can we combat those personal shortcuts and make better decisions? Well, it's not a surprise to you, we're here for that: we need to use data. The problem is that data can be extremely rich and extremely complex; you all know that. This is an example of a patient's clinical record. Think about a physician who, before the encounter with a patient, has to go through pages and pages and pages of clinical record and try to understand what this patient's illness trajectory was. I work in visualization research, and I develop interactive visualization tools to help people make sense of really complex, rich data. In this example, my PhD student has developed such an interactive tool to help the physician have a visual overview of the patient's illness trajectory, while still having easy, quick access to elements
of interest in the clinical record. It builds on machine learning, it builds on natural language processing, but the visual part and the interface are important, because ultimately the physician is the one who makes the decisions. We have other examples of such interactive tools, here helping machine learning experts communicate with genetics researchers to understand what characterizes rare genetic diseases. You're looking at pretty pictures now, but I doubt anybody in this room can make sense of those tools, and that's normal. If you're not an expert in genetic diseases, there is very little chance that you can make sense of these. These tools have been designed for experts, and experts need to be trained to actually use them. But data science is not always about rich, complex data. It's about complex problems, but sometimes the data is simple. So what do we do when we have to communicate data to the public? Here's an example, talking about risk assessment again. Gun violence in the US is a real concern. This chart shows you the number of murders committed using firearms in the state of Florida. In 2005, if you didn't know, a law was enacted, the Stand Your Ground law, that basically says you can shoot somebody in any situation that feels threatening and you're not going to have any problems. So, looking at the chart after the law was passed: is it true that the very fact that you can shoot somebody when you feel threatened meant that people would not commit crimes in the first place, because they were afraid of being shot? This is what the chart suggests. But is it the real story? How many of you have actually paid attention to the y-axis? It's going upside down. So basically this chart should be turned around and put back in the usual order, and this is what everybody would have expected: we had a rise in crime, I mean in deaths caused by firearms,
after this law was passed. It's not that the previous chart was lying to you, because you had all of the information: you could read the axis and process it. It's just that you expect the y-axis to go up, right? And even cognitively and perceptually speaking, it is hard to process a chart that has been turned upside down. So even when the data we communicate is as simple as a line chart can be, it can be misleading, and we need to worry about that when we communicate data to the public. Another example here that you are probably familiar with. Don't pay attention to the data, but I'm showing the same data in two different ways. The difference here is the scale of the y-axis. On the left-hand side you see a close-up of the data, showing values only from 81 percent to 85 percent, whereas on the other side you have the whole scale. So which one is better for me to communicate the information? One gives you the big picture, and you see that the difference is not that large, but the other gives you the details of the difference. This might have an impact on the way you make decisions based on the data that you see. It's just a simple bar chart, but my design decision might impact your decision later on. This is also a question that we are trying to address in my lab, where we developed this simple animation so you can switch back and forth. But even the first visualization that I present to you might influence the way you read the second one. So visualizing data, working with data, making decisions with data is difficult. Sometimes we don't necessarily make a good decision because the data might not be presented in the way you expected it to be presented, and then you misread the graph and you're misled. And it is all a question of perspective. At the very moment you decide to project the data to make it visual, you make decisions, and the fact that you decide
X and Y versus Y and X might change the perception of this data. So I told you we should not rely on our mental shortcuts, and now I'm telling you, let's not rely on visualization. So are we doomed? No, no, no, no. Visualization is super powerful, but we can do better than what I just showed you, and I'm going to show you three ways I believe we can try to communicate data better. First of all, I would encourage everybody, when you have to communicate data to the public, to help people experience the data, so that you make it more relatable. Let me explain what I mean. This is about the environment. The fact is that we have lost about 129 million hectares of forest over a period of 25 years. How big is that? Who has a good grasp of these numbers? 25 years is about as long as my students in data science have been alive; it's a long time. So do we really understand these numbers? Not quite. So how about we do this: this is equivalent to losing 20 soccer fields' worth of forest every minute for 25 years. This is better. Now we have units that we can wrap our heads around. Yet we still don't have a visceral experience of what that minute means when it comes to taking down 20 soccer fields of trees. So let me try this: how fast can you color 20 soccer fields? Can you do it within one minute? Clearly you get past halfway through, but it's a fail. The point is, bulldozers don't fail. They get it done every single minute: 20 soccer fields' worth of trees down, every single minute. So by engaging in this activity, by experiencing what a minute is worth and what those 20 soccer fields are worth through the act of coloring, you have a visceral experience of what these numbers mean. This example comes from a very nice book. It's not a relaxing coloring book at all, but it's a very interesting one. I love this project. It's fascinating to engage people, through coloring, in actual data that we should all
care about. And you don't have to actually, um, oh sorry. Let's explore some more data. Who here drinks pop? Oh, okay. Is that a good decision? Let's look at the data. There are 39 grams of sugar in a can of pop. But how much is that? Should you be concerned? Again, this is a very mundane quantity that we have to work with on a daily basis, but how good a grasp do we have of this unit? Not much. So how about I tell you it's worth 10 sugar cubes? I doubt that I can find anybody in this room who can picture themselves putting 10 sugar cubes in a mug, even if we do have big mugs in North America. This is a lot. This is a lot. By the very fact that I re-express this abstract unit using a sugar cube, my message gets across much more clearly, because you have experience with that. And that gets us back to our personal experiences: we can erase those mental shortcuts by creating experiences with data that people can have a memory of, because they have experienced it viscerally. So make your data relatable. And engage more with your data; befriend your data. We have a talk this afternoon that is going to cover personal data, how you can learn about yourself by engaging more with your data. It is very important when it comes to making individual decisions. Getting back to the coronavirus outbreak: I think I do have the right habits of washing my hands and so on. But do I? So I asked myself the question, and I started to collect data about how many times I put myself at risk, as in somebody coughs around me, or I touch a handle or an elevator button, and how many times I touch my face. Because that's where we get the germs from: if we touch our face after touching handles, shaking hands, and so on, that's where we get the disease from. I also recorded the context in which this happened, and I plotted this data by drawing, and that's
my pattern throughout the day. We can see that on some occasions I do things right: I touch my face, but I had washed my hands. There are occasions where I did not. That's how I can realize things about myself. We believe we do things, but sometimes we don't do exactly as we believe we do. By engaging in collecting this data about yourself, you can learn not only about yourself but also about what goes into making decisions about what data to collect, when, and how, and what goes into making a visualization. I drew it; I made design decisions to make this visualization. It might not be the prettiest, but there were design decisions going into it. By engaging with that, you can get a better understanding of the whole data science process, because you apply it to yourself, with small data, your own data, that you know matters to you. This idea is not new. It has been put forward by the fantastic artists Stefanie Posavec and Giorgia Lupi. I encourage you to check out their book, Dear Data; they've done that for several years. I've done it manually, drawing by hand, but you don't have to. You can use tools like DataInk, which we developed in collaboration with Microsoft Research, where you have a system that helps you with this process of creating personal visualizations. Instead of drawing one drop at a time to show each of the sentiments you've collected throughout the week, you have a replication mechanism, and you have vector graphics that know what color to map to what data, because you specify the mappings. And if you're not a drawer, you can use another tool (note that it's an all-female team, kudos). You can use tools to steal visual elements from photographs and bind these elements to data, to make personal visualizations that look playful and visually interesting. So engaging with data is important. Data literacy is a skill that we need to learn and that we need to teach, which brings me to my last point. It is
our mission to teach data to the next generation to come. We make more and more data-driven decisions as we go, as we're being thrown data on a daily basis. In my lab we develop tools to help educators teach data visualization and have children engage with data and learn what goes into collecting data and visualizing data. But it is important that, as mothers, as big sisters, as aunties, we engage the children around us in doing the same and engaging with data, because it is through education that we're going to combat all of those irrational decisions that we can make. So to wrap up: we have a lot of data being thrown at us. It can be complex, but it can be mundane, like the sugar cubes. By re-expressing data to make it relatable, by engaging with data, and by teaching data, we can make a better society and hopefully, ultimately, make better decisions with data. With that, I would like to thank you for your attention.
Thank you so much. It's amazing. Thanks so much, Fanny, wonderful talk. It's my pleasure now to welcome Karen Matthys and Carol Lynn Bauhaus to the stage. They're going to tell you about a new outreach program.
Great, thank you, Margot. Well, we're thrilled to be here today to share with you the very first steps in our new WiDS high school outreach program. Let's see if we can... there it is, there we go. Great. We've heard from many of you in the WiDS community worldwide over the last few years that it would be great if we all worked together to build a much bigger pipeline of girls who will one day become the next generation of amazing data science leaders. That's wonderful; it really fits with the WiDS goals to inspire, educate, and support women at all stages. Unfortunately, the studies we've seen show that many girls, particularly around the age of 15, start to lose interest in STEM fields, and we want to change that by starting with exposing girls to excellent role models that they can really
relate to: young women not much older than themselves. And we want to show that there are many ways to get involved in data science. It's very non-linear: you can study math, statistics, computer science, economics, bioinformatics, social sciences. Really, we want to inspire teenagers that they can take data science, combine it with any area that they're really passionate about, and over time, one day, have a cool and rewarding career. So we started by creating some specific materials, videos and other materials designed just for high schoolers, and we are excited to show you a very first look at one of those videos. Just a quick clip here.
Data science can be found everywhere, shaking up almost every aspect of our daily lives, but most people hadn't really heard of it until about 10 years ago. It's one of the fastest growing fields today, so there's a ton of opportunities for everyone to be involved. Anyways, what is data science?
So this is just a teaser. Other things that we're working on are additional videos, discussion guides to go with those videos, and a student glossary that includes things like algorithm and machine learning. Another aspect that we're really excited about is a day-in-the-life reel that features four amazing women going through their work day. Here's a sneak peek of that one.
Much more meaningful.
So we're currently finishing these materials, and we've just started a pilot in several local high schools. The early feedback on this initial video is really, really exciting, really positive. We're also beginning translation into Spanish and Chinese and hope to add additional languages soon. We'll be taking the feedback from the pilot to revise and improve the materials, and releasing those to educators and teachers in the fall. So please, if you're interested in hearing more about it and getting the videos and materials as soon as they're ready, sign up at this bit.ly URL, or you can go to the WiDS website and sign up there. We'll have more information coming soon, and
we definitely want to give a shout-out to the design team. If you're here in the audience, please stand up, because it really was a great multidisciplinary group: two math teachers, two students (one in high school, one an undergrad), two industry professionals, and then three of us from Stanford on stage. We'd love to talk to you more throughout the day today, so come find us at a break.
You're done? You're done. Really exciting; looking forward to working with you all. Thanks so much, you and Caroline, and an especial shout-out to you because you've been leading us all through this. Thank you. It's a great, great team. Outreach program, yeah. Right, thank you very much. So, for many people who are now following us live, this is the moment they've been waiting for, and this is why they're joining us on the live stream: Meredith Lee talking about the datathon and announcing the winners of the fourth... third, third. So I can count: one, two, three, four. Yeah, the third WiDS Datathon.
Thanks so much, Margot. Hello everyone. I'm so excited today to share some results from our third annual WiDS Datathon. You may recall that the first datathon we hosted focused on financial inclusion data. The second datathon used high-resolution satellite image data. And this year, let me tell you about this year: we focused on health data, particularly in a collaboration with hospital intensive care units. With 130,000 patient visits, 160-plus columns, and four times the number of teams participating compared to last year (we quadrupled; yes, that is worth a round of applause), we were so excited that 80 percent of those registered for the datathon this year were women, in an area of online predictive analytics where estimates are typically about 20 percent. So good job on flipping that statistic. The 951 teams of up to four people spanned 85 different countries, which is so fantastic to see, and over the last six weeks these teams worked together around the clock and submitted more than 12,600 submissions. This is all
possible because of months of work by our datathon team, spanning a number of different organizations from the private sector and the public sector. I have to give special thanks to Marzyeh Ghassemi from the University of Toronto for alerting us to this opportunity with this data set, to our colleague Latanya Sweeney from the Harvard Data Privacy Lab, and to our collaborators at the MIT Global Open Source Severity of Illness Score, or GOSSIS, initiative, as well as all of these organizations you see here. If I could ask the folks in the room who worked on the datathon team to stand up, we'd love to give you a round of applause. Thank you so much for your leadership. We also have to give a special shout-out to the dozens of WiDS ambassadors around the globe who hosted more than 20 different workshops to help with team formation and to facilitate hands-on training throughout the last couple of months. Here are just a few images, and there are hours of content now, forevermore, there online for viewing. Thank you so much to the WiDS ambassadors. And now, if we could get a drum roll please, for our top three winners. In first place, we have Team Women Power from Israel. They did this with their own laptops, nights and weekends. We'll be sharing more stories over the next few weeks. In second place, we have Team Nullset from Ukraine. And in third place, Team provision.io from France. Congratulations! These teams have already started sharing their problem-solving approaches, which is so fantastic to see. And we're super excited to announce, for the first time, an extension, a second phase of the datathon, with an Excellence in Research Award with the National Science Foundation Big Data Innovation Hubs. So we invite you all, whether you're working on a paper this month or not, to join our webinar this Thursday, and we hope to see you next year. Congrats!
Thank you so much, Meredith, and a big shout-out to her. I mean, it's amazing what she and her team have done, and how the datathon is just increasing every
year. It's very exciting. So we'll continue with our technical vision talks, and I'm really excited to announce Rama Akkiraju. I'm so sorry; you know I'm bad at pronouncing people's names, and when people get upset with me I always tell them: try mine, because my name you would have to say Margot Gerritsen. So now, when I was looking at Rama, she had a quote online that I just thought was wonderful. She works at this interface of language and AI, and she said in one of the interviews that I saw: if data is the new oil and AI is the new electricity, then speech is the switch to turn it on. So with that, thank you so much for coming and talking to us today.
My pleasure. Hi everybody. So my last name is spelled as Akkiraju. By show of hands, how many of you in the room can speak three or more languages? Oh wow, okay, that's quite a few more than I expected, but that's great. Language is hard, right? Mastering a language is pretty hard, especially if it's a teen language. As a mother of a teenager, I know all about it. I go, what does that word mean, these days. Now imagine we have to teach that language to AI machines, and not only the language of the humans but the language of the enterprises. So today I'll talk more about what it takes to teach AI systems the language of the humans as well as the language of the enterprise. Polyglot Enterprise AI is the title of the talk. What is a polyglot? A polyglot is a person who speaks multiple languages, so here we are talking about AI systems speaking multiple languages. Here are the key takeaways, in case you miss the rest of the talk. First, to build polyglot enterprise AI, AI needs to be able to master not only the language of the humans but the language of the enterprise, and I'll show that to you with more examples. Second, you don't have to perfect the language of the humans in order to begin to address the language of the enterprise. There is no shame in really
narrowing the problem and solving the narrow ai first as opposed to trying to attempt to solve the broad ai and the third point is that depending on the availability of the amount of labeled data and the transparency and and the such requirements that you have you may have different options at your disposal and i would like to take you through some of those what those options are today so let's look at the language of the humans how many languages are there in the world um so i'll give the answer about 6500 to 7000 plus languages apparently and 23 are the most spoken languages in the world and typically businesses do language companies do business in about 170 plus languages so this starts to give the scale of the problem 170 plus languages is what companies have to master uh ai systems have to master in order for their systems to be really available in all those languages so let's take a look at the complexities of understanding human language i took one nlp task which is the machine translation and um i wanted to demonstrate what it means to really understand human language you know take some idioms humor and sarcasm and throw it in in any favorite translator machine translator program out there and you'll be surprised and i'll pick one i picked telugu and hindi languages um because those are the three languages i speak in two languages i speak in addition to english so i'll i'll take it through this one example hit a roadblock it's an idiom um the the translation is it's pretty funny for those of you who speak hindi audience uh in the audience it's it's utterly ridiculous translation selling point take that for example vikraisthal it's literally translating to the place where things are sold as opposed to really understanding the meaning of the the expression selling point yeah so language is hard and throw you know social media and lingo and other things that keep changing every day it's pretty hard to keep up with it for ai systems now we have to add 170 plus 
of these languages. So let's take a look at how we are doing, actually. When we look at the current progress in the space of NLP, 2019, I would say, is a defining moment. If you look at some of the leaderboard tasks on NLP (I'm showing here some of the basic tasks), it appears that computers are actually surpassing humans in some of the basic NLP tasks. I won't go through all of these, but the black line is the human, and the GLUE score, the dotted blue line, is the one that's primarily used. There are many tasks out there on leaderboards with different data sets, and people are attempting so many different algorithmic techniques on those tasks, and they seem to be doing pretty well on them. Does that mean that we are actually mastering human language? How close are the AI systems to really understanding human language? First of all, we should understand that the GLUE score is a very poor representation of all of human language understanding. It's one measurement metric that has been chosen by the NLP community, but they themselves would admit that it's not a very good score. It's limited to classification tasks. It doesn't really represent some of the complexities of real human language. The question-answer pairs in the kinds of data sets that are given are very simple, and they don't represent the business world. Also, mostly these competitions and leaderboards are limited to English; they don't scale all that well to other languages. So that's about the language of the humans. Now, the language of the enterprise: let's look at what that means. Companies are applying AI to various different types of domains, right: for building chatbots, for building doctor's assistants, in the legal domain, the health care domain, financial domains, and so on. So many different domains,
and they all have a vocabulary of their own we don't encounter as many of these vocabularies in regular human language so let's take this example i don't want you to read all of this just focus on some of these yellow highlighted words the one on the left is um the doctor's note um uh on a patient and you see words like meloxicom lower sternum troponin um acute coronary syndrome alternative colitis and so on how often do you encounter that in regular human language and the one on the right hand side is a corporate contract and here you know there are a lot of things about who is what to whom by when uh sellers buyers their rights and and obligations and so on if we throw in a regular nlp task that masters one of the question answering systems at one either of these two examples they'll fall flat on their face no hope no chance because they don't understand the language of the enterprise at all but if you look at the language of the enterprise it's very very hard it's complex there is not that much labeled data that's available um and there are a lot of privacy restrictions that really prevent data from going outside to really leverage crowds at large to get the data labeled um of course they require multilingual support there is a lot of special vocabulary subject matter expertise is required um and their subject matter experts are not programmers therefore they can't really just simply go in and and write some you know additional expressions and such to to add these vocabulary into the system so enterprise data is complex and it's it's tucked away within enterprises and it brings its own sets of challenges so when I say polyglot AI I don't necessarily just mean multiple languages as in the language that we speak in 170 plus it's the language of these of the enterprises like these kind of things of the contract understanding of understanding the vocabulary of healthcare domain of retail domain financial domain and so on so there there comes the first key takeaway 
right it's polyglot enterprise AI especially needs to be able to speak not only the language of the human but the language of the enterprise so understanding human language is hard we said that but understanding enterprise AI making AI speak enterprise language is even harder so but how do we make progress are we are we at logjam are we stuck well here is where you know there is an insight that came to us as we were working on this how about you know instead of trying to really address the broad AI in the case of NLP it's about really um these systems being able to do Turing machine type of tasks right being able to really you know do the broad AI but how about you know we don't necessarily have to solve all of that how about we narrow the domain and really be able to address the specific problem at hand and address that and as long as we are able to excel in that domain it's fine even if you don't solve the broad AI right so narrow the problem down and it may not have to be you know really passing the Turing test it could just be does it understand on top of basic human understanding that it has if it's able to pick up enough domain expertise for a particular industry maybe that's just fine that's the narrow AI and you take it to a particular enterprise and actually deploy it there and let it further customize and learn from that particular enterprises language for that particular company it gets even better and that's absolutely fine to address that narrow AI there is no shame in in actually solving a real problem instead of trying to attempt the full broad AI and and and if you look at the learning curves you can actually start to ride um these learning curves um by reducing the amount of training data labeled data that you need because you're narrowing the problem so you can accelerate learning faster so now let's take a look at the same two examples that I gave earlier um if we just applied the base human language understanding speech um system um and gave an 
audio spoken by a doctor, in the first iteration it made the following mistakes. Where it said "lower sternum," it ended up transcribing it as "lower stardom." Where it was supposed to be "troponin," it ended up saying "proponent." For "acute coronary syndrome," it heard "good corn a r syndrome." For "ulcerative colitis," it said "a full sort of colitis." So it made a few mistakes, because it didn't have full understanding of that particular domain. All we had to do was take that piece of text with the correct transcription, just that piece of text, customize our speech machine, and rerun it. And there you go: all those mistakes that were in red now turned green, and the system was able to pick it up. So this is the power of really customizing and narrowing the domain. All you had to do was have a base layer that's good enough and then start to customize it to a specific domain. Again, I'm not giving specific experimental results here, which I will in just a bit, but intuitively what this told us is that you don't have to really perfect the language of the humans in order to attempt to solve the problem of enterprise AI. You could actually begin by narrowing the domain and solving the narrow AI, and you would get pretty good results that are practical enough for you to use in enterprises. So now I want to talk about what it really takes to build these polyglot systems. In NLP, for natural language understanding specifically, there are three different techniques. Broadly speaking, there is the statistical approach, there is the rule-based approach, and then there is a third one, which is an interesting one, a human co-creation approach, and I'll take you through what these are. Statistical NLP is pretty much machine learning; everybody talks about it. It takes a lot of labeled data, processes the data, learns the patterns, and produces prediction models, right? So in a conventional approach, if you were to build a
polyglot AI model, what we would do is take English labeled data to build an English model, take German data to build a German model, and take Chinese labeled data to build a Chinese model. This is how we were doing it pre-BERT, right, before the universal language model. Of course, that's very inefficient, not cost-effective, and cumbersome: so many models to manage and maintain. Thank god we have BERT now. We have this multilingual model. We can train it with massive amounts of Wikipedia data. Now we have one universal model: throw all kinds of labeled data into it, and out comes one cross-lingual model that we can use and deploy, one model to manage and maintain, right? So the broad AI solution for it is to throw in Wikipedia data, because it's parallel data in different languages. You throw it in, and the system learns the word embeddings and the associations between words in different languages, and it's able to make this cross-lingual model. That's the broad AI. Now, what we want to be able to do is take that pre-trained BERT and, on the top layer, start to add this additional data. Think about this as the narrowing, the customization that I was talking about in the triangle. So the top layer is now a feed-forward neural network with softmax, and we start to feed in the specific labeled data of a particular enterprise or a particular industry on top of the pre-trained BERT to customize it further. So here is an example. You could take English insurance companies' data for the customer support domain, very specifically, and start to feed it in on top of the pre-trained BERT, and throw in different languages, say German companies' customer support data for the insurance domain, the same thing for Chinese, and so on. On top of the pre-trained BERT, we now get one cross-lingual model for the insurance industry and for customer
support use case specific, right? Very specific. I want to be really specific, because that's the problem that I want to solve; I don't particularly care to solve the Turing machine problem here. Now we could go one step further and say, okay, there's a particular company called ABC Insurance Company, and they want to deploy their customer support chatbot. Now I want to further customize this model for them. So again, the same idea: take the universal NLU model, then the insurance-industry-specific model we have, and on top of it throw in this additional, very company-specific data at the softmax layer. Now you get another model out. This is statistical NLP. So we applied this to our sentiment model, just to test it out and see how it works, and I'll present these examples. We threw in labeled data for English, Spanish, Italian, Brazilian Portuguese, and so on, eight different languages. The first column in the table shows the accuracy when you build one single model per language, that is, take English data and build an English model, take Japanese data and build a Japanese model, not using BERT, right? So that's the accuracy. Whereas the third column shows the accuracy numbers with the one BERT model, the language-independent model, when we try to make a sentiment prediction across three labels: positive, negative, neutral. So this is showing us that it's pretty encouraging. Our accuracy is on par or better. So that's great; the BERT model is working. Now we wanted to see whether we can do zero-shot learning, that is, without feeding any labeled data in a particular language, can I make the model predict the sentiment in that language? Now that's an interesting thought. How could you make predictions without giving any labeled data whatsoever to a machine learning model? Well, that's the power of BERT and the statistical model here with
where you have word embeddings from multiple languages that are pre-trained. It so turns out that when we try to make a prediction in French with zero-shot learning, we are not bad at all: about 57 percent, as compared to the 54 percent when you actually train a single French model with labeled data, without using BERT at all. And on top of it, if you add some labeled data, it increases further to 64 percent. So that's pretty encouraging to start with. Of course, if you add more labeled data and it sees some more instances, you'd probably get even better. So what's the problem? Why can't we just use this? Well, the problem is that statistical NLP requires large amounts of labeled data. It's impractical in many, many domains, like the contract understanding domain that I had just shown, and it's not very transparent and not very explainable, right? So the example on the left-hand side may be addressed with statistical NLP, but the example on the right-hand side cannot be, because you cannot get so many contracts from a company labeled by experts to tell you exactly who is what to whom, so that the system gets enough understanding to learn patterns. It's too much to ask for. So that's where rule-based NLP comes in handy, right? Rule-based NLP is all about the different kinds of NLP stack tasks that you already have, like sentence segmentation, tokenization, part-of-speech tagging, and lemmatization. Keep on building on that stack and, in addition, have a subject matter expert use a rule language to write rules. So you are actually able to tell the system that when this verb is followed by this kind of an entity, this is what it means, and it's considered to be an obligation in a particular domain, and so on. And you go further and use semantic role labeling to really identify who is what to whom, or who did what to whom and where, and use proposition banks from
different languages, and start to build out this cross-lingual or multilingual information extraction and apply that to specific domains where you don't have too much labeled data. So a rule that's built with an SQL-like query language that was developed at IBM Research, called Annotation Query Language, would look like that on the top right corner. If you take the text on the left-hand side and run it through that Annotation Query Language, you would start to derive things like what you see: what's an obligation, who's the purchaser, whether it's a purchase or not, and so on. So this is what the rule-based one gives. However, these are complex to write, right? We would have to really train people to write that SQL type of language. So there are disadvantages to this approach too, in the sense that it's useful with small amounts of training data, but you cannot really expect subject matter experts to write those rules. So the third approach that we have come up with is human rule co-creation. Basically, use deep learning algorithms, machine learning, statistical NLP to learn rules, and expose those rules to humans, so that they don't have the burden of writing the rules; the rules are written a priori, and they can then correct them. That's the human co-creation rule learning approach. And when we did tests with that, it actually does pretty well when humans are refining the rules that are already produced by machine learning models. So to summarize, I've presented three different approaches. Each of them has its own pros and cons, but you have to pick and choose based on your needs. So to sum up, the key takeaways for building polyglot enterprise AI: you have to really make sure that the systems understand not only the language of the humans but the language of
the enterprise. Also, there's no shame in solving the narrow AI as opposed to trying to do the broad AI. And also, depending on your availability of labeled data and what transparency requirements you have and such, you may have different approaches at your disposal, like the three that I mentioned. So you may want to consider those things as you are building polyglot enterprise AI systems. Thanks very much.
Rama, wonderful, thanks. That's great. Well, we have one more talk left before lunch, and I'm very excited about Talithia Williams joining us here on stage today as well. You may know her from her book Power in Numbers, about rebel women of math. You may also know her from NOVA Wonders. She's a very talented woman, and I'm so glad you could make it here today at WiDS, Talithia.
Thank you so much. I'm super excited to be here. I'm even more excited, where, Donnelly, where are you? Because I want to get my picture. Like, I've been looking, I'm like, how do I want, maybe like a, is that it? Yeah, okay, all right, awesome. I want to talk to you a bit about owning your body's data. I am the only thing that's keeping you from lunch, so yeah, I get to draw this out. No, I'm joking, I won't draw it out at all. This talk is sort of built from the TED talk that I've given. Don't, those are my mom; my mom watches it. She finds a different computer at the library just to show her support. But it's talking about ways that we can use data that we collect from our body to make decisions about our health, and so that's kind of where I'm going to take you today. This was an email that I got from Fiona. She says: Dear Talithia, I'm ashamed to say that at 33 I've yet to learn, myself, how my body works. My interest in this subject was piqued after I was diagnosed with polycystic ovarian syndrome (probably some of us may suffer from that as well) in all of its sexy-symptoms glory. Unfortunately, dealing with doctors was an exercise in frustration. She says this prompted me
to start researching how I could take control understand and aim for healing versus symptom management somewhere on all of this I came across your TED talk and my experience far too few women are aware of the reasons why their body acts the way it does me included and reading hard reading hard data is either too boring or flies over one's head not for any of us in this room but okay all right Fiona your example was simple yet beautiful and enlightened me to the why behind one nuance of my amazing being so thank you um it was really special for me to get this email from her because it really sort of validated the need to help people think about the data that they produce from their own body and how they can take ownership of that and how they can use that in decision making so question for you what kind of data do you collect about your body or what types of data do people collect about your body let me see your hands get the blood flowing yes right here you got to yell it out your monthly flow yes don't we do that guys in the room yes sleep data yes what else two hands back there go ahead blood pressure body temperature yes yes go right ahead I'm saying exercise data weight yeah she said yours we're like yeah who really takes their weight like no I don't want to see it really I can tell when it won't button I'm like I know I know it's changing so we collect data with all types of devices how many of you have on a device right now that is collecting data about your body yes how many of you use any of that data that gets collected oh oh yeah you should you should feel ashamed yeah you should we're going to talk about how you can do that today next question why is it important to understand your body's data yes change tells you that that's right because if it's not changing guess what you're dead so yeah right change tell you I'm glad you like that yeah what else why is it important yeah keep track of progress that weight loss progress that we're all making progress 
on why else is it important yeah exactly you can help others right when my father-in-law got sick I remember taking his data to the doctor because you know he dad what he didn't he was 86 he didn't look at it but I started to see how his health was actually improving as his diet change absolutely right lots of things that we can do when we collect that data so let's talk about how we can use our personal data to sort of create a digital footprint right that's kind of where we're going to go with this talk sleep data you mentioned it I think this is my husband's one evening falling asleep at 9 35 p.m. look there no time to fall asleep just just knocked out really just did he wake up no no was he restless six times or so he was in the bed for almost seven hours he was actually asleep for almost seven hours yours truly on the other hand I went to sleep earlier believe it or not and what happened woke up twice maybe some kid was breastfeeding somebody had to go pee I don't know it was twice restless 16 but having this data is beneficial because now I can understand why I I was in the bed so long I don't understand why I wake up tired oh it's because I woke up literally you know 18 times last night for different reasons my body was restless right so information that we get just by looking at your actual sleep how many of you actually look at your sleep data okay or no your sleep patterns absolutely my family is really competitive and even with the boys with we have three boys and we each had Fitbits and we would do these family competitions to make sure that people stayed active and so here's a snapshot of one of our competitions you see Donald if he probably didn't win I'm sure I came back that week but at the top me there's our son Josiah Noah right we're all a part of this competition but what it does show us is our activity during the day so this particular day you can see that I wasn't really getting up every hour right there are hours that pass without me moving 
and it makes me conscious of that time that I'm not active it also helps us to raise kids that are thoughtful about their activity during the day right so they're getting up and moving around because their Fitbit is buzzing that they've been seated for 50 minutes and you want to get up and make this time and so this is a way that we make make sort of this a fun activity in our family right by making it competitive but it's also helping us to improve our health and to keep track of this data I love that you can look at the aggregate of the data this was an interesting goal the goal was four million steps in a year and notice that midway through September was close to three million and so it was like oh I wonder if I can push it to get to a four million step goal who knows if they've done four million steps in a year so having these goals for our family actually improves our activity and improves our overall health because we're able to see the data we're also able to see like where are these peaks happening where are we having really good days where we're getting 15 000 steps what am I doing on that day and how can I make more of those days happen the other thing that my family likes to do is work out believe it or not this is a picture my husband teaches spin and this was on Thanksgiving day because you know I mean if you're going to eat poorly I feel like if you work out in the morning like it justifies you know what's going to happen later so we actually went and did spin on Thanksgiving morning we try to be a really active family my husband also plays racquet ball and he's sort of that type a personality that writes everything down he's very meticulous and detail oriented that's probably many of you in the room he keeps track of his racquet ball data this is by his heart rate so here you're looking at different heart rate intensity zones so gray is the lowest zone and then it goes to sort of a charcoal and blue and then red is really high intensity followed by 
yellow and then green notice that he has notes in the corner right so he goes into the app to put in notes to correspond to his data I know yeah I married up here's his training data he said racquet ball singles with Zachary won four of four games not he's so modest really and in fact you can see where his heart rate intensity was picking up right just looking at his heart rate trace right he's he's playing four games he's going and going and going he's got some pushes at the end great you know so the statistician in me is love and looking at his data and so I'm skimming through and I'm like oh honey wait a minute couple this is not a week later week later you're playing with Jimmy you won one and you lost two oh gosh what what happened and so I wanted to uncover what was different about that day I said well when I look at your data you've got these peaks of where you're playing but then you kind of have this lull in between like I didn't see that with your your previous data what was happening what were you guys doing I mean Jimmy's a different person so yeah there's some you know difference but this data looks different from when you were winning and he said oh yeah when I played Jimmy after each round we'd go outside we have a sip of water I'd sit down I'd rest my heart rate you know I'd cool down a little bit and then I'd go back in I was like and you went back in and lost right you got to stay in there and keep it going because Jimmy's beating up on you while you're chilling drinking water right and so it was interesting to sort of see the difference in the data and not just for Donald right this works for competitive sports teams football teams right basketball teams at our institution how can we help our students see right the difference in their data and whether or not they win or they lose right how might that influence what happens last thing I want to share what else we can learn from our heart data is how how our body is changing on the inside so early 
on in our marriage my husband suffers from allergies and we hit we were newlyweds and one night he couldn't sleep and and he's up and he's like honey I can't breathe and I'm like can you can you breathe out of your mouth you know and he's like yeah but I can't breathe out of my nose it's my nose you know and I'm like well just like your body your body's not going to let you not breathe out of your mouth like let's just see about this in the morning and he's like no and I think we need to go to the ER and so I'm like fine you know I I don't want you to die on my watch this is just the first month your mom you know your parents are going to think I did it and so um and so I get up and I drive him to the ER I kid you not that was my thought it wasn't his health it was like people are going to blame me and my cooking I got let me just take you to the doctor so we get to the ER and um we walk in and I say you know hi my husband's having trouble breathing you know and I'm like out of his nose but the mouth is working fine you know um and the doctor sort of looks him up and down and he's like let's take you to the back we're going to run an EKG and a do it a cascade and we're gonna like whoa wait wait no no he's it's allergies I'm sure it's allergies and so we get rushed to the back and we're telling the doctor what's happening and I'm like you guys are over reacting he just he took this medication and he took some affrin and then he did a nasal decongestion and and the doctor was like we're just gonna you know we got this we might have to go to surgery and I was just like time out like what if am I really killing him and so and so we're there and then so so my husband's getting a little frustrated because he's sort of like well I didn't know it was was like that bad you know and I'm like well you you ride your bike like how can you know how can your heart not be that good you ride your bike all the time and so anyway this doctor his shift was ending so a new person on 
call came in this is about 4 4 30 a.m. and we're we're ragged because we've sort of been you know on this roller coaster of what's happening to his health and this new doctor he comes in and he says tell me you know what's up I'm looking I'm looking at why you guys are here we think he's having a heart attack and I said well I don't think that's it he he he bikes a lot and and you know and then we started walking him through the evening like we got home and he took this medication he took some affrin he took a decongestion and he said oh you never want to mix those two because they clog your nasal passages let me give you this instead and we were like but like well we told the other doctor that too and he thought it was a heart attack and you're saying you don't want to mix these two because it did exactly what it did for my husband it clogged his nasal passages and brought us to the ER and he said well when a 300 pound man walks in the ER and says he can't breathe you assume he's having a heart attack and you ask questions later and so part of me was like well I I mean I I guess but he's not having a heart attack and you know so this is my husband's actual heart rate heart trace data from October of 2010 through July of 2012 he started in the hypertension pre-hypertension zone of his heart rate data and over the course of a year and a half or so got down to a very healthy normal zone and what was interesting to me was that this data of his heart health was really describing a process that was happening on the outside right we we moved to California that in of itself that'll make you vegan like as soon as you you just we were in Texas where no one is vegan no one's vegetarian I'm you know sorry for those of you watching from Texas yay Texas um so we moved from Texas to California and we were like oh my gosh he's farmers market like is this what a strawberry tastes like are you kidding me um and so he lost over 150 pounds and our kids have also been the beneficiary 
of that. We've got gardens, and we've now actually become vegan; we're some of those people. But it's really changed the family dynamic and led to a healthy family lifestyle, because we were able to look at this data and see how the data was changing us, and how we could also work to change that data. So, three points, four points I want to leave you with today, kind of my take-home message. Many of you, as you raised your hands, are already taking these daily measurements about your body, but I want to challenge you to try to understand those. How can you infer your health from that data? What is it that you can do now that you know that you only took 3,832 steps yesterday? How is that going to change your behavior? So: how are we going to let data change our behavior? Share our data with the doctor and with our loved ones, and also get them to share their data with us. We're data-centric folks in this room. We're not afraid of the data. We may see patterns that our family doesn't see. We may see health challenges that are coming on that we pick up just by looking at their data. So how can we be a conduit for helping them understand their health? And then I love this last part. As a professor, I'm always thinking about how to broaden participation in statistics and data science. What are gateways to the field outside of calculus? Calculus is a high hurdle to get folks into data science and statistics. What are other ways to enter our profession? Students are big on data, especially personal data, especially if it has something to do with, maybe, sports data. So how might we think about courses that might be a gateway to the major outside of the traditional, very heavy math, calculus-based track? Because I think that would help encourage students to pursue data science. Thank you so much for your time today.

Thank you so much. Well, that was amazing. And it's good it's lunch
time, because now you can go outside and step around a little bit and let us know how many steps you took. Hey, before we leave for break and I tell you about the breakout sessions, a shout-out to Purdue and a shout-out to WPI. They're closing down pretty soon on the East Coast, but at WPI they've been with us most of the day. A few things about lunch: lunch is available outside for you. Many of you have signed up for breakout sessions. If you do not know where you need to go, just go outside these doors here, and there will be people with signs who will walk you over to the rooms, especially those outside in SIEPR. If you have not signed up for a breakout session yet, I was just thinking about deep learning here in this room in McCaw; you could join that, or you could join, for example, data ethics in the SIEPR building. And again, just follow the students with the signs, who are ready for you outside. Other than that, during lunchtime we'll also be live streaming with interviews by theCUBE, and we'll see you back here at two o'clock. I hope you have a wonderful lunch. You can connect, enjoy the California sunshine, and I'll see you this afternoon.

From Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020, brought to you by SiliconANGLE Media.

Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University covering WiDS, the Women in Data Science conference, the fifth annual one. And joining us today is Daphne Koller, who is the co-founder, sorry, is the CEO and founder of insitro. Daphne, welcome to theCUBE.

Nice to be here, Sonia. Thank you for having me.

So tell us a little bit about insitro, how you got founded, and more about your [unclear].

So I've been working in the intersection of machine learning and biology and health for quite a while, and it was always a bit of an interesting journey in that the data sets were quite small and
limited. We're now in a different world, where there's tools that are allowing us to create massive biological data sets that I think can help us solve really significant societal problems. And one of those problems that I think is really important is drug discovery and development, where, despite many important advancements, the costs just keep going up and up and up. And the question is, can we use machine learning to solve that problem better?

And you talk about this more in your keynote, so give us a few highlights of what you talked about.

So you can think of drug discovery and development in the last 50 to 70 years as being a bit of a glass half full, glass half empty. The glass half full is the fact that there's diseases that used to be a death sentence, or a sentence to a lifetime of pain and suffering, that are now addressed by some of the modern-day medicines, and I think that's absolutely amazing. The other side of it is that the cost of developing new drugs has been growing exponentially, in what's come to be known as Eroom's law, being the inverse of Moore's law, which is the one we're all familiar with, because the number of drugs approved per billion US dollars just keeps going down exponentially. So the question is, can we change that curve?

And you talk in your keynote about the interdisciplinary culture, so tell us more about that.

I think in order to address some of the critical problems that we're facing, one needs to really build a culture of people who work together from different disciplines, each bringing their own insights and their own ideas into the mix. So at insitro we actually have a company that's half life scientists, many of whom are producing data for the purpose of driving machine learning models, and the other half are machine learning people and data scientists who are working on those. But it's not a handoff, where one group produces the data and the other one consumes and interprets it. Really, they start from the very beginning
to understand what are the problems that one could solve together: how do you design the experiment, how do you build the model, and how do you derive insights from that that can help us make better medicines for people?

And I also wanted to ask you: you co-founded Coursera, so tell us a little bit more about that platform.

So I founded Coursera as a result of work that I'd been doing at Stanford, working on how technology can make education better and more accessible. This was a project that I did here, a number of my colleagues as well. And at some point in the fall of 2011 there was an experiment of, let's take some of the content that we've been developing within Stanford and put it out there for people to just benefit from. And we didn't know what would happen. Would it be a few thousand people? But within a matter of weeks, with minimal advertising other than one New York Times article that went viral, we had a hundred thousand people in each of those courses. And that was a moment in time where we looked at this and said, can we just go back to writing more papers, or is there an incredible opportunity to transform access to education for people all over the world? And so I ended up taking what was supposed to be a two-year leave of absence from Stanford to go and co-found Coursera, and I thought I'd go back after two years. But at the end of that two-year period there was just so much more to be done, and so much more impact that we could bring to people all over the world, people of both genders, people of different socioeconomic status, every single country around the world. I just felt like this was something that I couldn't not do.

And why did you decide to go from an educational platform to then going into machine learning and biomedicine?

So I'd been doing Coursera for about five years in 2016, and the company was on a great trajectory, but it's primarily a content company, and around me machine learning
was transforming the world, and I wanted to come back and be part of that. And when I looked around, I saw machine learning being applied to e-commerce and to natural language and to self-driving cars, but there really wasn't a lot of impact being made on the life science area, and I wanted to be part of making that happen. Partly because I felt like, coming back to our earlier comment, in order to really have that impact you need to have someone who speaks both languages. And while there's a new generation of researchers who are bilingual in biology and in machine learning, they're still a small group, and there are very few of those in kind of my age cohort. And I thought that I would be able to have a real impact by building a company in this space.

So it sounds like your background is pretty varied. What advice would you give to women who are just starting college now who may be interested in a similar field? Would you tell them they have to major in math, or do you think that maybe there are some other majors that may be influential as well?

I think there are a lot of ways to get into data science. Math is one of them, but there's also statistics or physics. And I would say that, especially for the field that I'm currently in, which is at the intersection of machine learning and data science on the one hand and biology and health on the other, one can get there from biology or medicine as well. But what I think is important is not to shy away from the more mathematically oriented courses in whatever major you're in, because that foundation is a really strong one. There's a lot of people out there who are basically lightweight consumers of data science, and they don't really understand how the methods that they're deploying work, and that limits them in their ability to advance the field and come up with new methods that are better suited, perhaps, to the problems that they're tackling. So I think it's totally fine, and in fact there's a lot of value to coming into data
science from fields other than math or computer science. But I think taking courses in those fields, even while you're majoring in whatever field you're interested in, is going to make you a much better person who lives at that intersection.

And how do you think having a technology background has helped you in founding your companies, and has helped you become a successful CEO?

In companies that are very strongly R&D focused, like insitro and others, having a technical co-founder is absolutely essential. It's fine to have an understanding of whatever the user needs and so on, and come from the business side of it, and a lot of companies have a business co-founder. But not understanding what the technology can actually do is highly limiting, because you end up hallucinating: "Oh, if we could only do this, that would be great." But you can't. And people oftentimes end up making ridiculous promises about what the technology will or will not do, because they just don't understand where the landmines sit and where you're going to hit real obstacles in the path. So I think it's really important to have a strong technical foundation in these companies.

And that being said, where do you see insitro in the future, and how do you see it solving, say, NASH, that you talked about in your keynote?

So we hope that insitro will be a fully integrated drug discovery and development company that is based on a completely different foundation than a traditional pharma company. They grew up in the old approach, which is very much a bespoke scientific analysis of the biology of different diseases, and then going after targets or ways of dealing with a disease that are driven by human intuition. Where I think we have the opportunity to go today is to build a very data-driven approach that collects massive amounts of data and then lets analysis of those data really reveal new hypotheses that might not be the ones that accord with people's preconceptions of what matters and what
doesn't. And so hopefully we'll be able, over time, to create enough data and apply machine learning to address key bottlenecks in the drug discovery and development process, so we can bring better drugs to people, and we can do it faster and hopefully at much lower cost.

That's great. And you also mentioned in your keynote that you think the 2020s is like a digital biology era, so tell us more about that.

So I think if you take a historical perspective on science and think back, you'll realize that there's periods in history where one discipline has made a tremendous amount of progress in a relatively short amount of time because of a new technology or a new way of looking at things. In the 1870s, that discipline was chemistry, with the understanding of the periodic table and that you actually couldn't turn lead into gold. In the 1900s, that was physics, with understanding the connection between matter and energy and between space and time. In the 1950s, that was computing, where silicon chips were suddenly able to perform calculations that up until that point only people had been able to do. And then in the 1990s there was an interesting bifurcation. One was the era of data, which is related to computing but also involves elements of statistics, optimization, and neuroscience. And the other one was quantitative biology, in which biology moved from a descriptive science of taxonomizing phenomena to really probing and measuring biology in a very detailed and high-throughput way, using techniques like microarrays that measure the activity of 20,000 genes at once, or the sequencing of the human genome, and many others. But these two fields kind of evolved in parallel, and what I think is coming now, 30 years later, is the convergence of those two fields into one field that I like to think of as digital biology, where we are able, using the tools that have been and continue to be developed, to measure biology at entirely new levels of detail, of fidelity, of scale. We can use the techniques of
machine learning and data science to interpret what we're seeing, and then use some of the technologies that are also emerging to engineer biology to do things that it otherwise wouldn't do. And that will have implications in biomaterials, in energy and the environment, in agriculture, and I think also in human health. And it's an incredibly exciting space to be in right now, because just so much is happening, and the opportunities to make a difference and make the world a better place are just so large.

That sounds awesome. Daphne, thank you for your insight, and thank you for being on theCUBE.

Thank you.

I'm Sonia Tagare. Thanks for watching. Stay tuned for more.

Okay, great.

From Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020, brought to you by SiliconANGLE Media.

Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University covering the fifth annual WiDS, Women in Data Science conference. Joining us today is Lillian Carrasquillo, who is the insights manager at Spotify. Lillian, welcome to theCUBE.

Yeah, thank you, Sonia, for having me.

So tell us a little bit about your role at Spotify.

Yeah, so I'm actually one of a few insights managers in the personalization team, and within my little group we think about data and algorithms that help power the larger personalization experiences throughout Spotify, from your Daily Mix to Discover Weekly to your year-end Wrapped stories to your experience on Home and the search results.

That's awesome. Can you tell us a little bit more about the personalization team?

Yeah, so we actually have a variety of different product areas that come together to form the personalization mission; "mission" is the term that we use for a big department at Spotify. And we collaborate across different product areas to understand what are the foundational data sets and the foundational machine learning tools that are needed to be able to create features that a user can actually experience.

Great.
And so you're going to be on the career panel today. How do you feel about that?

I'm really excited, yeah. The WiDS team has done a great job of bringing together, you know, "diverse" is an overused term sometimes, but a very diverse group of people with lots of different types of experiences, which I think is core to how I think about data science. It's a wide definition. And so I think it's great to show younger and mid-career women all of the different career paths that we can all take.

And what advice would you give to women who are coming out of college right now about data science?

Yeah, so my big advice is to follow your interests. There's so many different types of data science problems. You don't have to just go into a title that says "data scientist" or a team that says "data science." You can follow your interest and use your data science skills in ways that might require a lot of collaboration or mixed methods, or work within a team where there are different types of expertise coming together to work on problems.

And speaking of mixed methods, insights is a team that's a mixed methods research group, so tell us more about that.

Yeah, so I personally manage a data scientist and a user researcher, and the three of us collaborate highly together across our disciplines. We also collaborate across the research science team, right into the product and engineering teams that are actually delivering the different products that users get to see. So it's highly collaborative, and the idea is to understand the problem space deeply together, to be able to understand what it is that we're trying to even just form in our heads as the need that a user, an end-user human, has, and bringing in research from research scientists and the product side to be able to understand those needs, and then actually have insights that another human, you know, a
product owner, can really think through, to understand the current space and the product opportunities, and to understand that user insight.

Do you use A/B testing?

We use a lot of A/B testing. That's core to how we think about our users at Spotify. So we use a lot of A/B testing, and we do a lot of offline experiments to understand the potential consequences or impact that certain interventions can have. But I think with A/B testing, there's so much to learn about best practices there. And when you're talking about a team that does foundational data and foundational features, you also have to think about unintended or second-order effects of algorithmic A/B tests. So it's been just a huge area of learning and a huge area of very interesting outcomes. With every test that we run, we learn a lot about not just the individual thing we're testing but the process overall.

And what are some features of Spotify that customers really love?

Anything you can dance to, anything that's like, "We know you." So Daily Mix, people absolutely love. Every time that I make a new friend and I tell them what I work on, they're like, "I was just listening to my Daily Mix this morning." Discover Weekly, for people who really want to stay open to new music, is also very popular. But I think the one that really takes it is any of the end-of-year Wrapped campaigns that we have, just the nostalgia that people have even just for the last year. But in 2019 we were actually able to do 10 years, and that amount of nostalgia just went through the roof. People were like, "Oh my goodness, you captured the time that I broke up with that person five years ago," or, "Oh, when I discovered that I love Taylor Swift even though I didn't think I liked her," or something like that.

Are there any surprises or interesting stories that you have about interesting user experiences?

Yeah, I mean, I can give you an example from my
experience. So recently, a few months ago, I was scrolling through my home feed, and I noticed that one of the highly rated things for me was Women in Country. And I was like, "Oh, that's kind of weird. I don't consider myself a country fan." And I was having this moment where I went through this path of, "Wait, that's weird. Why would the home screen recommend Women in Country, country music, to me?" And then when I clicked through, it shows you a little bit of information about it. It's because it had Dolly Parton, it had Margo Price, and it had the Highwomen, and those were all artists that I'd been listening to a lot, but I just had not formed an identity as a country music fan. And then I clicked through and it was like, "Oh, this is a great playlist," and I listened to it. And it got me to the point where I was realizing I really actually do like country music when the stories are centered around women. It was really fun to discover other artists that I wouldn't have otherwise jumped into as well, based on the fact that I love the story writing and the songwriting of these other country acts.

That's so cool that you discovered that.

Yeah.

So you have a degree in industrial mathematics.

Yeah.

You went to a liberal arts college on purpose.

Yeah.

Because you wanted to try out different classes. So how has that diversity of education really helped you in your career?

Yeah, so my undergrad is from Smith College, which is a liberal arts school with a very strong liberal arts foundation. And when I went to visit, one of the math professors that I met told me that he considers studying math not just to make you better at math, but that it makes you a better thinker, and you can take in much more information and sort of question assumptions and try to build a foundation for what the problem that you're trying to think through is. And I just found that extremely interesting. And I also have an undeclared major in
Latin American studies, and I studied neuroscience and quantum physics for non-experts and film class and all of these other things that I don't know if I would have had the same opportunity to study at a more technical school. And I just found it really challenging and satisfying to be able to push myself to think in different ways. I even took a poetry writing class. I did not write good poetry, but the experience really stuck with me, because it was about pushing myself outside of my own boundaries.

And would you recommend having this kind of diverse education to young women now who are looking into data science?

I absolutely would. I mean, some people believe that instead of thinking about STEM, we should be talking about STEAM, which adds arts education in there, and liberal arts is one of them. And I think that now, in these conversations that we have about biases in data and in ML and in AI, and understanding fairness and accountability (sorry, it's a hard word, apparently), I think that a strong cross-disciplinary, collaborative, and even on an individual level cross-disciplinary education is really the only way that we're going to be able to make those connections, to understand what kind of second-order effects we're having based on the decisions of parameters for a model. In a local sense we're optimizing and doing a great job, but what are the global consequences of those decisions? And I think that kind of interdisciplinary approach, to education as an individual and collaboration as a team, is really the only way.

And speaking about bias, earlier we heard that diversity is great because it brings out new perspectives, and it also helps to reduce that unfair bias. So how has Spotify managed to create a more diverse team?

Yeah, so I mean, it starts with recruiting. It starts with what kind of messaging we
put out there, and there's a great team that thinks about that exclusively. And they're really pushing all of us, as managers, as ICs, as leaders, to really think about the decisions and the way that we talk about things and all of these micro-decisions that we make, and how that creates an inclusive environment. Because it's not just about diversity; it's also about making people feel like this is where they should be. On a personal level, I talk a lot with younger folks and people who are trying to just figure out what their place is in technology, whether it be because they come from a different culture, or they might be gender non-binary, or they might be women who feel like there isn't a place for them. And it's really about, you know, the thing that I think about is: because you're different, your voice is needed even more. Your voice matters. And I always ask, how can I highlight your voice more? How can I help? I have a tiny, tiny bit of power and influence, more than some other folks. How can I help other people acquire that as well?

Lillian, thank you so much for your insight. Thank you for being on theCUBE.

Yeah, thank you.

I'm your host, Sonia Tagare. Thank you for watching, and stay tuned for more.

From Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020, brought to you by SiliconANGLE Media.

Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University covering the fifth annual WiDS, Women in Data Science conference. Joining us today is Lucy Bernholz, who is a senior research scholar at Stanford University. Lucy, welcome to theCUBE.

Thanks for having me.

So you've led the Digital Civil Society Lab at Stanford for the past 11 years, so tell us more about that.

Sure. So the Digital Civil Society Lab actually exists because we don't think digital civil society exists. So let me take that apart for you. Civil
society is that weird third space outside of markets and outside of government. It's where we associate together, it's where we as people get together and do things that help other people. It could be the nonprofit sector, it might be political action, it might be the eight of us just getting together and cleaning up a park or protesting something we don't like. So that's civil society. What's happened over the last 30 years, really, is that everything we use to do that work has become dependent on digital systems. And those digital systems (here I'm talking gadgets, from our phones to the infrastructure over which data is exchanged), that entire digital system is built by companies and surveilled by governments. So where do we as people get to go digitally where we could have a private conversation, to say, "Hey, let's go meet downtown and protest X and Y," or "Let's get together and create an alternative educational opportunity because we feel our kids are being overlooked," whatever? All of that information that we could exchange, all of that associating that we might do in the digital world, it's all being watched, it's all being captured. And that's a problem, because both history and political science, history and democracy theory, show us that when there's no space for people to get together voluntarily, take collective action, and do that kind of thinking and planning and communicating just between the people they want involved in that, when that space no longer exists, democracies fall. So the lab exists to try to recreate that space. And in order to do that, we have to, first of all, recognize that it's being closed in. Secondly, we have to make real technological progress: we need a whole set of different kinds of digital devices and norms, we need different kinds of organizations, and we need different laws. So that's what the lab does.

And how does ethics play into that?

It's all about ethics, and it's a word I try to avoid, actually, because especially in the tech
industry, I'll be completely blunt here, it's an empty term. It means nothing. The companies are using it to avoid being regulated. People are talking about ethics, but they don't want to talk about values. But you can't do that. Ethics is a code of practice built on a set of articulated values, and if you don't want to talk about values, you're not really having a conversation about ethics. You're not having a conversation about the choices you're going to make in a difficult situation. You're not having a conversation over whether one life is worth five thousand lives, or everybody's lives are equal, or if you should shift the playing field to account for the millennia of systemic and structural biases that have been built into our system. There's no conversation about ethics if you're not talking about those things. As long as we're just talking about ethics, we're not talking about anything.

And you were actually on the ethics panel just now, so tell us a little bit about what you talked about and what were some highlights.

So I think one of the key things about the ethics panel here at WiDS this morning was that, first of all, it started the day, which is a good sign. It shouldn't be a separate topic of discussion. We need this conversation about values, about what we're trying to build for, who we're trying to protect, how we're trying to recognize individual human agency. That has to be built in throughout data science. So it's a good start to have a panel about it at the beginning of the conference, but I'm hopeful that the rest of the conversation will not leave it behind. We talked about the fact that, just as civil society is now dependent on these digital systems that it doesn't control, data scientists are building data sets and algorithmic forms of analysis, and both of those things are just encoded sets of values. And if you try to have a conversation about that at just the math level, you're going to miss the social level. You're going to
miss the fact that that's humanity you're talking about. So it needs to really be integrated throughout the process: talking about the values of what you're manipulating and the values of the world that you're releasing these tools into.

And what are some key issues today regarding ethics and data science, and what are some solutions?

So, I mean, this is the Women in Data Science conference. It happens because five years ago, or whenever it was, the organizers realized that women are really underrepresented in data science, and maybe we should do something about that. That's true across the board. It's great to see hundreds of women here and around the world participating in the live stream, right? But as women, we need to make sure that as you're thinking about, again, the data and the algorithm, the data and the analysis, we're thinking about all of the people: all of the different kinds of people, all of the different abilities, all of the different races, languages, ages, you name it, that are represented in that data set, and understand those people in context. In your data set they may look like they're just two different points of data, but in the world at large we know perfectly well that women of color face a different environment than white men, right? They don't walk through the world in the same way. And it's ridiculous to assume that your shopping algorithm isn't going to affect that difference that they experience in the real world in some way. It's fantasy to imagine that it's not going to work that way. So we need different kinds of people involved in creating the algorithms, and different kinds of people in power in the companies who can say, "We shouldn't build that. We shouldn't use it." We need a different set of teaching mechanisms, where people are actually trained to consider from the beginning what's the intended positive, what's the intended negative, and what are some likely
negatives, and then decide how far they go down that path.

Right. And we actually had on Dr. Rumman Chowdhury from Accenture.

Yeah.

And she's really big in data ethics, and she brought up the idea that just because we can doesn't mean that we should. So can you elaborate more on that?

Yeah. Well, just because we can analyze massive data sets and possibly make some kind of mathematical model that, based on a set of value statements, might say this person's more likely to get this disease, or this person's more likely to excel in school in this dynamic, or this person's more likely to commit a crime: those are human experiences. And while analyzing large data sets might, in the best scenario, actually take into account the societal creation that those actual people are living in, trying to extract that kind of analysis from that social setting is, first of all, absurd. Second of all, it's going to accelerate the existing systemic problems. So you've got to use that kind of calculation: just because we could maybe do some things faster or with larger numbers, should the externalities that are going to be caused by doing it that way, the actual harm to living human beings, just be ignored so you can meet your shipping deadline? Because if you expand your time horizon a little bit and look at some of the big companies out there now, they're now facing those externalities, and they're doing everything they possibly can to pretend that they didn't create them. And that loop needs to be shortened, so that you can actually sit down some way through the process, before you release some of these things, and say, "In the short term it might look like we'd make X profit, but spread out that time horizon, I don't know, 2X, and you face an election in the world's largest, longest-lasting stable democracy that people are losing faith in." Is that the right price to pay for a single company to meet its quarterly profit
goals? I don't think so. So we need to reconnect those externalities back to the processes and the organizations that are causing those larger problems. Because essentially, having externalities just means that your data is biased. Data are biased. Data about people are biased because people collect the data. This idea that there's some magic de-biased data set is science fiction. It doesn't exist. It certainly doesn't exist for more than two purposes, right? If we could, and I don't think we can, de-bias a data set to then create algorithm A, that same data set is not going to be de-biased for creating algorithm B. Humans are biased. Let's get past this idea that we can strip that bias out of human-created tools. What we're doing is embedding them in systems that accelerate them and expand them. They make them worse, right? So I'd spend a whole lot of time figuring out how to improve the systems and structures that we've already encoded with those biases, and using that to try to inform the data science. In my opinion, we're going about this backwards. We're building the biases into the data science and then exporting those tools into biased systems, and guess what? Problems are getting worse. So let's stop doing that. Thank you so much for your insight, Lucy. Thank you for being on theCUBE. Oh, thanks for having me. I'm Sonia Tagare. Thanks for watching theCUBE. Stay tuned for more. Live from Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020, brought to you by SiliconANGLE Media. Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University for the fifth annual WiDS Women in Data Science conference. Joining us today is Nhung Ho, the director of data science at Intuit. Nhung, welcome to theCUBE. Thank you for having me here, Sonia. So tell us a little bit about your role at Intuit. Yeah, so I lead the applied machine learning teams for our QuickBooks product lines and also for
our customer success organization. Within my team, we do applied machine learning, so we specialize in building machine learning products and delivering them into our products for our users. Great. And today you're giving a talk. You talk about how organizations want to achieve greater flexibility, speed, and cost efficiencies, and you're giving a technical vision talk about data science in a cloud world. So what should data scientists know about data science in a cloud world? Well, I'll just give you a little bit of a preview into my talk later, because I don't want to spoil anything. But I think one of the most important things about being a data scientist in a cloud world is that you have to fundamentally change the way you work. A lot of us start on our laptops or a server and do our work there, but when you move to the cloud, all bets are off, all the limiters are off. So how do you fully take advantage of that? How do you change your workflow? What are some of the things that are available to you that you may not know about? And in addition to that, there are some things that you have to rewire in your brain to operate in this new environment. So I'm going to share some experiences that I learned firsthand, and also from my team, in Intuit's cloud migration over the past six years. That's great. I'm excited to hear that. And so you work at Intuit, and Intuit has sponsored WiDS for many years now. Last year we spoke with [name unclear] from Intuit. So tell us about Intuit's sponsorship. Yeah, so at Intuit we are a champion of gender diversity and also all sorts of diversity. When we first learned about WiDS, we said we need to be a champion of the Women in Data Science conference, because for me personally, oftentimes when I'm in a room going over technical details, I'm the only woman, and not just that, I'm often the only woman executive. And so part of the sponsorship is to create this community of women, very technical women in this
field, to share our work together, to build this community, and also to show the great diversity of work that's going on across the field of data science. And so Intuit has always been really great for embracing diversity. Tell us a little bit about that experience, about being part of Intuit, and also about the Tech Women part. Yeah, so one of the things at Intuit that I really appreciate is that we have employee groups around specific interests, and one of those employee groups is Tech Women at Intuit. The goal of Tech Women at Intuit is to create a community of women who can provide coaching, mentorship, technical development, and leadership development. And I think one of the unique things about it is that it's not just focused on the technical development side, but on helping women develop into leadership positions. For me, when I first started out, there were very few women in executive positions in our field, and data science is a brand-new field, so it takes time to get there. Now that I'm on the other side, one of the things that I want to do is be able to give back and coach the next generation. And so the Tech Women at Intuit group allows me to do that through a very strong mentorship program that matches me with early-career mentees across multiple different fields, so that I can provide that coaching and that leadership development. And speaking about diversity, in the opening address we heard that diversity creates perspectives and it also takes away bias. So why is gender diversity so important at Intuit, and how does it help take away that bias? Yeah, so one of the important things that I think a lot of people don't realize is that when you go and build your products, you bring in a lot of biases in how you build the product, and ultimately the people who use your products are the general population. For us, we serve consumers, small businesses, and the self-employed, and if you take a look at the diversity of our customers, it mirrors the general population
and so when you think about building products, you need to bring in those diverse perspectives so you can build the best products possible, because the people who are using those products come from a diverse background as well. Right. And so now at Intuit, instead of a desktop-based application, you're at a cloud-based application, which is a big part of your talk. How do you use data for A/B testing, and why is it important? Oh, A/B testing, that is a personal passion of mine, actually, because as a scientist, what we like to do is run a lot of experiments to say, okay, what is the best thing out there, so that ultimately, when you ship a new product or a feature, you send the best thing possible that's verified by data, and you know exactly how users are going to react to it. When we were on desktop, it was incredibly difficult, because those were the days, and I don't know if you remember this, when you had a floppy disk, or even a CD-ROM. That's how we shipped our products. All the changes that you wanted to make had to be contained in there, and you really only shipped once per year. So if there was any type of testing that we did, we would bring our users in, have them use our products a little bit, and then say, okay, we know exactly what we need to do, and ship that out. So you only got one chance. Now that we're in the cloud, what that allows us to do is test continuously via A/B testing. For every new feature that comes out, we have a champion/challenger model, and we can say, okay, the new version that we're shipping out is this much better than the previous one, we know it performs in this way, and then we get to make the decision: is this the best thing to do for our customer? And so you turn what was once a one-time change management process into one that's distributed throughout the entire year. At any one time we're running hundreds of tests to make sure that we're shipping exactly the best
things for our customers. That's awesome. So what advice could you give to the next generation of women who are interested in STEM but maybe feel like, oh, I might be the only woman, I don't know if I should do this? Yeah, I think the biggest thing for me was finding mentorship. Initially, when I was very early in my career, and even when I was doing my graduate studies, for me a mentor was someone who was in my field. But when I first joined Intuit, an executive in another group, who was a woman, said, "Hey, I'd like to take you aside and provide you some feedback, and this is some coaching I want to give you." And that was when I realized you don't actually need that person to be in your field to guide you through to the next step. So for women who are going through their journey and are early on, I recommend finding a mentor who is at a stage where you want to go, regardless of which field they're in, because everybody has diverse perspectives and things that they can teach you as you go along. And how do you think WiDS is helping women feel like they can do data science and be a part of the community? Yeah, I think what you'll see in the program today is a huge diversity of speakers and panelists, through all different stages of their careers and all different fields. So what we get to see is not only the timeline, from women who are in their PhDs all the way to very well-established women. The provost of Stanford University was here today, which is amazing, to see someone at the very top of their career who's been around the block. But the other thing is also the diversity in fields. When you think about data science, a lot of us think about just the tech industry, but you see it in health care, you see it in academia, and seeing that wide diversity of where data science, and where women who are practicing data science, come from, I think, is really empowering, because you can see yourself in there. And representation does matter quite
a bit. Absolutely. And where do you see data science going forward? Oh, that is a tough and interesting question, actually. I think that in the current environment we could talk about where it could go wrong or where it could actually open doors. I'm an eternal optimist, and one of the things that I think is really exciting for the future is that we're getting to a stage where we're building models not just for the general population. We have enough data and enough compute that we can build a model tailored just for you, for all of your likes. For me, that is really powerful, because we can build exactly the right solution to help our customers and our users succeed. Specifically, working in the personal and small business finance space, that means I can help that cupcake shop owner actually manage her cash flow and help her succeed. To me, that's really powerful, and that's where data science is headed. Nhung, thank you so much for being on theCUBE, and thank you for your insight. Thank you so much. I'm Sonia Tagare. Thanks for watching theCUBE. Stay tuned for more. From Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020, brought to you by SiliconANGLE Media. Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University covering the fifth annual WiDS Women in Data Science conference. Joining us today is Ya Xu, the head of data science at LinkedIn. Ya, welcome to theCUBE. Thank you for having me. So tell us a little bit about your role and about LinkedIn. So LinkedIn is, first of all, the biggest professional social network, where we have a massive economic graph that we have been creating, with close to 700 million members, and millions of companies and jobs, and of course skills, and schools as well, as part of it. And I lead the data science team at LinkedIn
and my team really spans the global presence that LinkedIn's offices have, working on various different areas: thinking about how we can iterate on, understand, and improve the products that we deliver to our members and our customers, and also at the same time thinking about how we can make our infrastructure more efficient and how we can make our sales and marketing more efficient as well. So it really spans across. And how has the use of data science evolved to deliver a better user experience for users of LinkedIn? Yeah. So first of all, at LinkedIn in general, we truly believe that everybody can benefit from better data and better data access. So we are certainly using data to continuously better understand what our members are looking for. A simple example is that whenever we launch a new feature, we don't just blindly decide ourselves that it is the better feature for our members; we actually look at how our users react to it. We use data to understand that, and then make decisions on whether we should eventually launch this feature to all members or not. So that's a very prominent way for us to use data. And obviously we also use data to understand, even before we build a feature, whether it is the right feature to build. We use both survey data and user behavior data to be able to come up with better features for users. And do you use A/B testing as well? Oh, absolutely. Yeah, we do a lot of A/B experiments. I was trying not to use that terminology, but this is what we use to understand whether the features that we are developing and putting in front of our users are what they enjoy as much as we think they would, right?
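As an illustration of the A/B experiments described above, here is a minimal sketch of how one might compare a current feature (control) against a new one (treatment) with a two-proportion z-test. This is not LinkedIn's or Intuit's actual tooling; the function name `two_proportion_z_test`, the user and conversion counts, and the 0.05 threshold are all hypothetical choices made only for this example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variant A (control) and variant B (treatment).

    Returns (z, p_value), where p_value is the two-sided p-value for the
    null hypothesis that both variants convert at the same rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B are identical.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0% or all 100%")
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 1,000 users per variant, made up for illustration.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
if p < 0.05:  # assumed significance threshold
    print("The difference is unlikely to be chance alone.")
else:
    print("No clear evidence that the new feature differs.")
```

In a champion/challenger setup, the control plays the role of the champion and the treatment the challenger. A real experimentation platform would also need to handle sample-size planning, many simultaneous tests, and repeated looks at the data, none of which this sketch attempts.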
So you had a talk today about creating global economic opportunities with responsible data. Give us some highlights from your talk. So first of all, at LinkedIn we truly believe in the vision that we are working towards, which is creating economic opportunity for every member of the global workforce. If you start from that as the axiom that we're working towards, and then think about how you can do that, then obviously the table stakes, the fundamental thing that we have to start with, is to be able to preserve the privacy of our members as we leverage the data that our members entrust to us. So how can we do that? We have some early efforts in using and developing differential privacy as a technique for us to do a lot better at preserving that privacy as we're leveraging the data. But at the same time, it doesn't end there, because creating opportunity is not just about preserving privacy; it's also about, when we are leveraging the data, how we can leverage it in a way that creates opportunity in a fair way. So there is also a lot of effort around how we can do that. What does fairness mean? What are the ways we can turn some of the key concepts that we have into action that really drives the way we develop a product, the way we think about responsible design, the way we build our algorithms, and the way we measure, in every single dimension? And speaking about bias, at the opening address they mentioned that diversity is really great because it provides many perspectives, and it also helps reduce bias. So how have you at LinkedIn been able to create a more diverse team? So first of all, we all believe that diversity
is certainly better as we're building products. If you have a diverse team that is really a representation of the customers and members that you're serving, then you're definitely better able to come up with features that serve the needs of our member population. But at the same time, that's just the right thing to do as well, right? We all have had experiences where we may not feel we belong as much when we walk into a room and we are the only person of whatever identity we have in that room. And we certainly want to be able to create that environment for all the employees as well. There are also studies that have been done on what makes a high-performing team, some of them done at Google, on the psychological safety aspects of it. There is a lot of brain science that says when you make people feel they belong, they will actually be so much more creative and innovative and everything, right? So we have that belief. But tactically, there are many things that we're doing on the DIBs side, right: how can you bring diversity, inclusion, and belonging? Starting from hiring, we are certainly very much focused on how we can increase the diversity of the individuals that we're bringing to LinkedIn, and when they are at LinkedIn, how we can make them feel they belong more and feel more included in every aspect. We have different inclusion groups. Obviously I'm very much involved in Women in Tech at LinkedIn. We have many efforts to help women at LinkedIn, in engineering and in other groups as well, feel that they belong to this community. At the same time, there are concrete actions that we're taking too, right? We are helping women to have a much better understanding and
awareness of some of the ways that we operate that are slightly different from how our male colleagues may operate, right? And there are certain things that we're doing to change the current processes, the hiring processes and the promotion process, so that we are able to bring more equal footing to the way that we think about the gender gap and gender diversity. Right, that's great. And what advice would you give to women who are just starting college, or who are just out of college, who are interested in going into data science? So I want to say the biggest learning for me is to just have that can-do attitude. Women, biologically and in every way, are not any less than men, and you certainly have seen many strong and very talented women in the field. So don't let people's perceptions or the biases around you bring you down. Think about what you want and then just go for it, and go for the advice that you can get from people. As we can see at the conference today, there are so many talented women you can reach out to who are very willing to help you as well. And in this age of AI and ML, where do you see data science going in the future? That's a really interesting question. I want to say data science is a field that is really broad, right? There are things that I would consider to be part of data science that may not necessarily be part of AI, such as causal inference, which is extremely popular and important. I think the fields will continue to evolve, and the fields are continually overlapping with each other as well. You cannot do data science without having a strong skill in AI and in machine learning, and you also can't do great machine learning without understanding the
data science either, right? Think about the talk that Daphne Koller gave earlier, where she was sharing that you can blindly run your algorithm without realizing the bias, that all the algorithm is really detecting is the machine that was used to take the images, versus actually detecting the difference between broken bones or not. So I do see there being a continued big overlap, and I think the individuals who are involved in both communities should continue to be very comfortable being that way too. Right. Thank you so much for being on theCUBE, and thank you for your insight. Of course. Thank you for having me. I'm your host, Sonia. Thank you for watching theCUBE, and stay tuned for more. Okay, we are going to start in 30 seconds, so please come in, find a seat, settle in for the afternoon. Okay, 10-second warning. Okay, well, welcome back, everybody. We're going to be starting here. So, yeah, you're ready? I know Sarah's ready, so we're ready. How were the breakouts over lunch? Were they good? Useful? Yeah? How did the Berkeley group do over lunch? Fantastic. Okay, if I can get your attention, then we'll start. I would like to announce our first speaker for the afternoon. We have three more hours to go, or just a little bit under. There will be many wonderful talks. There will be a career panel, as we normally have. We will talk about the artists that we have with us today. You haven't seen them yet; they will come during the reception time, and they're phenomenal, so we'll talk to you about this a little bit later. And right now I'm just really excited to do a shout-out to WiDS Calgary. They're here with us still as well, so that's fantastic. And I'm very excited to introduce to you our next speaker. We have a very strong connection to Berkeley. There are folks here in the audience who came from Berkeley, even though they have their own WiDS going on there today,
but in their words, they said yours is better. You don't hear this too often from across the bay. We had Persis this morning, the provost from Stanford, and now we're going to have, and I'm so happy she could make it today, Tsu-Jae King Liu from Berkeley. She is the Dean of the School of Engineering at Berkeley. By the way, Berkeley just had an amazing announcement yesterday of a very big donation for a data science initiative, so congratulations, that's splendid. And I'm so glad you could make it with us today. What I like so much about you is that you're not just an amazing academic leader, but also an incredible industry person. She co-founded a startup, she's a dean, and she co-founded the startup before then as well. So a leader in academia, a leader in industry. She's very high on EQ, and for those of you who know what it is, you know the E stands for engineering, right? All right, thank you, Margot, and good afternoon, everyone. How are you doing? I noticed that there's no title for my talk in the program, but yes, I think Margot was trying to lead up to the title of my talk, which is "Why a World with AI Needs More EQ," and EQ means emotional quotient, or basically emotional intelligence. So first off, I'd like to just say how much I appreciate being invited here to speak at this conference. Hello to everyone here and also everybody who's watching live online. I want to thank Margot and Karen and Judy, the organizers of this conference, for inviting me to give a talk. It's always nice to come back to Stanford. I don't think she mentioned that I earned every single one of my degrees in electrical engineering from Stanford University, so I actually feel most comfortable here wearing red. For those of you who aren't familiar with the rivalry between UC Berkeley and Stanford, we have a friendly sports team rivalry, but really it doesn't translate to competition so much in academics. In fact, earlier today I was meeting with the Dean of Engineering here at
Stanford, Jennifer Widom, and we had a great chat about initiatives and possibilities for collaboration between our colleges. All right, so today, since I am actually not a data scientist, I wanted to first introduce a little bit about myself, how I'm related to this AI revolution, and to talk about some of the reasons why I chose to step up and serve as Dean of Engineering, because I think there are some serious challenges ahead in a world where AI can actually take over or exceed human intelligence, and what we're doing at Berkeley to ensure that the outcomes are going to be human-compatible and benefit the good of society. Okay, so, to tell you about my connection to AI: this chart is a plot showing exponential growth over a period of 120 years in how the technology for computing has advanced. The vertical axis is the calculations per second per constant thousand dollars, I guess appropriately taking into account inflation. We can see that this trend of exponential improvement in computing performance is projected, and most people do expect it, to continue for at least another decade, and we can see that the level of computational speed of a human brain eventually will be reached. That's estimated to be in about the middle of this century. Now, I actually was a student here in the 1980s through the early 1990s, and that's really the regime where integrated circuit technology, Silicon Valley coming onto the scene, was really hot, and that's why I ended up majoring in electrical engineering. I moved over to Berkeley a few years after I graduated. And you might already know that the computing devices today, these computer chips that are in all of the electronic devices, are the brains of your computers, your laptops, you know,
in the cloud servers, but also in your mobile devices. Those devices are highly complex. A single chip of silicon can contain over a billion transistors, up to 10 billion transistors, and so there is a whole ecosystem for how to design these new chips. Every year you have a new product that can do more, and basically the industry has segmented itself into different layers of abstraction. I've represented this in terms of an information technology stack. So I just wanted to point out that in academia we've actually contributed a lot to enabling the AI revolution today. When I was a student here in the 80s, I took a course on artificial intelligence, but computing technology was not yet advanced enough to really realize the full potential of AI. Over the last 20-plus years computing technology has advanced, so now AI can be real-time, and there are all kinds of exciting applications of data science. So first of all, these are just examples from Berkeley. SPICE stands for Simulation Program with Integrated Circuit Emphasis. So basically, how do you design billion-component systems and make sure that they work properly and on time? Software is used to automate that, and that's SPICE. The reduced instruction set architecture for microprocessors was developed at Berkeley, and basically this computing architecture is used in mobile devices today; it's lower power. For operating systems, Unix was open-sourced at Berkeley. The Berkeley Software Distribution operating system formed the basis for operating systems used today, let's say in the Apple Mac operating system and also in Microsoft Windows. Now, where do I come in? I'm at the bottom there. That's like the lower back, okay, it's not the tailbone. But at the bottom, we actually also have to have innovations in materials and the transistor designs, the little miniature electronic switches, to operate at higher
and higher speeds and also to be able to be miniaturized to atomic dimensions, and so that's how I contribute to this stack. The FinFET is a new transistor design that's used in all leading-edge microprocessors today. And most recently out of Berkeley, and you probably are much more familiar with this than I am, there's Spark, basically a cluster computing framework to really speed up data analytics. Okay, so basically academia has really contributed to innovations that are enabling AI today, and that's my connection. We all envision that in the future a lot of things will be automated. We'll have interconnected devices as well as people. We'll have not only smart transportation, which we're starting to see come on the scene, but also smart factories in manufacturing, smart and personalized medicine and health care, and so on, right? So all the infrastructure can benefit from AI: water distribution, energy distribution, transportation networks, and so on. So this is the vision of the future, and there are going to be significant impacts on society, not only benefiting us, making our lives hopefully more pleasant, but it's really going to change the nature of the workplace, the nature of jobs. This is something that people have been talking about for several years now, and so I'm just citing some reports from a few years ago. If you look at this chart, it lists the top trends that are going to impact business models. We already see today new types of businesses enabled by e-commerce and big data and so on. So there's no argument that the mobile internet, cloud, and big data technology are going to change the nature of jobs, because a lot of work is going to be automated. Now, what's interesting is that if you look at the level of risk, there are various categories: each job can be put into a category of how at risk it is of being eliminated due to
automation. And so on this chart, the percentage of jobs held by women is shown in dark blue and the percentage of jobs held by men is in light blue, and the different sets of bars go from low risk of being replaced or automated on the left to very high risk on the right. You can see that the jobs at the highest risk of being eliminated due to automation are dominated by women. Also, if you look at the jobs that are at lowest risk of being replaced, what women earn in those jobs is much less than what men earn. These bars compare men's and women's average wages, median and mean. And of course the tech industry will continue to grow. AI enables a lot of transformation for all industries, so there will be a lot of jobs, but generally the sectors of the job market that are going to be growing are dominated by men. So even though men will also lose jobs, they won't lose as many. For every 20 jobs lost by women, only one new STEM job is projected to be gained, and that's a stark comparison to only four lost jobs for every STEM job gained for men. All right, so I think everybody in the audience today probably is aware that men dominate in terms of the workplace. The left two bars show the percentage of workers in computing, men the taller bar and then women, and this situation is probably even more disparate for engineering. The different colors represent different races and ethnicities. So first of all, there are not as many women working in high tech today, in either computing or engineering. Let's say roughly 25 percent of technical jobs today are held by women. But what's more disturbing is at the bottom of this slide, showing that for science, engineering, and technology jobs, almost half or more than half of the women who start actually, within 10 years or so, move
out of those jobs, and about half of them move into some different type of job, out of technology altogether, or maybe start their own company. All right, so these are the problems, right: low representation, high attrition. I've talked to a lot of women alumni and asked them, well, why did you end up leaving the tech track? Why did you end up going into, whatever, HR or sales, marketing, and so on? Very often the answer is that they didn't see a career path for moving upward, moving to higher levels of management or to executive management or being CEO and so on. This chart is published by McKinsey & Company. They do a nice report every year on women in technology. It shows the percentage of women, in light blue at the bottom, that occupy jobs at the entry level, and going to the right, that's higher and higher levels of management. What you see here is attrition in levels of women. As you go higher and higher up in the layers of management, the percentage of women is dropping fast, and it's even worse for women of color. This is the issue of intersectionality. So those are the issues. Okay, so first of all, is that a problem? I hope that people know that it is, but here are just a couple of examples. The second one I'll show probably is most relatable to some of the researchers here, but a common example I talk about with young girls is the example of airbag systems used in cars to keep people safe in case of an accident. This is a dummy that's used for testing airbag deployment systems in automobiles, and for the longest time the dummies were sized and weighted based on male anatomy. So even though airbag systems were mandatory starting in the 90s, it wasn't until the year 2010 that the U.S.
Department of Transportation required that car manufacturers use dummies that were weighted and sized more like women in their testing to develop their airbag deployment systems. And basically car manufacturers have found that, generally, yes, women are more than 50% more likely to get injured in a car accident where the airbag deploys, because first of all we're generally shorter and the airbag might hit our head or neck, and also, in our anatomy, we don't have this muscular neck and strong spinal support. So this is an example, and researchers who've studied this do admit it's probably because these systems were designed by engineering teams that weren't diverse enough and didn't think that maybe half the passengers in cars are female, not all sized and weighted like men. Okay, so this might be closer to home: voice recognition systems. Voice AI is projected to become something like an 80 billion dollar business within the next couple of years, and Google reports that almost 50% of all queries today are by voice, and they claim that they have a 95% accuracy rate. So the question is, accuracy for what kinds of people? The very first voice recognition systems only recognized male voices, and interestingly this is also the case for automobile manufacturers. Today we have voice recognition systems in cars, right, we talk to the cars, and automakers have admitted for years that their speech recognition systems don't work well for women. And the recommended remedy, do you know what the recommended remedy is? Yeah, speak more like a man. This is what VPs have said: women should be taught to speak louder and direct their voices towards the microphone. Same, yeah, same for people who
are not native English speakers and so on. So these are just examples of a lack of empathy on the part of the managers and engineers who design these systems, which are meant, of course, not only for men. If you want to make money, you have to have a product that works for both men and women. All right, so I think it's pretty clear, it should be obvious: diverse teams and organizations comprise a wider range of viewpoints and skills, and that leads to greater collective intelligence. With these issues, lives were literally lost with the airbag deployment systems. This one might be just an inconvenience, but imagine that today speech recognition is used for interviewing people, for immigration, job hiring and so on. Imagine the kind of bias, if there's a bias against women and minorities: what kinds of decisions could affect people's lives if these voice recognition systems are not designed to work for all people? And we should recognize that there are more dimensions of diversity than gender, race and ethnicity; there are inner dimensions as well as outer dimensions. This is a really nice quote from Dr. France Córdova, the director of the National Science Foundation, just stating what we hope is obvious: diversity of thought, perspective and experience is essential for excellence in research and innovation in science and engineering. And for the executives who need to be further convinced: if we look at companies, on the left, that are in the first quartile among all businesses in terms of gender diversity, they have a higher likelihood of financial performance above the industry median, compared to companies in the lowest quartile in terms of gender diversity. And the difference is even greater in terms of ethnic diversity. So diversity makes sense business-wise, and companies should be motivated to foster diversity in their workforce.
So that's the situation, the challenge, and that's one reason why I decided to become dean: I wanted to do some things to try to counteract this and to ensure that, as the pace of technology advancement accelerates, people are not left behind and the digital divide does not grow. Now the question is, what is the root cause of this disparity, the gender gap? Well, I think there are a lot of reasons. One could be just outright discrimination, but there are also subtle reasons. Unconscious bias is one: these are attitudes or stereotypes that, bottom line, affect our actions and decisions in an unconscious manner. How many of you are aware of the Harvard Implicit Association Test? This is great. It's a free test online; when I teach, I usually have some of my students take this test just to increase awareness of hidden biases. This chart shows that, of the people who've taken this Implicit Association Test online to see if they implicitly have some association between gender and career, the majority do have some automatic, subconscious association of males with careers and females with family. So we really should check our bias in order to address it, to try to counteract it. This is a nice example of a study, I guess from the New York Times. What they did was send email to a couple of thousand, two thousand five hundred, professors at hundreds of universities, just an inquiry: can I have a meeting, I'm interested in being a PhD student. They changed the name, and from the name you can imply gender and race and so on. And it was clear that if your name sounded like you were a white male, you were far more likely to receive a response. Just a couple of other really quick examples. A lot of us choose people for our research groups, or to hire into our companies, based on a first pass of looking at your CV, your resume, and so
this study was done with over 250 professors in physics and biology at eight large public universities. They each evaluated eight CVs for postdoc positions, and the only thing the people who conducted the study changed was the names, only the names and nothing else, to imply that somebody is female or African-American and so on. The results showed that, for the exact same CVs, the males in general were rated as having a higher level of competence; that's the dark blue bar on the left, compared to women in gray. In terms of hireability, again, men were deemed to be more hireable even though it's the exact same CV. It's kind of nice that women seem to be more likable, but that doesn't help you get a job, right? And then looking at race and ethnicity, again, not surprisingly, Latinx and Black candidates did not fare as well even though the CVs were identical. So there's obviously some implicit bias. How about letters of recommendation? We don't base our hiring decisions only on CVs; sometimes we ask for a recommendation. A separate study analyzed 300 letters of recommendation from medical faculty at a large U.S.
medical school. They found, and this is great, with data, text analysis can do this very quickly, that male candidates are generally more often described as researchers and professionals, successful and having innate ability, and female candidates were more often described as teachers and students, very nurturing, working hard to get to where they are. And then looking for key adjectives: when we hire faculty, we want to know that they have the potential to become a star, what is their claim to fame, their home run. It turns out, the data shows from 886 letters, that these standout adjectives are much more often used for male than for female candidates. So there's bias, and you should keep that in mind when you're looking at CVs and letters of recommendation, or when you're writing letters of recommendation. So the question is, how do we solve this problem? Well, first of all, we need to increase our own awareness. Okay, because of time I skipped this. So now I'm going to talk about what we're doing at Berkeley. My associate dean for students has developed this new course. I don't expect you to read this, but basically it's an engineering one we offer to all students, but freshmen especially: Engineering Your Life. Some of the modules in this course are tools for personal leadership, so basically reflecting on your own personal life story, understanding yourself; the next module is tools for self-discovery and knowledge mastery; and then tools for diversity and teamwork, tools for societal service, and then personal leadership plans. It's been a really successful course, and the students really appreciate it. So that's one thing, increasing awareness so we don't have bias, so we can hire more diverse people into our organizations. But diversity is not the only thing that's important. If we don't include those people, if those people don't feel like they belong, they can't contribute to the
full potential, and our teams therefore can't reach their full potential. We also have to recognize that people come from different backgrounds and experiences, so equity is not the same as equality. People have different abilities and so on; if we want people to participate equally, we have to take that into account to truly achieve equity and inclusion. So at Berkeley, another thing we've done, and there's a website here, is start a series of workshops to empower our engineering students, staff and faculty to be agents of positive change. What we have are engaging, interactive workshops to have people learn about and practice skills for, first of all, increasing awareness of our personal biases, but also how to interrupt exclusionary behaviors and how to advance equity and inclusion. So far we've talked about creating inclusive classrooms, how to have fair faculty searches, and how to grade for equity. And then, for those of you who are not at Berkeley, you can also benefit from another resource: we have an engineering library and a collection of books and other published articles about diversity, equity and inclusion, and we have an online version, so all of these resources are available online. I encourage you to check that out. This is a good book, Invisible Women, and that's relevant to data: it shows how, with big data, basically, if you don't tag it as men or women, it's automatically assumed to be male. Finally, in terms of agents of change, we can incentivize companies to actually pay attention to data and look at the diversity of their workforce. So I'd like to encourage you all to visit this website. We have a corporate diversity and inclusion survey: we have various companies who want to recruit our students come and fill out a survey to see if they track diversity metrics and what they do to foster
equity and inclusion. This is a resource for students who are trying to decide which companies to work for, so I'd encourage you to look at that as a resource as well. I'd like to close by thanking the people with whom I've worked for more than the last five years to come up to speed on this issue and define solutions, not only from the College of Engineering but from the Women in Technology Initiative at the University of California. Basically this initiative is really trying to increase the persistence and success of women in tech fields, and I want to invite you all, if you're not aware of it: we have a symposium this Friday at Berkeley about cybersecurity, featuring leading women in the cybersecurity field and a lot of interesting panel discussions, and we'll give some awards to recognize contributions of women to cybersecurity. So in closing, I know Margot is going to give me the hook. As a professor, I have a chance to give a lot of talks like this, so I'd just like to share with you that we have such amazing students here at Stanford and at Berkeley, and they are an inspiration for the faculty. One of the students that I gave a talk to last semester sent me an email saying, I was attending a lecture with a neuroscientist who claimed that the male brain is better at working with deep technology than the female brain, and that was one of the reasons that men currently dominate engineering. And so she asked me, well, what did I think of that? What I ended up doing was sending her a very long email to explain how the brain is plastic and how our abilities are shaped by our experiences and so on. And she responded to me saying, I've printed out your message and I keep it next to my desk and I look at it whenever I feel like I don't belong. And I thought it was just really cool, because what she
did was she ended up adding additional reasons why she should persist in engineering, and the one I like the most is: because you are not alone. And so I think, for everybody here, this conference is proof positive that we women and girls are not alone. I'd like to thank you for all you're doing to advance the field of technology, data science and so on, toward a better future for all of us, and thank you for your kind attention this afternoon.

Thank you so much, that was wonderful. Really, really appreciate that. There's such fantastic work happening in Berkeley in this field, and I know that's also because of this amazing dean of engineering, so thank you. We are changing topic a little bit, and I'm really happy to introduce the next speaker, Nhung Ho from Intuit. She has a PhD in astrophysics, so we've seen people from all sorts of different areas: electrical engineering, mathematics, physics, humanities, and now astrophysics. And what I love is that she called herself on her CV a supernova hunter in the past. So welcome to the supernova hunter, Nhung Ho.

All righty, so good afternoon everyone. How y'all feeling? Good? All right, hopefully everybody's feeling refreshed. You've had food, you've had coffee, and between me and 30 more minutes is more coffee. So I am delighted and honored to be here today amongst such an established and diverse group of data science practitioners. I can tell you that I have been in many audiences where the diversity and abilities of the group are not as varied as here today. And it's especially of note that this is the beginning of International Women's Week, so give yourselves a round of applause for just getting started. I'm here today to talk to you about what it means to do data science in the cloud. Over the past six years that I've been at Intuit, I've seen us through our entire end-to-end cloud migration journey, and there are some things that I wish that I knew at the
beginning, during, and even at the end that I hope to share with you today. As was mentioned, I was an astrophysicist in a previous life, so what the heck am I doing at Intuit? I can tell you that this is not working; give me one second. There it is. All right. So at the start of my PhD, for the first time in my life, I had no money at the end of the month and I didn't know why. For the first time I had a salary, which was awesome, I'm making money, but I had bills to pay, I had a car payment, I had all of these things, and I didn't know where my money was going. And if you've been through the US educational system, no one teaches you how to create a budget at all, so I was at a loss. The only thing I knew how to do was open up Google, and I typed in budget tracker and Mint.com came up. I can tell you it was love at first sight. When I first signed up for Mint, it automatically connected to my bank account, it pulled in my transactions, it made predictions and put those transactions into approximately the right category. Note the approximate part, because I'm going to get back to it. And I actually got to see my budget, and spoiler alert, a lot of it went to restaurants. I love to eat out, and it was really biting me in the bum back then. So the second month I log back in and take a look at my budget again. This is the beginning of the month, I'm trying to be proactive, and what do I see? Some of the things that I spent time correcting before were put back in the wrong categories yet again, and this began to annoy me a little bit. Third month I log in, same thing, and it felt like everything I was doing was going into a black hole. If you can imagine being an astrophysicist: you go to work, you see a black hole; you go home, you see a black hole. It was not awesome. And so I really fell out of love with Mint. I abandoned the product. I said, this is too much work, it wasn't getting smarter, I'm doing all this work already. So I wrote a script to pull the CSV from my bank
and made my own budget. A couple of years down the line, I'm on the job market, I'm transitioning out of astrophysics into data science, and a good friend of mine said, hey, you should consider Intuit, I think you'd really like the work that they do there. And so I racked my brain. I was thinking, Intuit, Intuit, how do I know Intuit? So again I went back to Google, pulled it up and said, oh, they make Mint, which is not super awesome, right? I had really bad memories. And so I said, okay, well, I'll go for an interview, why the heck not. And when I went there, some of the people I met really changed my mind. For a company that's been around for 30 years, Intuit was at the beginning of a transformation in how to leverage all the data that our customers give us to build better products, and the future of the company was moving towards this revolution that AI is going to help us drive. And I asked them a simple question. I said, if I came to Intuit, can I fix Mint? They said yes. I said, cool, sign me up. And so I've been at Intuit for the past six years now, and six years is a really long time in the Valley, so I know our products inside and out. But for those of you who are not as familiar with Intuit as I am, we're the makers of TurboTax, in addition to my favorite, Mint, as well as QuickBooks, and we serve small businesses, consumers and the self-employed with a mission to power prosperity around the world via an AI-driven expert platform. If you can imagine, that's an extremely lofty goal, and for me it's a personal mission to be able to leverage my skill set to help our customers. This message actually resonates with me quite a bit, because at Intuit my job is to lead the applied machine learning team to build better products for our small business customers. And why do small businesses matter to me so much? The other thing you need to know about me is that I come from a family of 11 children. I am the youngest in that family; that's me in orange, the last time in my life I'll ever be able to wear
orange, so if he's sure I found it. But what's unique about my family is that half of my siblings are small business owners, and one of the statistics you'll often hear cited is that half of all small businesses fail within the first five years. Being a good statistician, that means 2.5 of my siblings are going to go out of business. And so to me it's really a personal mission to be able to apply my skill set and build better products for our customers so that they go beyond surviving to thriving and then prospering in the future. But it's a very complicated problem to solve. Personal finance is something that flummoxed me as a PhD student. When you add in the complexity of a small business, with all of the things they have to worry about, can I make payroll, do I have enough cash flow, can I pay my bills, this is a problem that doesn't go away simply because you have a lot of data. And you can't do this on your laptop; you need massive amounts of compute. That's why, about six years ago, the company declared that we were going to move into the cloud and begin this transformation. Now for me, as someone who had just joined, I didn't really know the reality of the situation. You think you snap your fingers and boom, you're in the cloud, I'm done. But it's actually a really long journey, because the reality of the situation is that you're moving from working on your laptop or in an on-premise data center to completely changing your workflow in the cloud. You don't just move your data; you move your applications, you move your services, and you need to make sure that the cloud environment you're in is fully secure as well. So what I want to share with you are some of the lessons that I personally learned, as well as what my team learned, when we went through this cloud migration journey over the past six years, and I hope that regardless of where you are in your journey you find this useful. So I think the top question on everyone's mind is: how do you
choose from all the options? At this point cloud adoption is mainstream and the competition is fierce. These are the top six cloud providers; two years ago there were only three major cloud providers, and every year there's a new entrant into the market. And if you take a look at what these cloud providers offer, it's really hard to distinguish them; they offer a whole bunch of stuff. But why do you, as a data scientist, need to care which cloud provider your institution goes to? Because regardless of whether you're in academia, industry, healthcare or finance, you are going to move to the cloud. Stanford University is moving itself to the cloud as we speak. And so my position is that, as a data scientist, the reason you need to care about which cloud provider and what they offer is the AI services these cloud providers are increasingly releasing, and this is where differentiation happens. Here are just some of the major services that these cloud providers offer: speech-to-text transcription, machine translation, natural language processing, conversational agents. And for me, I didn't really appreciate the fact that these AI services can actually make a difference in how quickly you can ship a product until I went through it myself. About two and a half years ago we were looking to build a chatbot to be placed in our TurboTax product, because I can tell you people don't like doing taxes, but people really hate calling in for help. No one loves picking up the phone and saying, hey Comcast, fix my internet. So what we wanted to do was build a help agent into our products that can come up and help our customers resolve their issues without needing to call in. But in order to do this we needed a conversational agent capability, a natural language understanding service that was scalable, extensible and highly accurate. And if you remember, we're a financial services company, so we don't have this expertise. So what we did was we went
through and said, do we build this or do we buy it? And when we looked at all the different cloud providers, we realized that they each provided something that was very much battle-tested and had been in production for other companies. We ended up using one from one of the major providers, and that actually helped us leapfrog multiple years: we were able to deliver a chat agent into TurboTax within just a couple of months and build on top of it. So what I really think you should come away from this section understanding is that it doesn't have to be one cloud fits all. Regardless of which cloud provider your organization is in, take a look at the AI services that other cloud providers are offering, because they really could expand your data science team beyond what it can do right now, and construct the best solution for your organization as you go along and use these services. The second thing that was actually really difficult for me was embracing a new workflow. I thought I had already been at the forefront of working in a new way: everything was done in a virtual machine, I fully specified my dev environment, and if my laptop crashed I could bring it back up within the same day. So when I thought about moving to the cloud, I thought I would drop my laptop into the cloud and just go on with my day. But if you start from working in a laptop environment, it's actually fairly limiting, and you're not leveraging the full power that the cloud can offer. If the instance type that you're on runs out of memory, you should be able to automatically pull up a couple more nodes, distribute your workflow across them and then send your job out. That's not something you can do on a laptop; even if you're already using virtual machines on your laptop right now, you cannot do that auto-scaling. And so you need to move to a cloud-native tech stack. For me the difference came when we moved to containers. That allowed us to just
build our code and write our systems much more cleanly. It also allowed for better collaboration, because it did not depend on the dev environment you're in: I could send my container file to a co-worker and they could specify the dev environment on their machine and run the same code. And not just that, it allowed you to scale. So if you can imagine, if you run out of compute you get to, sorry, can I have some water? I apologize. Cool, I'm back. So if you run out of compute and you need more memory, and you're already working in a cloud-native stack, you can spawn your job across a hundred nodes and get it done at the same time via Kubernetes without changing anything about your configuration. That's where the real power of the cloud lies. And once we got to that point, it opened up a new door for us to solve a problem that we didn't think we could solve previously. Now if you know anything about small businesses, it's that every single small business is unique, just like every single individual is unique. So when you want to solve the cash flow problem for small businesses and build out time series, you can't aggregate all of their data, build out one time series and then apply it to everyone and say, here you go, good to go. It simply cannot be done. If you want to solve the cash flow problem for small businesses, you need to be able to build an individual time series at runtime for every single small business. And when you have four million customers, that means you need to build four million simultaneous cash flow time series algorithms and then ship them out within a certain limitation. And when I was working on my laptop, I can tell you, when I pulled in a hundred thousand transactions it keeled over and died; I had to get a new one. So when you think about a billion transactions for four million customers, that's a scale that is just unimaginable. And it wasn't until we moved into the cloud and changed
the way we did our work that we could actually think about distributed training. A lot of us, especially me, think about distributed computation just for crunching your data in preparation for machine learning. But what I recommend you think about is how to actually distribute your training job, so that you can start moving to the era of building personalized models for every single one of your users. It is possible now, and it's only possible if you move to a cloud-native tech stack, because the scaling that comes with it requires minimal code changes. This is just a simplified architecture diagram of how we distribute our four million time series across our billion transactions and partition them into multiple jobs, utilizing every single core within our 100-node cluster. All of this is done with the same container file, no changes on that end, and that's a really powerful thing to be able to do. So I've talked about which cloud to choose and how to change your workflow, but for me, changing my mindset took a really long time. I lead an applied machine learning team, so it's not just about me changing how I thought about costs and how I thought about where my job was; it's imbuing that within my organization and getting my team to change as well. I think a lot of us are familiar with Costco; it's where you buy in bulk. The analogy here is that in the old world we buy computation in bulk: you go and pay some data center some amount of money and say, I'm going to own these machines for the next three years. If you use all of it, great; if you don't, it doesn't matter, use it or lose it. That's the classical argument for why you should move to the cloud, because on the cloud you just pay for what you use. In my everyday life, when I go to the store, I look at the price per ounce, I don't look at the whole unit, so I've been trained in my personal life to think that way. But bridging that gap when we
moved into our cloud provider and working in this way was extraordinarily difficult to do. Why was that? For a lot of us it's just those neural pathways: you're trained to think about price per unit in one way, and you don't think about it in your professional environment; you've never had to do that before. The one example that really sticks out to me, and I actually think about this almost every single day, was going back to the Mint transaction categorization problem. By this point we had already trained a new model and shipped it out into production, but we were looking for slight performance gains. About three years ago we said, okay, let's take a look at some of the new deep learning architectures. We said, let's take two million transactions, put them through a convolutional neural net and see what we get: do we get the performance gains we're looking for? That took three days. In those three days I went to get coffee, I read some research papers, I twiddled my thumbs, I opened the terminal to see when it was done. That took a significant amount of time, and it was widely accepted at that time as normal. When we moved into the cloud, do you know how long that same job took? It took one hour, because we could use the right machines and scale to the right GPU instance. That made me rethink my approach and think, actually, you should really move to a decisions-per-minute framework rather than thinking about price per node. That's a factor of 72 boost in decisions: within one hour we could say, is this the right thing for us to do, and move on. The last point that I just want to touch on, I know Margot is going to pull me off the stage too, is something that Daphne touched on. You've heard me say a couple of things about working as a cloud-native data scientist, using containers, using Kubernetes. Why do you as a data scientist need
to care about that? My position is that it makes us all better data scientists if we understand the end-to-end flow and get involved in not just the engineering aspect but also how our results show up to our users. We could not have pushed our engineering team to build distributed training for us if we hadn't said that this is the only way we can solve that customer problem for cash flow. So I highly recommend that you don't think of yourself as someone who builds algorithms and throws them over the wall; be involved in the whole process. Because of being in the cloud, utilizing the AI services that exist, and changing our workflow to be cloud native, we were able to solve some very difficult customer problems that are live today in the product and that were unimaginable to me when I started six years ago. So regardless of whether you are at the beginning, middle or end of your journey, I hope some of these things have been helpful to you. It's been an honor and a privilege to share this with you. Thank you.

Thank you so much. Well, from an astrophysicist who is now working in the cloud, which is actually not so strange, probably, we now go to a civil engineer working with water. It's my real pleasure to introduce my colleague from Stanford, Newsha Ajami. She is the leader of the urban water policy unit here on campus, and she gave a wonderful talk just recently at Stanford Earth, and I'm so happy you could make it here too, Newsha.

Okay, hi everyone. A big shift from Intuit to water. I'm going to talk to you a little bit about, oh, okay, I had another slide which is not showing here, but I guess, just to let you know, water scarcity is a huge issue in the world, one of the biggest problems we are having, and this problem is not going away that easily, partly because the population is growing and people are moving to urban areas. We have competing environmental demands: we have learned over time that we can't just harness all the water that there is,
we have to leave some for the fish and everything else in the environment. And also climate change is definitely exacerbating this problem. On top of that, we have aging infrastructure that's causing even further problems in this system. So my slides are moving by themselves. I'm not sure what's going on, but you know, they want to go fast; I guess we are running out of time a little bit. And if you think about it, I'm sure in the past year at least you have read a lot about droughts and water scarcity in different parts of the world. This is not a California or U.S. problem; it's actually a really growing problem in the world. What I want to focus on today is that we had a very specific mindset in the 20th century to deal with water. We built centralized, large infrastructure that was supposed to bring water to us and deal with access to water, and two things were very prominent in this approach. One was that we assumed there's an abundance of water: as soon as we run out of one source, we can just go and tap into another one. The other was that we assumed we will always have rain and snow to help us meet our future demand. And the reality is, I don't know how many of you looked at the newspaper today, but this has been the driest February ever in California in 150 years, right? We have not even received a drop of precipitation. And, okay, so they started just to go away. Anyway, the driest year, and that is actually a big issue. Okay, so the problem with this kind of approach, which is top down, we'll build infrastructure and people will come, is that humans are not part of this process, and neither is the environment. We can't really tap into the environment forever; the reality is we have ecosystems that are dying. So we need to change this approach, or at least look at it differently. The problem is we know this is not going to be sustainable, but look: this map shows the number of dams
that are being built right now, at this moment, either under construction or planned to be built, in the world. And you can see that the whole centralized, large-infrastructure dam-building process is not really coming to an end, and it's actually growing. Now, if this was a business and I was running it, I would have said, okay, is the demand really there? Am I building these dams for the right reason, or is it just because this is how I was taught to deal with this issue? This is a supply chain issue. Now I want to walk you through how water demand has changed over time. This is the water demand for the city of Seattle. The black line shows how the demand has changed since the 1940s, and all the dotted lines show the forecasts for water demand for the city of Seattle. One thing you should remember is that since the '40s the city of Seattle's population has doubled, actually: it was about 300,000 people, and now it's more than 700,000 people. And look at the water demand, look at the black line. What you see is that even though the population has more than doubled, the water demand hasn't changed that much. We have really progressed in the way we use water. We have technologies that help us use less water, different fixtures, different rules, and actually regulations that were put in place. So water demand is not really going up even though population is growing and our economic wealth and wellbeing are growing. And if this was a business that I was running and I was so off the whole time predicting what's coming next, I would have invested in all sorts of wrong solutions, right? The next big this, the next big that, because I was expecting this water use to be doubled or tripled over time. This is not a Seattle issue; this is actually very common all across the U.S., and I put some examples here for you. You often hear about Southern California and Los Angeles when people talk about water, but it's actually a very similar story: their populations had doubled but
their water use hasn't changed at all. So this means that the approach of, I'm going to go bring water to people as the population grows, is not really a sustainable way of thinking about water. We have to incorporate this human dimension and human dynamics into the decision-making process, and we also need to consider this feedback loop in our decision-making. Okay, so my team actually works on trying to better understand these human dynamics, to better understand how people's water use is changing over time, and how that would eventually impact water infrastructure, or infrastructure planning, in the long run. I'm going to talk about one example here: large landscape irrigation, actually, and how people are using it. So this is a map of green grass, the amount of grass we have in the U.S., lawns in the U.S. You probably might not have guessed, but the biggest crop we are growing in the U.S. is lawns. Okay, the last time I ate lawn was... I can't even remember. So we don't even need them. We just like them because it's leisurely, it's beautiful, we like to look outside and see how nice it is. But the reality is they use four times more water than corn, okay? And we are growing them everywhere. The western U.S., which is a very dry region, is trying very hard to come up with enough water to make this happen, but actually I was having a conversation with someone from Michigan who was telling me exactly the same thing, so I guess it's not just a western U.S. problem. So I'm going to bring you down to California. In California, actually 50 percent of our water is used in outdoor spaces, about 34 percent of that in households, in residential buildings, but 10 percent of that water is used for landscape irrigation. And large landscape irrigation is basically all the malls and institutional buildings and, if you live in a large HOA building,
all the grass that they maintain for you. Those are the spaces we call large landscape, and for California this is about 0.9 million acre-feet per year. Now what does that mean? That is the amount of water two million households, not people, households, can use for one year. We use that much water to maintain all this manicured, beautiful grass, okay? So if we can shave off some of that water, that can go a long way, right? So we were very interested to see what kind of conservation behavior we see in these kinds of spaces, and whether there is a way we can use that for planning future infrastructure. The story I'm going to tell you is from the city of Redwood City, right north of Stanford. The reason we work with them is that they have smart meters, which collect data every 15 minutes. They send data to people, and they also collect that information on water use per meter. And interestingly enough, they also have invested in recycling. They have a recycling plant that recycles the water, and that water is provided to some of their customers for their outdoor water use. And remember, we were talking about large landscapes, right? So for the outdoor water use. So we had a very nice experiment without us necessarily needing to select people: they have a group of people that receive potable water from the tap, and another group of people that receive water from the recycling plant. And these two groups, during the recent drought in California... how many of you remember we had a huge drought in California just recently, right? So this is from 2013 to 2016, and it was a very severe drought. Outdoor water use especially was under restriction, and there was a lot of effort to try to make people use less water. And in this specific experiment that we have, people who had access
to recycled water were not under restriction. They were actually encouraged to use water, because this was infrastructure that the city invested in. It's expensive to run it, and they want to sell this product. Again, remember, water is a product that's being sold, so they want people to buy that product, because it's expensive to generate that recycled water. However, the potable water that was coming from natural sources was under restriction. Now, interestingly enough, these people were under different regimes of restrictions; however, they were receiving the same kind of information overall. Actually, Fanny gave a very interesting talk earlier this morning which talked about knowledge, and how much information we gain, and how we make decisions. During that period in California there was a lot of media coverage of the drought. Tons of articles were being written on this issue. We actually developed a search algorithm, called Articulate, that basically scraped the web for all the articles that had been written. And what you see here: the blue line shows the Google searches, how many people went online and searched "is California in drought," "is the drought over," "water conservation," different search terms, and the red line shows the number of articles that were written. And you actually saw that little red and blue on the bottom in the previous slide too; that basically shows that if it's in red, California is in a drought, and if it's in blue, it's in a wet year or a normal year. So these people were under different regimes for water restrictions, but they were under the same amount of information bombardment from the media. And interestingly enough, if you look at the bar charts, they're below zero because they're conserving water, so they're using less water. The blue ones are the people who were using potable
water and the purple ones are people who were using recycled water. And what you see is that even though the recycled water users were encouraged to use more water, they still saved, right? They did not use the same amount of water they used to, because they were actually receiving this information that we're in a drought and we should change our behavior. Another interesting way of looking at this was basically looking at neighborhood norms. Again, this is the city of Redwood City, and you can see the group in the bottom left, the red part, are people who mostly receive potable water for their outdoor space, and the people in the top left corner are the ones who are receiving recycled water. And the Getis-Ord statistics basically show how people are changing their behavior based on their neighborhood norms. Here what you are seeing is that recycled water people are also saving, and not only are they saving, over time this movement is growing. Right now they are blue versus red because, remember, potable water people were under restriction to use less water, right? So they were saving a lot more. But the blue shows that even though they were not under restriction, they were still saving water. We could not have done this if we did not have access to all that data and hadn't done a lot of work in matching the different buildings and different units to the meters, to the data, to all these different pieces. This would not have been possible. And I really appreciated the earlier talk, the fact that, you know, Stanford's parallel computing and cluster computing was extremely useful for making these things happen quickly. So as you see, as the years go by, more people start saving and these neighborhood norms grow. The last thing about this study that I want to talk to you about is how income had an impact on the way people saved water and how their behavior changed over time.
So 2014 was very dry, and the blue line that you see shows the trend in water conservation. You see that people were saving, but not that much. By 2014 we had a lot of restrictions in place on how much water people should use, so you see there's a drop in water use in that line. And then by 2016 some of the restrictions were lifted. However, some people in lower-income communities, or some of the, you know, less high-income people, continued saving water, while in the affluent neighborhoods the water use started going back up. That's why you see there's a slope in that blue line at the end. Okay, so this was an example to show you... one thing I would say is, when you go back and look at these people, you see some of this water use never comes back, because people replace their lawns with native landscaping, or they just get rid of all their lawns and do other things with their space, and that means that they're structurally changing the way they use water. Okay, so that means that instead of building a recycling plant and asking people to use water, maybe we should look more closely at how we can make them use less water. So basically moving from supply-side management to demand-side management, which is more driven by how humans make decisions and how those decisions can impact our long-term sustainability. Water scarcity is not going to go away; it's one of the world's most pressing problems. And I know Margot and I were talking yesterday, and she wanted me to talk about how these kinds of things impact people. I would say disadvantaged communities and underserved communities are the ones that are most impacted by these decisions, because as we build more large, centralized infrastructure, the cost of water goes up. Somebody has to pay for that, so everybody needs to pay for that. And on top of that, they are the ones who end up having problems with access to clean water. With
that, I'm here, and I'm happy to answer any other questions you may have. I appreciate being here. Thank you so much.
Thank you so much, Newsha. Sorry about this. This is such a wonderful example of how data analysis can help with understanding such an important topic as water usage, and also help set policies, so thanks for shining a light on that. And I don't think we've ever had problems with slides before, so we'll figure out what happened, and do data analysis on it. It's time for another caffeine break and bio break. I just wanted to ask you to please be back here just before 3:30. We're starting exactly at 3:30, and I hope you have an enjoyable break.
Live from Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020, brought to you by SiliconANGLE Media.
And welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University covering the fifth annual WiDS Women in Data Science conference. Joining us today is Emily Glassberg Sands, the head of data science at Coursera. Emily, welcome to theCUBE.
Thanks, so great to be on.
So tell us a little bit more about what you do at Coursera.
Yeah, absolutely. So Coursera is the world's largest platform for higher education. We partner with about 160 universities and 20 industry partners, and we provide top learning content, from data science to child nutrition, to about 50 million learners around the world. I lead the end-to-end data team, so spanning data engineering, data science, and machine learning.
Wow. And we just had Daphne Koller on earlier this morning, who's the co-founder of Coursera, and she's also the one who hired you.
Yeah.
So tell us more about that relationship.
Well, I love Daphne. I think the world of her. As I will talk about shortly, she actually didn't hire me from the start. The first answer I got from Coursera was a no: the company wasn't quite ready for someone who wasn't a full-blown coder. But I eventually talked her into bringing me on board, and she's been an inspiration ever since. I think one of
my first memories of Daphne was when she was painting the vision of what's possible with online education, and she said, think about the first movie. The first movie was literally just filming a play on stage. You'll appreciate this given your background in film. And then fast forward to today and think about what's possible in movies that could never be possible on the brick-and-mortar stage. And the analog she was creating was that the first MOOC, the first massive open online course, was very simply filming a professor in a classroom. But she was thinking forward to today and tomorrow and five years from now, and what's possible in terms of how data and technology can transform how educators teach and how learners learn.
That's very cool. So how has Coursera changed from when she started it to now?
So it's evolved a lot. I've been at Coursera about six years. When I joined, the company had less than 50 people. Today we're 10 times that size; we have 500. I think there's been, obviously, dramatic growth in the platform overall, but three main changes to our business model. The first is we've moved from partnering exclusively with universities to recognizing that actually a lot of the most important education for folks in the labor market is being taught within companies. So Google is super incentivized to train people in Google Cloud, Amazon in AWS, folks need to learn Tableau and a whole host of other software. So we've expanded to include education that's provided not just by top institutions like Stanford but also by top institutions that are companies, like Amazon and Google. The second big change is we've recognized that while for many learners an individual course or a MOOC is sufficient, some learners need access to full degrees, diploma-bearing credentials. So we've moved into the degree space. We now have 14 degrees live on the platform: master's degrees in computer science and data science, but also in business, accounting, and so on. And the third major change is, I
think, just sort of as the world has evolved to recognize that folks need to be learning throughout their lives, there's also general consensus that it's not just on the individuals to learn but also on their companies to train them, and governments as well. And so we launched Coursera for Enterprise, which is about providing learning content through employers and through governments so we can reach a wider swath of individuals who might not be able to afford it themselves.
And how are you able to use data science to track individual user preferences and user behavior?
Yeah, that's a great question. So you can imagine, right, 50 million learners: they're from almost every country in the world, they're from a range of different backgrounds, and they have a bunch of different goals. And so I think what you're getting at is that so much of creating the right learning experience for each person is about personalizing that experience. And we personalize throughout the learner journey. So in discovery, up front, when you first join the platform, we ask you, what's your career goal? What role are you in today? And then we help you find the right content to close the gap. As you're moving through courses, we predict whether or not you need some additional support, whether it's a fully automated intervention like a behavioral nudge emphasizing growth mindset, or a pedagogical nudge like recommending the right review material, and provide it to you. And then we also do the same to accelerate support staff on campus. So we identify for each individual what type of human touch they might need, and we serve up to support staff recommendations for who they should reach out to, whether it's a counselor reaching out to a degree student who hasn't logged in for a while or a TA reaching out to a degree student who's struggling with an assignment. So data really powers all of that: understanding someone's goals, their backgrounds, the content that's going to close the gap, as well as understanding where they need
additional support and what type of help we can provide.
And how are you able to track this data? Are you using A/B testing?
Yeah, great question. So we have what we call eventing-level data, which basically tracks what every learner is doing as they're moving through the platform. And then we use A/B testing to understand the influence of our big features. So say we roll out a new search ranking algorithm or a new learning experience; we would A/B test that to understand how learners in the new variant compare to learners in the old variant. But for many of our machine-learned systems we're actually doing more of a multi-armed bandit approach, where on the margin we're changing the experience people have a little bit to understand what effect that has on their downstream behavior, separate from this mass hold-in or hold-out A/B test.
And so today you're giving a talk about Coursera's latest data products. Give us a little insight about that.
So I'm covering three data products that we've launched over the last couple of years. The first two are oriented around really helping learners be successful in the learning experience. The first is predicting when learners are going to need additional nudges, and intervening in fully automated ways to get them back on track. The second is about identifying learners who need human support, and serving up really easily interpretable insights to support staff so they can reach out to the right learner with the right help. And then the third is a little bit different. It's about, once learners are out in the labor market, how can they credibly signal what they know so that they can be rewarded for that learning on the job? This is a product called skill scoring, where we're actually measuring what skills each learner has, up to what level, so I can, for example, compare that to the skills required in my target career or show it to my employer so I can be rewarded for what I know.
And that can be really helpful when people are
creating resumes, by ranking how much of a skill they have.
Absolutely. So it's really interesting when you talk about resumes. So much of what's shown on resumes is traditional credentials, things like what school did you go to, what did you major in, what jobs have you had. And as you and I both know, there's unequal access to the school you go to or the early jobs you get. And so part of the motivation behind skill scoring is to create more equitable, or fair, or accessible signals for the labor market. So we're really excited about that direction.
And do you think companies are taking that into consideration when they're hiring people who, say, have a five-out-of-five skill in computer science but didn't go to Stanford?
Yeah, absolutely. I think companies are hungry to find more diverse talent, and the biggest challenge is that when you look at people from diverse backgrounds, it's hard to know who has what skills. And so skill scoring provides a really valuable input. We're actually seeing it in use already by many of our enterprise customers, who are using it to identify which of their internal employees are well positioned for new opportunities or new roles. For example, I may have a bunch of back-end engineers; if I know who's good in math and machine learning and statistics, I can actually tap those folks to transition over to machine learning roles. And so it's used both as an external signal in the external labor market and as an internal signal within companies.
And just our last question here: what advice would you give to young women who are either out of college or just starting college, who are interested in data science, but who maybe haven't majored in a typical data science major? What advice would you give to them?
So I love that you asked about those who haven't majored in a typical data science major. I'm actually an economist by training, and I think that's probably the reason why I was at first
rejected from Coursera, because an economist is a very strange background to go into data science. I think my primary advice to those young women would be to really not get too lost in the data science, in the math, in the algorithms, and instead to remember that those are a means to an end, and the end is impact. So think about the problems in the world that you care about. For me it's education; for others it's health care, or personal finance, or a range of other issues. And remember that data science provides this vast set of tools that you can use to solve the problems you care about most.
That's great. Thank you so much for being on theCUBE.
Thank you.
I'm Sonia Tagare. Thank you so much for watching theCUBE, and stay tuned for more.
From Stanford University, it's theCUBE, covering Stanford Women in Data Science 2020, brought to you by SiliconANGLE Media.
Hi, and welcome to theCUBE. I'm your host, Sonia Tagare, and we're live at Stanford University covering the fifth annual WiDS Women in Data Science conference. Joining us today is Newsha Ajami, who's the director of urban water policy for Stanford. Newsha, welcome to theCUBE.
Thank you for having me.
Absolutely. So tell us a little bit about your role.
So I direct the urban water policy program at Stanford. We focus on building solutions for resilient cities. We try to use data science and also mathematical models to better understand how water use is changing and how we can build future cities and infrastructure to address the needs of the people in the U.S., in California, and across the world.
That's great. And you're going to give a talk today about how to build water security using big data, so give us a preview of your talk.
Sure. So the 20th-century water infrastructure model was very much a top-down model. We built solutions or infrastructure to bring water to people, but people were not part of the loop. The way that they behaved, their decision-making process, what they use, how they use it, wasn't necessarily
part of the process, and we assumed there's enough water out there to bring to people and they can do whatever they want with it. So what we are trying to do is change this paradigm and make it more bottom-up, to engage people's decision-making process, and the uncertainty associated with that, as part of the infrastructure planning process. And so I'll talk a little bit about that today.
And where is the most water usage coming from?
So, interestingly enough, in the developed world, especially in the western United States, 50 percent of our water is used outdoors, for grass and outdoor spaces, which our lives don't necessarily depend on. I'll talk about the statistics in my talk, but grass is the biggest crop we're growing in the U.S., while we don't really need it for food consumption, and it also uses four times more water than corn, which is a lot of water. And in California alone, if you just think about some of the grass or green spaces we have outdoors in these malls or institutional buildings or different outdoor spaces, if we can save some of that water, it can provide water for about a million or two million people a year. So that's a lot of water that we can save and use, or actually repurpose, for needs that we really have.
So does that also boil down to people watering their own lawns, or is the problem much bigger grass usage?
Actually, interestingly enough, that's only 10 percent of our outdoor water use. The rest of it is actually residential water use, which is the grass you and I have in our backyards, and watering it. So that water is even more than the amount that I mentioned. So we use a lot of water outdoors. And again, some of these green spaces are important for community building, for making sure everybody has access to green spaces and kids can play soccer or
play outdoors, but really, our individual lawns and outdoor spaces, if they're not really, you know, native landscaping, are not something that we use enough to justify the amount of water we use for that purpose.
So taking longer showers and all that stuff is very minimal compared to...
No, not at all. Actually, those are also very, very important. That's another 50 percent of the water that we use in our urban areas. It is important to be mindful of the way we wash dishes, the way we take showers, the way we brush our teeth, not wasting water while we're doing that, and a lot of other individual decisions that we make that can impact our water use on a daily basis.
Right. So right now in California we just had a dry February, which is the first in 150 years, and, you know, this is a huge issue for cities, agriculture, and for potential wildfires. So tell us about your opinion about that.
So, the 20th-century infrastructure model I mentioned at the beginning: one of the flaws in that system is that it assumes that we will have enough snow in the mountains that would melt during the spring and summertime and would provide us water. The problem is climate change has really, really impacted that assumption, and now we're not getting as much snow, which comes back to the fact that this February we have not received any snow. We are still in the winter and we have spring weather, and we don't really have much snow on the mountains, which means that's going to impact the amount of water we have for summer and spring time this year. We had a great last year. We got enough water in our reservoirs, which means that we can potentially make it through. But when you have consecutive years that are dry and we don't receive a lot of precipitation in the form of snow or rain, that will become a very problematic issue for meeting future water demand in California.
And do you think this issue is, along with not having enough rainfall, also about
how we store water? Or do you think there should be a change in that policy?
Sure. I think it definitely also has something to do with the way we store water. We are in the 21st century, we have different problems and challenges, and it's good to think about alternative ways of storing water, including using groundwater as a way of storing excess water, or moving water around faster, and making sure we use every drop of water that falls on the ground, and also protecting our water supplies from contamination or pollution.
And do you see us ever going to desalination to get clean water?
So, interestingly enough, I think desalination definitely has worked in other parts of the world, when you have a smaller population or you have already tapped out all the other options that are available to you. Desalination is an expensive solution. It costs a lot of money to build this infrastructure, and it also, again, depends on, you know, this centralized approach, that we will build something and provide resources to people from that location. So it's very costly to build this kind of solution. I think for California we still have plenty of water that we can save and repurpose, I would say, and we can also still do recycling and reuse. We can capture our stormwater and reuse it. So there are so many other cheaper, more accessible options available before we go ahead and build a desalination plant.
And you're going to be talking about sustainable water resource management, so tell us a little bit more about that too.
So sustainable water resource management, and occasionally I also use words like building a resilient water future, is all about diversifying our water supply and being mindful of how we use our water. Every drop of water that we use is degraded and needs to be cleaned up and put back in the environment. So it always starts from the bottom: the more you save, the less impact you have on the environment. The second thing is you
want to make sure that every drop of water that we use, we can use as many times as possible, and not take it, use it, and lose it right away, but actually be able to use it multiple times for different purposes. Another point that's very important is that actually the majority of the water that we use on a daily basis doesn't need to be extremely clean, drinking-water quality. For example, if you tell someone that we are flushing drinkable water down our toilets, it would surprise you that we will spend this much time and resources and money and
Welcome back to the afternoon session for WiDS. I just want to say how amazing it is to see all of you here still, and to hear the level of the discussion. So everybody's still awake and everybody's still here, and it's 3:30. We've been going the whole day, but it is just absolutely wonderful to see all your energy and your enthusiasm. Now, before we start with the career panel, I just wanted to introduce, and hopefully they're in the room right now, otherwise I'll just talk about them, the two artists that we have. Andrea and Patricia, are you in the room? If so, get up. They may actually be putting their stuff together, but let me tell you a little bit about it. We really like this interface of art and data science, and so, like last year when we had two artists, we asked two artists working in that space to come and exhibit. I will tell you again just before the reception, but we have Patricia Alessandrini from Stanford, and she's going to give a demo of the use of artificial intelligence in music. She will do that in the living room, at the piano, at the start of the reception. And we have Andrea Gagliano. She's from Getty Images, and she's a machine learning expert who works on visualization and images. Yesterday we talked on the podcast, and she said that she does things like detecting super cheesy smiles, which I thought was amazing. So that's SCSD, the acronym: super cheesy smile detection. So apparently that is
something you can do in machine learning. But she has an amazing exhibit of 16 beautiful portraits, and you'll be surprised. I won't tell you what this is about; just go and see it later at the reception. But now I want to turn it over to Martina, and you're going to take all these amazing guests here through the career panel. So thanks very much for that, Martina Lauchengco.
You've heard from astrophysicists and statisticians, you're about to hear from an economist, and this is the part of the day where we focus on you, and the difference that you can make in your choices about your career and how you direct your talents to make the world a better place, by way of example of the amazing people that are here on stage with me. You have their bios inside your program. I encourage you to read them, because they will blow you away. So I will just introduce them individually so you know who is who as we're speaking. Immediately to my right, you already know Talithia Williams, who's an associate dean and associate professor of mathematics at Harvey Mudd. Next to her is Rukmini Iyer, who is a distinguished engineer at Microsoft for the Bing and the research and AI groups. Next to her is Denice Ross, who is at the National Conference on Citizenship and is also a senior fellow at Georgetown. And next to her is Lillian Carrasquillo, who is an insights manager at Spotify as well as a data scientist. So you can imagine everyone in here is asking themselves: all right, academic pedigree. We have a bunch of PhDs and master's degrees up here on stage. Is it really deterministic in the path that I choose? And I'm just curious how many people on stage today, when you got your various degrees, knew or had a sense of what you'd be doing today?
I did.
Okay, so half of our group did. All right, so maybe we'll start with you, Rukmini, since you had a sense. You started off with civil engineering, but then you pursued a PhD in electrical engineering. Yes. And chose to stay in industry. So tell us how you
made that decision and how you wound up on your path to where you are.
Right. So I started my first year in civil engineering, and it's because I read Ayn Rand's The Fountainhead, and that was all it took. At the end of the first year, the counselor told me, hey, your math and physics grades are really great, but you're barely passing your engineering drawing class. So he really recommended I do computer science or computer engineering then, and I really didn't want to sit in front of a computer. So I said, okay, what's the worst, you know, among the whole slate of degrees, and I said I'll be an electrical engineer, because it was still engineering, and computers didn't feel like engineering. So I went into electrical engineering. In my third year, or like my fourth year, in electrical engineering, I did my bachelor's project, and it was a project in speech recognition. I saw dynamic programming for the first time and I fell in love, and I knew from then that I was going to work in this field. Had my civil engineering self read a little bit more, I would have maybe figured it out earlier. But I actually think I'm very resilient because I keep moving around. I choose things because I'm interested in them, and I usually like to solve problems, so I end up veering towards where the biggest problem is. From that perspective, I think all these changes have just made me more resilient and made me appreciate the diversity of people that you need to bring together when you're really trying to solve something hard.
And Talithia, you did not put up your hand, but you also pursued a PhD in statistics. So how did you wind up... Academia tends to be more deterministic, so at what point did you know that that was the path you wanted to be on?
Such a great question. So I went to Spelman College for undergrad, which is a historically Black college for women in Atlanta, and so it was very empowering to be taught by Black women who had PhDs in all
these different areas. And so I started in a PhD program in mathematics and took a biostatistics elective. We were looking at a data set of mothers, some of whom had smoked during gestation and some who had not. This is looking at historical data, and we did, like, a linear regression. And I'm just like, first of all, why are women smoking during pregnancy? Like, everybody knows. And so we're looking at this data, and clearly the women who smoked during pregnancy had shorter gestations and lighter birth weight babies. And the professor said, so what's your conclusion? And we're just like, duh, smoking is harmful to your baby. And then he talked about how the tobacco industry refuted the data when it first came out: oh no, it's not smoking, it's the mother's ethnicity, it's background. And I just remember sitting there thinking, look at the data. The only difference in these women is whether or not they smoke. And so that was the moment where I was like, I'm going to be the one who looks at data and pulls information against the man. So yeah, that was sort of the moment. And then I switched PhD programs. I applied to stats PhD programs, left my PhD program in math, and went on into statistics.
And academia is a choice, versus Rukmini, who chose industry. For academia, was there a reason why you thought, I want to do that in academia as opposed to in industry?
Right: summers off, spring break. I want to have kids and want their schedule to match mama's. That was pretty much it.
That is excellent.
I started in research, so I was in research labs, but I quickly veered towards products. I really like solving product problems, and I'm not such a huge fan of publishing, so I just veered in that direction, and then I found success there too. So yeah, I think to each his own path. It was really definitely reinforcing.
Yeah. So Lillian, for you, you mentioned that you very intentionally wanted to get a master's degree and
very intentionally went to a liberal arts college despite being STEM oriented. So tell us a little bit about how you made that choice.
So when I was a teenager and deciding on a college, I absolutely knew that I wanted to study math. That was not even a question for most people that knew me, because I just had a deep curiosity, and it had really become part of how I looked at problems and thought about things. So when I was looking at colleges, Smith College really was a special place to me. Beyond it being a women's college, which is really meaningful when you're talking about being a math or engineering student, it was also a place where I could be challenged to think about things differently. I was very comfortable in my math space, but I knew that I wanted to be somewhere where it was hard and intellectually challenging. So I had an undeclared minor in Latin American studies because I just took so many courses in that area. I took sociology courses, I took neuroscience for non-experts, film courses, a logic class. So I just got exposed to a lot of different ways of thinking, and I found that to be extremely necessary for me to then be able to go deeply into math. Something about that balance really helped me. And I raised my hand earlier about deterministic paths because, at the point where I was going into grad school, I'd been at a school for a couple of years and decided to do a master's. This is before there were any data science programs anywhere for a master's. So what I did was this program called an industrial math degree, which they offered at WPI in Massachusetts. It's basically an applied math degree, but you focus on an area in industry, so it's for folks who want to go into grad school in order to go back into industry rather than into a PhD. I was able to add a machine learning track to my degree, and so I knew that I wanted to work in technology with data and understand analytics through
machine learning, and I was able to do that through this program.
And Denice, you're the only one on stage that has chosen government and service as your professional path. I'm curious, as you observe within that field, how much do academic credentials or pedigree matter, and did they play a consequential role for you in the path that you chose?
Yeah, that's a really good question. So I spent a few years in the Obama White House, and I remember when I would be introduced to people, it wasn't any credentials that I necessarily had; it was the fact that I'd come out of the city of New Orleans. So someone would introduce me: this is Denice Ross, who used to work at the city of New Orleans. So I had this authenticity because I was in New Orleans. I was just sort of heads down, doing the hard work of democratizing data, when Katrina happened. And then we had a really important mission, because the federal levees had failed, 80 percent of the city had flooded, and we were flying blind. We couldn't make basic decisions about, like, which child care centers should we reopen first, and where should we put up the health clinics, and which parks and playgrounds should we rehab first. So my role with the team was to help rebuild the data as we rebuilt the city.
What's interesting: last night a bunch of us from the conference got together for dinner, and one of the topics that came up was the view of data science from different vantage points. In Silicon Valley it's highly revered. It is the future; all of us are talking about all this AI, and it's coming from data science. But traditional industries don't quite get the role of data science yet. And one of the things that you were describing there is, okay, the levees failed, and how do we use data to help us make and shape and inform these decisions? So especially government, nonprofits, all of that, they tend to be less
forward-leaning in their understanding of technology. So what is your observation of how data science can stand itself up in those more traditional industries and be seen for the potential that it truly has?
Yeah, well, I was really fortunate, because I moved into local government when the federal government had already set a precedent. Obama had really leaned forward in data transparency and public engagement, and so at the local level there was this incredible optimism. What we had in New Orleans after the storm was an asymmetry of information. The problem was bigger than government and it was bigger than the people, and we had to align efforts in order to be able to solve the problem. What government had to do was release the data that only it had, but their information systems were in shambles, and so it would really best be described as a pathological complexity inside of government. But what was amazing is that there was this incredible latent talent that just hadn't had the opportunity to really rise to its fullest potential. So when you're working in government, it's really about finding the talent that already exists and helping them be the best civil servants they can be.
So, Rukmini, you're in a massive organization. Yes. And as we're speaking of finding talent and developing it, that is extremely important. How do you find that talent? How do you interview for it and select it and assess it, especially when I'm sure Microsoft gets, like, the gold-plated resumes from around the world?
Yes, it does. We are hiring. And if you're in Microsoft Research, they get, I think, the diamond-plated resumes too. So we hire across the spectrum, both industry veterans as well as students coming out of their master's and PhD programs. And the traditional interviews, I think a lot of you here must be Silicon Valley veterans, you can regurgitate textbooks;
there are those types of interviews. There are the puzzle-solving interviews, which give you a migraine at the end of it. So I'm sure Microsoft has those types of interviews too. Typically in my interviews I ask more open-ended questions, and I'm really looking for people who care more. They don't just care about what they build; they care about who uses it. So people who ask me questions about, hey, who's your consumer, and what kind of challenges do your consumers face with your products? I'm really interested in those people, because they really care about the end-to-end use of technology, versus just, hey, I built this really large neural network with this open source software and it gets 95% accuracy. I really want to see more than that. So that's one. And the second type of people that I really like are people who question a lot. Even if I say, oh, we have really large models, they ask for details: what do you mean by a large model? Aren't you overfitting on data? They ask questions that tell me that they're thinking very deeply about machine learning and data science, and it's not just terms and phrases that you use.
And I'll add, as someone that doesn't interview candidates for data science but does probably two or three interviews a week: just that questioning mindset, where you're asking questions and framing and probing to understand, is just as important as the answers that you come up with. Yes.
Lillian, you talked about the fact that you're an insights manager now, where you're basically trying to have the user at the center and take all of these inputs and figure out, well, how does that all add up to an improved experience, a more personalized experience. So tell us a little bit about how those teams get constructed, and what you think is the most positive dynamic, and how you use all those levers together.
Yeah, so at Spotify it's extremely collaborative, and there are lots of very deep thinkers who are experts in their little corners, and it takes a lot of
talent on top of that to be somebody that can bridge those experts together. And so I'm trying something new, where I really care about human-centered design in technology, and I really care about human-centered evaluation of AI and ML products. So I'm trying to drive that forward. There are lots of other people who care too, but we haven't had an organized front together. So what we're trying to do is really think about the different pieces that different teams care about. The MLEs, machine learning engineers, excuse me, might care a lot about the performance of a model, and might care about precision and accuracy. The product managers, who are basically the business thinkers on an engineering team, might really care about the final business metric: is this optimizing for the thing that we're trying to improve, that I promised my boss I was going to do, that he promised his boss he was going to do? So all of that. And then the research scientists are a different group that are interested in the horizon of the state-of-the-art research in this area. And so my team is an insights team. We have mixed methods: we have user researchers who do lots of foundational and qualitative research, one-on-one with users or at scale through surveys or other research, and we have data scientists who dig deep into the data, create metrics, and try to bridge the data to how humans are actually behaving. And so our team is trying to be the bridge along all of that, so that we can take that state-of-the-art horizon thinking and put it into the product, and make it so that everyone feels like they're winning, by being able to tell the story from each of their perspectives in a way that makes sense and in a way that makes them feel like they're having an impact in a positive way. So that's the approach that I'm taking, and the idea is definitely to put the human in the center and figure out how to translate that for each of the different types of
experts that we're collaborating with.
I think that really reinforces Suje's point earlier that AI requires really high EQ, and that's something that all of us need to take away in however we apply our jobs and our work. I wanted to spend a little bit of time talking about advocacy versus mentorship. The distinction that I'm drawing there is that mentorship is an expert sharing their expertise with you so that you can learn, and advocacy is somebody that is ahead of you on their career path actively advocating on your behalf to help you with your career. Talithia, you'd mentioned that as you were coming up, you were seeing these really cool Black women that were professors. Did you have examples where people advocated on your behalf that really made a difference in your career?
Absolutely. I mean, I think that definitely in grad school there were times where I was ready to quit, and my department chair, who ended up becoming my advisor, was a huge advocate in terms of me staying in the program, but also helping to communicate my love for statistics in spite of the ways that I felt like I was not doing well. And when she came back to me, she was like, people don't view you the way you view yourself, right? I was really hard on myself. But then also, going out on the job market, having her be able to make those phone calls and say, I've got a great student, I want you to think about her for your position, has been fantastic. And then just having mentors outside. So now in my professional life, when I think about what the next step is for me, I look at folks in those positions, and I'm really putting myself out there and asking them about ways that I can be invited to the table. It's hard when you're not there, and especially when there aren't a lot of people who look like you who are already at the table. And so many of my advocates are majority men, right, white men, who really want to see change in higher education
leadership, who are really saying, we need to bring different voices to the table. Because it's hard on the outside to say that; you need someone on the inside who's pushing that.
So I'm curious, for the rest of the panel, how many of you had primarily white men as your advocates?
It was a mix for me. I can think of two really pivotal sponsorships that I benefited from. One was when I was at the city of New Orleans. This was after I came back from maternity leave; I had twins. My boss says, what do you want to do when you grow up? I said, I really want to work on climate change. And about six months later the White House called, and they wanted someone to talk on stage at the launch of the Climate Data Initiative. And Alan handed the phone to me. He was like, the White House called, why don't you take it? Which is sort of the most awesome boss move, right? He's a Black man, and he gets the best boss award. And then the second time I benefited was when I was at the White House. I was finishing up my Presidential Innovation Fellowship, and my colleague Clarence Wardell and I had co-founded this project called the Police Data Initiative. It's sort of a side thing, but it was after Ferguson, and in the national dialogue there were a lot of people wondering: is police violence getting better? Is it getting worse? Is it better in some places, worse in others? We had no data to inform the national discussion. And so Clarence and I had this idea: what if police departments released the data they had on use of force? Maybe that would change the national dialogue. And so we had this little side project that turned into one of the most tangible responses to the President's Task Force on 21st Century Policing. But our fellowship was running out, and so DJ Patil, who co-coined the term data scientist, put Clarence
and my name on a post-it note on his monitor, and he worked the building to try to figure out a way to get us badged into the White House so we could finish out Obama's term doing this policing work. And interestingly, Clarence and I ended up in very different roles, but DJ found a spot for both of us so we could still do the policing work.
That's an amazing story. All right, so take note, everybody: take some post-it notes, keep them on your desk for who you need to help along.
Yeah, I should say that I've also had male mentors, very strong mentors, and some who have kept me in my career. When I had my child and I came back, at some point I was really overwhelmed and I was thinking of quitting, and my manager at that time said, you know, there's something called lie low. I know you haven't heard that term, but you could lie a little low. That really helped me ride out those initial couple of years. But I guess the biggest mentor I have is my PhD advisor. She was a rare woman in the field of electrical engineering working in AI, and she was doing excellent work. She was a role model, and she had very strong expectations of her students. She taught us to present, she taught us to write, she advocated for us. She was a mentor, teacher, advocate, all of it put together. So I think she would be my strongest mentor, but there have been a host of allies all the way through Microsoft.
She sounds amazing. And I would say, for those of you that are at the point in your career where you can be what Rukmini's PhD advisor is, take a page from that playbook. That is an extraordinary impact. Lillian, I know you and Rukmini are working on the WiDS high school program.
Yeah, so I just want to say that Rukmini is one of my angels. She connected me to the WiDS outreach program for high schoolers and then to this panel, so thank you. So you saw me for
two seconds on that video this morning, and apparently many high schoolers are going to be hearing about a day in my life. It was secretly two days, because I got sick halfway through the first one. But I'm really excited to share just one of the many different ways that you can be a data scientist today, and just how different everything is. I think I made some jokes about spending too much time in meetings, because one of the things that I want young people to know is how important collaboration, and being able to talk about data science to non-data scientists, really is. So that was the part that I tried to highlight. I just think it's really important that we be able to tell the stories of what we're working on to non-experts.
Imposter syndrome: gotta talk about it. And I'm just curious, for the people in this room, including the folks on the stage, how many people here, and everyone has doubts and fears, but how many have genuinely felt like an imposter? Put your hand up. All right. So I do just want to make a note: pay attention to that next time you're in a challenging conversation or a hard meeting, and remember what you just saw, which was at least half, maybe two-thirds, of the room's hands went up, about how many people might look really confident and be acting like they're all that in the meeting, but on the inside are feeling a little tender. One of the things that helps us collaborate better is to remember that about one another. So three of my panelists had their hands up, so let's talk about...
You see, I'm gonna get on my soapbox for a second here. There's a difference between being humble and still having low confidence because you're new and you're still learning, and having imposter syndrome. I am very lucky that there's something inside of me, and I'm sure it comes from my Boricua roots, that has always been like, okay, I have something to say and I'm gonna say it, and even if I'm wrong, I'm gonna learn from whoever
else corrects me. But imposter syndrome isn't something that I've ever really felt in the data science or math field, because I just feel so in love with the work that I do that even if I am incorrect, I know that I can learn from the things that I'm doing incorrectly. But I absolutely had low confidence early in my career. So I just think it's really important that we make that distinction, because that low confidence in your skills means that there's something for you to learn. It means that you should reach out to other people to learn and get feedback, and we're all gonna go through that. And I'd love for the conversation to change a little bit to be about how we can all help each other gain confidence in areas we're not so confident in, and less about whether or not you belong in that room. Because every single one of you belongs in this room, and in that room, wherever that is for you.
So I can talk a little bit about this. The area of machine learning and AI is moving really fast. When you begin your career as an engineer or as a data scientist, you're really focused on one area, you're focused on one problem, and you really know the depths of it. But where I am right now, I cover game theory and auction theory and all these areas, and sometimes I just feel like, oh my god, I gotta read these three textbooks before I know everything there is to know about game theory. And then the little voice in my head says, no, that's not your job; there are three other people who are experts in that area. But you do feel sometimes that the information is out of your control, like you really don't know the depths of it. So it does happen now and then, because the area is just moving really fast. I did a tutorial in April. I prepared the material in March, and there was a conference in March, and there were three new papers that the audience asked me about. So this is how fast the
area is moving. So you do feel that, and you've got to just remind yourself that no one is asking you to really have a PhD in every field that you're talking about. It's good to read and know as much as possible, but I think that's where some of the imposter syndrome starts. At least for me, that's where it begins: I like expertise, and so I feel like I should have expertise in everything. And I think we should remind ourselves that's not how it's supposed to be. I'm sure that is true for everybody in this room. We like being experts; that's why you're in data.
So a part of what you're articulating is the distinction between expertise and openness, and the desire we have to display expertise, and how hard it is sometimes just to show openness and vulnerability. And so with you, one of the things I appreciated about your presentation was, like, hey, here's me, here's my family, here's our data, I'm sharing it with the world. It's a TED talk, my husband's a TED talk. So how did you get comfortable riding that line and being really open about your life and being vulnerable in that respect?
Yeah, I mean, and I want to go back to the imposter syndrome too. I think being open, just from the comments that I heard from people, that's really what people want to hear. They want to see transparency. They want to identify with you in the ways that you struggle. I think the more accoutrements that come, the more people put you in this other category. And so I'm always ready to get on stage and be like, let me tell you how not other I am. Let me tell you how I was in Walmart the other day, right? And so I love that about sharing the data, because I think it just makes my life relatable, and also makes it a possibility to folks who may think, oh, I could never. Like, you sure can, and here's how you can do it. I think for me, the imposter syndrome, it's funny. When I was at Spelman I didn't have any imposter syndrome, right? It was like, oh
great, I do math, I love math, I could sit in that space and just take in the mathematics. I think it was when I got to Rice, and I was the only woman and the only African American, and I was like, whoa, am I supposed to... is this the right place? And so even though I think that environment was really supportive and hospitable, there were places... I remember my fourth year as a stats PhD student, I was going to a stats conference in my area, super excited. I'm getting there and I'm like, oh, folks who'd authored the books, right? How geeky is that? I'm like, can you sign my book? I learned math stats from you. But yeah, I remember walking up to a table, and the person's like, oh, ma'am, are you at the right conference? Because there's another one down the hall. And I was like, yeah, I can read this big banner. I mean, I'm kidding; I didn't say that. I was just like... And so those are the moments, I think, where other people's opinion of me, even though technically I know I'm supposed to be here, that little voice is like, but she said... Or someone says, oh, can you refill the coffee? And I'm like, yeah, nope, not here for that either. I'm all for coffee, don't get me wrong, happy to. But yeah, sometimes the assumption of people in my field is that I don't belong, and so I have to wrestle with that when I'm going on stage and in front of people, and kick that voice out. And so for me, that's the imposter syndrome: really owning the space and believing that I deserve to be here even when I know there are people who think I don't, right? It's not like a do they, do they not. Like, no, they said it. They were clear. And they were like, oh, okay, you're here. All right. But yeah, for me that's the issue.
Yes, maybe we're naming things differently, because I definitely 100 percent relate to what you just said about
other people giving you a questionable, yeah, reaction. And I think, I mean, it's a personality thing. I just sort of stomp in there like, yes, I am here. But I do think we encounter those feelings, and I think it's about finding good ways of coping with them in a way that empowers you rather than makes you feel like you should leave.
So you don't curse them out, is that what I hear you say? Sorry? You don't cuss them out? No, oh no, I just, like, give them a look. It's a nonverbal one. Oh, nonverbal, okay.
So I want to make sure, speaking of, we have time for a few questions, if folks want to ask some questions. I've got a microphone right over there, if someone's right next to it.
Hi. So I am also in state service, and I'm very new to data science in general, and I'm being put in the position to teach it to other people in government. So one question I had was: what one piece of advice do you have for those that are new to the field, or approaching it from a very different background (mine is marine ecology; I like to poke fish), on how to enter into the field and persist successfully through time?
So I can answer that. Build on the strengths that people already have, because you probably are already doing a lot of data science in your head. I mean, even people who aren't scientists, they do the math when they're grocery shopping, they might be sports fans and follow the stats. So there's a lot of math and analysis that we do day to day. And so what I do when I'm working with government folks is I identify something they've already done and sort of lift it up, and I also help identify what their pain points are, because that's what they're going to be most motivated to fix. So I don't come in with an agenda necessarily. I might have an agenda, but I don't lead with it. I lead with: what are their pain points, and how can we use data to stop that phone from ringing? And then they own it.
And you mentioned in
your talk gateways to understanding data. What has worked really well for you?
Absolutely, absolutely. I think for me, really, it's connecting the material to the student, right? Or not necessarily the student, but the person that we're working with. Like you say, the more we can relate it to their bottom line or their interest or something that's going to bring them in and kind of hook them... You like what I did there? Hook. That's my advice.
Next question.
Hi. My question is around building a confident demeanor when sometimes you're the only woman in the room. And the reason I think this is more complex than it would sound is that I'm always fluctuating. I've had two jobs in the data world. In my first job I was told that I was overconfident, that there was an air about me, down to the way I walked, and that I was very off-putting. And then in my new job, I don't know if I overcorrected, but they tell me, we're not sure if you have enough confidence, like, if you could go into a meeting and really command the room; you need to boost your confidence. And after a year, when I had a chance to establish myself and tell my manager about my previous journey (I didn't want that to be my lead-in), he couldn't even picture me getting that feedback in my old job. So I'm kind of confused sometimes. I want to be commanding, but it's very delicate, I feel like, as a woman. Maybe I'm generalizing.
I don't think you're generalizing at all. I'm curious, Lillian; you have to integrate a lot of different voices in a room.
Yeah. There is such a thing as both explicit and implicit bias in giving feedback. So I have been in that exact same position, is what I'll say. And for me it was really coming to a point where there's a boundary between how I behave and how I live my values and try to connect with people that I work with, versus how other people perceive it. And it's important to be able to distinguish the two for your own sanity,
right, to be able to be you. And honestly, the moment for me came when I came back from... so, I'm getting very personal: I started my job at Spotify already pregnant, very secretly, because that's what American women do. I mean, that's a whole other conversation. So it was a very stressful start at Spotify. But Spotify is a great environment, so a lot of it was internal, right? And I didn't realize that until I came back from maternity leave and that additional stress of pretending to be okay all the time went away. And I was like, okay, I'm just going to be real. I'm just going to say, this is what's going on, and I'm going to be myself as much as I can be, and try to meet other people halfway, but also not rely entirely on what they thought. On that particular point about feedback, though, I think it's about continuing to push, a little bit gently, sometimes assertively, your current manager about concrete things: concrete steps, concrete recommendations, trainings, etc. And that gives you two things. One, it gives you a more concrete understanding of what they're thinking about and where they're coming from. And the second thing that's really important is it pushes them to think about: is this a real thing, or is this about something else? It really helps them get to the point where they have to challenge their own conclusions. And sometimes it just goes away; just having that conversation makes them say, oh yeah, that wasn't that big of a deal. But sometimes it can actually help them help you more, and help you get real steps to work toward where they imagine you could be. A very specific technique to add to that is asking questions as opposed to proposing solutions. You might phrase it very much in the design-thinking way, "how might we" or "I'm curious if", and that lets the other people in the room feel like they're contributing to the conversation, which you can then direct. And
so it's a more subtle way to inject your expertise. So that's a way of being confident but being inclusive, to the first bit of feedback that you got. So we've got one minute left, so I'm going to do a super fast speed round before we wind things up. I do want to let you know that you should be yourself. The first set of feedback that you got was wrong, and the second set also sounds wrong to me. So at some point, you know, don't assume that people who are giving you feedback know more about you than you do. I think the first version of you sounded great to me, and the second version of you sounds very calm, so combine the two and, you know, there you go. Question some of the feedback that you get too, but be yourself in the end. You spend more than 40 hours, in my case even more than that, at work, and you can't be playing a role at work. So you've got to be yourself. Great advice. All right, so real quickly, we're just going to go up and back. Least favorite word, go. Best. Indifference. Disrupt. Optimize. And most favorite word? Oh, I forgot, come back to me. Team: data science is a team sport. Learning. Inclusive excellence. And mine is belief. So we will leave you with those words, hopefully of inspiration, to believe in yourselves in an inclusive, learning type of environment, and to go forward and bring all your amazing data science potential to the world in whatever career path you choose. So thank you. Thank you very much. That was wonderful. Thank you, Martina. That's great. Well, we have two more talks, and I'm really, really excited about Emily joining us here: Emily Glassberg Sands from Coursera. I was looking a little bit at her bio that was not in the book, and I found this interview with you from some time ago where it was said that when you were a little girl, your dad used to reward you by giving you fish whenever you accomplished something. And so I thought I'd give you a fish, but I forgot it, so I drew some fish for you (thank you) on
the program, and this is my artwork to you. Thank you. Thank you. This is my life. Fantastic. Well, I'm delighted to be here today. The Women in Data Science conference is my favorite day of the year, really, so thank you all for staying through this afternoon. My name is Emily, and I lead the data team at Coursera. Any Coursera learners in the room? Oh, fantastic. More than imposter syndrome, I love it. So today I'm going to walk you through three stories of how we're using data science to advance education. But before I dive into those, I want to go back a few years, not to the goldfish that I used to be rewarded with, but to 2013. I was just entering the fifth year of my PhD program, and I'd flown out here to the Bay Area to interview with Coursera. The company at the time was small but mighty. There were only about 40 employees, but you could already tell that Coursera had the potential to dramatically increase access to education. I was invigorated by the interview; I couldn't wait to start contributing. And so you can imagine my disappointment when the hiring manager called to let me know that they wouldn't be moving forward with my candidacy. The explanation was that the company was still really early stage, including in the data infrastructure and the tooling, and there were questions about whether someone with a background like mine (remember, my doctorate was in economics) would be able to contribute. And the truth is, the feedback was completely valid. I had actually come on site to interview for a role on the partnerships team and was handed off during the onsite to a group in engineering. I had no background in engineering. I'd never worked at a tech company before. All I knew was that I wanted to contribute to education. So this is Monforton School. It's a small rural school south of Bozeman, Montana. It's my school. And this is Forest Park trailer park, where many of my classmates lived. A couple days a week, I was pulled out of class to be put in what's called the
gifted and talented program. And I remember going home to my parents and saying, but isn't everyone gifted and talented? Why do they have to pull me out? It's really embarrassing. And my parents' answer was: yes, everyone is gifted and talented, but not everyone has had access to the same opportunity. At 18, I left Montana and went out east for school, and I became obsessed with understanding the sources of inequality and, more importantly, what we could do about them. In New York, I met this incredible playwright, Julia Jordan, who was convinced that women have a harder time getting their plays into production. I proceeded to spend a year working on observational and experimental studies to understand whether that was in fact true and, if so, why. In one example, Julia and her friends wrote four scripts for me that had never before been seen, and I sent them out to hundreds of artistic directors and literary managers around the country, varying only the pen name on each script (so, was it purportedly written by Mary Walker or Michael Walker?) and asking if they were interested in putting the play on stage and, if so, why. What I found is that, unfortunately, there was discrimination in playwriting: the exact same script, when purportedly written by a woman, was less likely to make it onto the stage. But as important, I also found out that the theater community cared. They wanted the best plays to be in production, and this insight spurred change. In graduate school, I continued to use data to understand who gets access to opportunity. I was shocked to learn that over half of jobs are found through personal networks. Well, referrals are valuable in some ways, but they often come at a cost in terms of diversity, because people tend to refer people who look like them. So I ran a series of field experiments in an online labor market to understand why firms hire referred workers. Are referred workers really that much more productive on the job, or is there some nepotism at play? So it's the same fascination with
understanding opportunity (who has it, what we can do to expand it) that led me to want to work at Coursera. And lucky for me, eventually I convinced the hiring manager to change his mind, and today I lead the end-to-end data team across data science, data engineering, and machine learning, working on building a better product through data. What excites me most about data science is the opportunity it affords each of us to contribute to the problems we care about most. I'm going to share three stories today of how we're using data science to solve problems in education. The first is about helping learners stay motivated and on track. The second is about helping teachers better support those learners. And the third is about ensuring individuals have the skill signals they need so they can be rewarded for what they know in the labor market. So, to start with the first: a big problem we see in learning is retention. This is true both on campus and online. At Coursera, only about one in five active enrollees go on to complete the course. Now, not all of the drop-off is bad. In some cases learners say, hey, I got what I needed in just the first couple of modules. But in other cases learners drop off because they lose motivation or because they're stuck on an assignment, and these are barriers that we can break down. Two years ago, my team landed a feature called In-Course Help. It's a system of personalized learning interventions that reach out to a learner as she's moving through her experience and support her in staying on track. Here's an example. When a learner first enrolls, she's encouraged to get started with compelling statistics, like how much more likely she is to complete the course if she starts in the next hour. As she's moving through, we reinforce her progress, like reiterating the value of incremental learning. And when she looks stuck, we can help her get unblocked, like recommending the best review material for her based on the area she's struggling with. Different learners, of course, benefit
from different messages, and so an explore-exploit ML system can help ensure we land the right interventions for the right learners. We start with this pool of potential message variants. So the getting-started nudges are one; review materials, another. There are many versions of these, which can be served at hundreds or even thousands of places within courses. We run all of those through a message-level model that asks, on average, will this message be net positive for learners, and decides which to include in our product. From there, we take each combination of messages and learners and run it through a learner-level model, which incorporates features of the individual, largely behavioral, as well as the message, and decides which are most valuable for her. With the remaining probability, we randomize whether or not learners receive messages, so we have a fresh and unbiased source of training data. And then for each of the now nearly a hundred million interventions that we've served, we see downstream behavior: does the learner choose to engage with the message? Does she report it helpful? Does it have an impact on her downstream learning outcomes? And these learning behaviors feed back to power both models. This was designed, developed, and deployed in a summer by an intern named Marianne Sorba, who's looking at me with crazy eyes right now, but really fantastic work. In some cases, we can also use machine learning to create the message variants themselves. So take the case of wanting to recommend review material for a learner struggling with an exam. To start, we collect training data from instructors and ask them to tell us: what's the best material for a learner struggling with this question? Now, we have about 300,000 questions on the platform. Instructors don't have time to tag them all; they've only tagged about five percent. But we can use that as a source of truth in a predictive model, where features of the model include, for example, semantic embeddings as well as learner behavioral features. So when
learners failed this question and went on to review material, did they end up being successful in the question later? That lets us predict, for the other 95 percent that haven't been tagged, what review material we should recommend. Personalized learning interventions like these are driving double-digit increases in the rate at which learners progress and complete content. They break down barriers, both behavioral and pedagogical, through the learning experience. But it's not just in education where these methods are relevant. In health care, less than half of Americans follow doctors' orders in taking their prescription medication. In personal finance, about a third of Americans have saved less than five thousand dollars for retirement. By building personalized intervention systems, be it in education or in health or in personal finance and beyond, we can start to better support each individual in making the best decisions for her future self. Fully automated interventions are in many cases sufficient to support learners, but once in a while we all need a little human touch. Does anyone here have children at home? Good share, more than at most conferences. This is Tucker. He's my eight-month-old, and between work and Tucker, I'll admit to not having a lot of time for other things I care deeply about. For example, much of this talk was written from the mother's room at Coursera. And so I can only imagine what it's like to be an online degree learner, the vast majority of whom have a full-time job and have a family and are also layering a full degree program on top. One of the things that can really help is human touch: an enrollment counselor who reaches out and asks why you haven't logged in, a TA or tutor who provides support on a particular assignment. But in order to provide that human touch while still keeping the cost of our degree programs low, we need to do everything we can to make that human support really efficient. Last year we built this feature called the student support
dashboard, which for all degree learners predicts what grade they're going to get in their currently active courses and, critically, includes human-readable insights for why at-risk learners are at risk. The underlying predictions are unique in a few ways. First, our courses are all very different, and so we have to train course-specific models. Second, we have to deal with the cold-start problem: a lot of degree courses are being offered for the first time, and we have to dynamically identify the right training set to use for those new courses. But I think most interestingly, we need to provide these human-readable insights for where the predictions are coming from. So let's dig in a bit there. To start, our feature engineering focuses on student activity features, and there are four big buckets, which I've included up here. We could include features to improve, say, the accuracy of the model, about the course itself, or features about the student's demographics, but these are much less actionable for support staff to use in reach-outs. From there, we need to understand, for any given learner, if we were to permute her value, for example to the median, would it meaningfully impact her grade? If so, we serve up these human-readable insights, included with the prediction. So take the case of two different learners. Both are at risk of failing a course. One learner is at risk because she hasn't logged in for 14 days; she's very likely to benefit from a reach-out from an enrollment counselor. But another learner is consistently logging in and just failing the assessment time after time; she really needs help from a TA. At Coursera, these at-risk models, again coupled with the human-interpretable insights, allow us to provide that human touch at low cost while still keeping degree programs well priced. More generally, machine-assisted solutions like these can accelerate professionals, from radiologists to career counselors, in supporting others and using their time efficiently to
do the work that only humans can. Once we get learners successfully through the learning experience, we also need to support them in being rewarded for what they know on the job. There are a lot of folks out there who have the required skills for open roles. We've heard from a ton of people in data science who don't, on paper, look like they would be data scientists. And folks who might not look the part from a traditional resume, based on where they went to school or what they majored in or the past jobs they've had, often have the skills required to do the job. This is becoming increasingly true as we're moving to a world where people are learning throughout their lives. So the old model of learn, do, retire, where only my credentials from the beginning mattered, is no longer the world we live in, right? We saw in this room: so many of us are investing in learning throughout life, and we need to be able to signal that to the labor market. Relying exclusively on traditional resume signals doesn't just make the labor market inefficient; it actually makes the labor market unfair. And the reason is that people who have access to traditional credentials are generally people of higher socioeconomic status. So the World Economic Forum recently released this report. They're basically calling for skills as the new currency in the labor market, and included in the report is Coursera's skill scoring offering, which I want to touch on briefly. Skill scoring is aimed at creating clearer signals for individuals to understand their skills relative to their target career, and for companies to understand talent. It's an application of our broader Skills Graph, which is a data asset we've built out at Coursera over the past few years. At a high level, what we're doing with the Skills Graph is taking a robust library of skills and connecting them to each other, to the content that teaches them, to the careers that require them, and to the learners who have or want to have them. Skill scoring starts with
understanding what skills are taught in each unit of content. So instructors and learners, as they're moving through content, are tagging skills to courses. Many of you may have seen a pop-up that said, what skill are you learning? Thank you for answering. Of course, learners and instructors can't tag everything, but we can use their tags as a source of truth, with natural-language-processing-based features, to predict for all other content what skills are being taught in any given unit of content. From there, we can measure what skills each learner has based on her performance on all of the assessments she's attempted on the platform. But for a given skill, say statistical programming, how can we get from tens of millions of attempts across hundreds of courses and millions of learners to a reliable estimate of each learner's skill score? So to start, here are four desired properties of our solution. First, we need the algorithm to produce stable and reliable estimates even in the presence of time-varying skills. We're learning because we want to be developing our skills; the expectation is that the skill is changing over time, and the algorithm needs to support that time-varying component. Second, since our assessments are spread across hundreds of courses, we need to account for selection effects, where in particular higher-skilled learners are more likely to attempt harder assignments, and we don't want to penalize them for doing so. Third, we need updates to a learner's skill profile to be explainable, so as your skill score evolves, you should understand what's happening under the hood. And fourth, we need the updates to be computationally feasible across millions of learners and thousands of courses, including in the online context, so when you submit an assignment, you should be able to see the evolution of your skill score in real time. Any chess players in the room? Oh, I get the semi-hands. Good, I knew we'd have at least a few. So for skill scoring, we built out an adaptation of Elo and related rating systems.
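As a rough illustration of what an Elo-family update with an explicit uncertainty term can look like, here is a minimal Python sketch. It is an assumption, not Coursera's method: the talk does not give the formulas on the slide, so this uses the published Glicko-1 single-match update as a stand-in, treating the learner as one player and the assessment item as the opponent. The names `update`, `expected_score`, and `g`, and the 400-point rating scale, are illustrative choices.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant (assumes the usual 400-point scale)


def g(sigma):
    """Discount factor: the less certain the opponent's rating, the less a result counts."""
    return 1.0 / math.sqrt(1.0 + 3.0 * (Q * sigma) ** 2 / math.pi ** 2)


def expected_score(mu, mu_o, sigma_o):
    """Modelled probability that a learner rated mu passes an item of difficulty mu_o."""
    return 1.0 / (1.0 + 10 ** (-g(sigma_o) * (mu - mu_o) / 400))


def update(mu, sigma, mu_o, sigma_o, s):
    """Return (mu', sigma') after one match; s is 1 for a pass and 0 for a fail."""
    e = expected_score(mu, mu_o, sigma_o)
    info = (Q * g(sigma_o)) ** 2 * e * (1.0 - e)  # information gained from this match
    precision = 1.0 / sigma ** 2 + info           # prior precision plus new information
    mu_new = mu + (Q / precision) * g(sigma_o) * (s - e)
    sigma_new = math.sqrt(1.0 / precision)
    return mu_new, sigma_new


# A learner rated 1500 (uncertainty 200) passes an item rated 1700 (uncertainty 100).
print(update(1500.0, 200.0, 1700.0, 100.0, 1))
```

In this sketch, passing an item harder than the learner's current level raises the score by more than passing an easier one, and every observed attempt shrinks the uncertainty. Each update needs only the two current ratings and their uncertainties, so it can run at submission time.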
These are often used in chess, also for rankings in team sports; I could have asked, any basketball fans in the room? We treat each learner and each assessment item as players in a tournament, so every time a learner attempts an assessment, it's considered a match. And this is a summary of how the skill updates work. So mu is my initial score and mu prime is my updated score. Correspondingly, sigma and sigma prime are the baseline and updated levels of certainty, represented in standard deviations. The values with subscript o are for the opponent. s is the outcome of a match for a learner: if I pass the assessment, I win the match, so s equals one; if I fail the assessment, I lose the match, so s equals zero. And you can see the explainability and computational simplicity of the updates. So if I come up against an assessment that's harder than my current skill level and I pass it, then my level increases, and very simply, that change in my level is just a function of the prior on my ability, the prior on the difficulty of the assessment, and the level of certainty in each. Learners use skill scores to, among other things, understand how their skill profile compares to folks in their target role. So for example, of data scientists on Coursera, what skills do they have, and therefore what do I need, and what learning can close the gap? In parallel, companies are starting to use skill scores to identify talent well positioned for new opportunities; for example, of back-end engineers, who has the baseline math to be able to transition to a machine learning role? So as any labor economist will tell you, people pursue education for two reasons. The first is to develop human capital, or skills, and the first two case studies, the personalized intervention system and the interpretable student at-risk models, are really designed to support that human capital development. But the second reason that we pursue education is to have a valid signal in the labor market, so we can be in the job that's best positioned for
what we know and realize our full potential, and that's what skill score is aimed at accomplishing. Stepping back, the world is changing at a faster and faster clip, driven primarily by technology and globalization. I crib this from Thomas Friedman: in his most recent book, Thank You for Being Late, he talks about how the rate of technological change is exceeding the rate at which humans are able to adapt, and this is creating serious dislocation for many people. We need to, he says, bend that curve of human adaptability. We need to learn at a faster and faster rate, and not just you and me, but billions of people around the world. The good news is that data and technology can also be part of the solution, and that's why I feel so lucky to be both a data scientist and at Coursera. All too often there can be a disconnect between the data science work and the impact, and so I hope that the stories shared today, not just about Coursera but throughout the Women in Data Science movement, can remind us all of the impact we can have through data science, because it's incredible, the products and services that we can build to make the world a better and more equitable place. Thank you. Thank you so much. Thank you so much, Emily. You were also talking faster and faster, so, as the curve is changing. I love it when people listen to me; it doesn't happen all that often. So just a very quick comment. I know a lot of you have been a little cold this afternoon. That's just another example, I think, of the climate in buildings not being set for the female body but for the male body. We asked them to change it, and I hope that you're gradually warming up. But soon we'll have some wine and beer and other things, and then you will really warm up. But hey, we have one more talk, and I'm so happy that we are joined by Been Kim, who works at Google Brain. We had an astrophysicist before. Been, you studied mechanical engineering and, I think, aeronautics in Seoul, and then you came over here. You did your PhD on the East
Coast, right? I'm not going to name the university; it's just a small university on the East Coast. And you started at Google, I think, five years ago? Three years ago only? It's amazing how fast you've grown. So thanks so much for being here and for being the last speaker of the day. I think she deserves extra applause, because that's tough. Thanks so much. Thank you. Thanks. Hi. I love speaking to a room full of talented women. There's just something about it, you know: we can talk about research, which I'm going to do, but we can also talk about imposter syndrome, which I very much resonate with. So I hope that you've been enjoying today, networking and appropriately sanitizing your hands whenever necessary. So today I'm going to talk about a little bit of the research that I do at Google: interpretability for everyone. The research in interpretability in the last couple of years has been amazing. A lot more papers were published, unlike when I was doing my PhD, and a lot of workshops. But I still see this as a drop in the ocean, because interpretability has all the complexities that all machine learning folks already have, plus the complexity of humans. It's so complex that we have a whole field just dedicated to humans; it's called psychology. And first, that's why people keep being interested in this topic, but that's also why we have to think about where we're going. There are so many directions we can go, but we as a community have to row in the same direction so that we can get somewhere in a couple of years. Otherwise we'll be rowing in all sorts of directions and rotating in circles all day. So today I'm going to talk about, first, where we are going, and second, taking a critical look at the toolkits that we already have, which would help point to where we should go next, and then also just an idea that might be slightly better. And then we're going to come back to that idea to criticize that very idea, to help us think about how we can do even better. And then I'll finish with what we should be careful about in
general in our journey. So first, where are we going? In supervised machine learning we have accuracy, which is a flawed metric, but generally we roughly agree, okay, that's roughly where we should head. But interpretability is a little different, because we have humans involved. You can't really define anything mathematical about humans, well, at least not yet. So we don't have that clear goal, and that requires us to ask a harder question, a value judgment: what are you trying to achieve ultimately, and what are your values? Now, your goal and your values are yours, but I'd like to share my goal. My goal is to help people use machine learning more responsibly and more effectively. That could mean a lot of things, but to me it means that we can align our values with the model, and our knowledge can be reflected when we want it. And I particularly care that this can be available for everyone, for two reasons. One, for high-stakes domains like medicine, we have doctors who may have expert domain knowledge that is critical to making important decisions, but they may not know machine learning. Those are the cases where interpretability becomes most useful and important. And second, machine learning is a powerful tool, and it shouldn't be the case that you have to be lucky enough to be educated in computer science or math to be able to leverage that powerful tool. I think everyone deserves to leverage that powerful tool, and that requires being able to understand how it's doing and being able to build on top of that. It's important to think about what are not our goals. It is not about making all models interpretable. For some models it's overkill; you don't need interpretability. It's also not about understanding everything about the model, because the ultimate goal is, remember, reflecting values and reflecting your knowledge, and that might just be saving some more patients or debugging this one bug, and that does not always require understanding
everything about the model. It's not about being against developing highly complex models. And perhaps most importantly, it's not just about gaining trust. In fact, if the model does not deserve to be trusted, the interpretability method should reveal that. For example, this beautiful paper came out last year that showed that a deep learning model that predicts hip fracture pretty well wasn't actually looking at the medical image at all. It was looking at the X-ray machine, or whatever the medical machine was that was taking the medical picture, and the model number. Some models were older than others, and perhaps that reflected where the hospital was and the economic status of that area. And the interpretability method did the right thing: it revealed that this model isn't reliable and shouldn't be trusted. So next up: where are we? What do we have now? Today there are lots of different interpretability methods, but I'm going to focus on one specific type, and it's called post-training interpretability methods. What does that mean? That just means that somebody gave the model to you and you can't change the model, and you're going to do your best in interpreting that model. So for example, you might have a model that takes a picture and predicts what's in the picture, in this case a bird, and the goal of the interpretability method is to explain why this was a bird. One of the most popular methods in interpretability is called the saliency map. How many people have heard of saliency maps? All right, no worries. It is using a first-order derivative to test: if I change this pixel a little bit, how much would the probability of bird change? So intuitively, if it changes a lot, it's an important pixel; if it doesn't, it's not an important pixel. That's the basic idea. It's very popular. So the promise of this method is that this is the evidence of prediction: these are the pixels that explain why it was predicted as a bird. Then we can ask a sanity-check question: okay, well, you said it was evidence of prediction, so if the prediction changes, then the
explanation should change. In fact, we can make this really extreme and make the prediction completely random, a garbage network, and in that case the explanation should really change. So we test that. Here's the network, beautifully trained, high performance. We got a bunch of images and we got a bunch of explanations, the saliency maps, which would look like this. And then we copied that network and randomized it from the last layer, the prediction layer, all the way to the bottom layer. So this network is garbage; it predicts randomly. Okay, so you would think that the explanation should change. It doesn't. And you might think, well, come on, these are technically different pictures; they did change a little bit. But remember, the final consumer of an interpretability method is a human, and as a human, I don't think I would conclude different things looking at these two pictures. The belly of the bird is still important; the cheek part or the head part of the bird still looks important. So we did this for many methods. Each row is a different method, one of which is my own work, and in a cascading manner we randomized each layer, so in the last column you're looking at a completely random network. And as you can see, some methods don't even change. In this paper we also calculated rank correlation and other things quantitatively, to show in numbers too that this is true, which is quite shocking. So we were very confused and shocked, and in fact the community was too. So how did this happen? Confirmation bias. This is such a strong thing; it's just us being human. We were given a bird picture, we expected to see a bird, we saw a bird, and we liked it, for years, myself included. This is something that we as humans just have, and we have to take that into account when we are designing interpretability methods. Folks around the same time reached similar conclusions. Some folks mathematically proved that some of these methods just look at the picture and not at all at the prediction. But some
work showed that when shown these explanations, humans did better in a final task. So maybe there is something; we just don't know what it is yet. We need to do more study of what that is, to get more of that. So how can we do better? We're still thinking about the same problem, which is a subset of interpretability methods, and now I took a cash machine picture, and again the goal is to explain why this was a cash machine. Let's get that saliency map that we just talked about, to help us think about what we really want to ask when we look at the explanation. So I squint at this picture, and I see this human in front of the cash machine a little highlighted. So I start thinking, oh, maybe it makes sense: with a cash machine, maybe you often have a human in front of it. But for whatever reason, the wheel behind the human is also highlighted, which is a little concerning. Then I start wondering: well, did these concepts matter, or am I just seeing an illusion? If they do matter, then which one mattered more? And did it matter for all pictures or just this one picture? The problem is that with pixel-based methods you can't quite express a concept like humans across many pictures. You would have been fine if you had this concept as a part of an input feature: then you get the weight and you're done. But you didn't. I just made it up after the model had already been trained, and you can't change the model. So we developed this method called TCAV, testing with concept activation vectors, whose goal is to give a quantitative measurement of a concept or concepts that you came up with after training, if and only if it mattered; it will be obvious why that is the case. So I'll explain to you how this works. This is a relatively simple method, and I'm going to use a zebra. You have a classifier that predicts a zebra, and you're wondering whether the stripeness pattern mattered for that prediction. So first and foremost, you might ask: okay, well, concepts sound good, but what is it? Like, how do you even
express what that is? Well, we do the simplest possible thing: we represent that concept as a vector, and this has been done before. How do we do that? You get some examples of the concept that you're interested in investigating, in this case some stripeness pictures, and you get some random images, and you have a network that you want to investigate. Then you simply train a linear classifier that separates random activations from concept activations, and get a vector that is orthogonal to the decision boundary. What is that vector? It's just a vector in the embedding space that points from random stuff to your concept. That's all there is. So you have this vector that represents the concept, the concept activation vector, or CAV.
Next is to get that quantity, the TCAV score: how important was this concept? Also pretty simple. We are going to take what's called a directional derivative. Intuitively speaking: if I take this zebra picture and make it more like the concept, how much would the probability of zebra change? If it changes a lot, it's an important concept; if it doesn't, it's not an important concept. That's basically it. Then we do this for many, many zebra pictures and calculate the ratio of zebra pictures that gave you a positive directional derivative, which just means that having the concept increased the probability of zebra. So if you had 100 zebra pictures, you calculate the derivatives, and 80 of them came out positive, then your TCAV score is 0.8.
That's not all. Super quickly: we have a way to quantitatively validate that this is not some spurious CAV, because we all know that in the high-dimensional embedding space of a neural network funky things happen, like adversarial examples. So here's one way we propose to confirm that. The basic idea is that you get many, many concept vectors using many, many different random sets of examples, from which you get TCAV scores. You pretend that these are samples from a particular distribution, and do statistical testing on that to
distinguish that from the TCAV scores of random CAVs, where random CAVs are just random versus random pictures. So it's just confirming that this is at least better than a random concept on this particular class, zebra.
So, some results. I have a lot of favorites, but I'll just talk about this medical example. Diabetic retinopathy is a medical condition that is treatable but sight-threatening. At Brain, we have a model that can accurately predict DR, but our question is: would doctors do the same thing? Are the concepts doctors use the same as the ones the model uses? So we went to a doctor and asked her which concepts she expects to see at a particular level of DR, DR level four, which is the most severe level of DR, and which concepts she does not expect to see for that level, and similarly for lower levels of DR. Then we asked the model, with TCAV: what do you think? It turns out that when the model accuracy was high, at DR level four, the model was using concepts that doctors would have used. So that's cool. But when model accuracy was mediocre, the model was using something that doctors did not expect, which is HMA; it's a type of hemorrhage in your blood vessels. They went back and realized that for DR levels one and two the doctors were labeling all over the place, and in fact this HMA appeared a lot in DR level two, which was often confused by the model with DR level one. So they went back and revisited the labeling process.
I've been lucky to observe a lot of passionate responses from academia on this work. Carrie Cai had this CHI conference paper where they used CAVs, concept activation vectors, to help doctors sort through medical images, so this time doctors are looking at images not in terms of pixels but in a language that they are familiar with: medical concepts. I met Eric, who used TCAV for storm prediction. Mara used this for breast cancer, and there are a lot of other things: MRI from King's College, and so on. And I was also really lucky to see that Sundar talked about
this work at Google I/O 2019. My dad was very happy; I think he stayed up all night just to watch this. And by the way, this work also won the UNESCO Netexplo award, which is given to 10 digital innovations that will have a potentially profound and lasting impact.
So let's come back to what we just talked about: how can we do better? There are plenty of limitations of TCAV. This is such a simple framework that we can always extend it, and people have. One: concepts have to be expressible using examples. So for something that overlaps a lot in your images, or something that cannot be expressed using examples, you can't really use this. You need labels of concepts. We proposed an idea this year on how to automatically do this for images, but we don't know how to do it for language, for example. Making it causal: everything that I talked about so far is correlation based; it's not causal. We proposed an idea, but you still need to train a generative model which correctly reflects the data-generating process, which is hard, and causal is always harder. So there's a lot more work that we need to do.
Finally, I want to talk about what we should be careful about in our journey. Proper evaluations: I cannot emphasize this more. Remember I talked about the complexity: because we are so complex, it is very hard to tease out what we like versus what's true. It requires a lot of thinking, and you can perhaps achieve that by doing some sort of sanity check, like the one I did, or by crafting a data set where you know the ground truth of what is important, and confirming that. It can be a toy data set; it's just a sanity check, to sanity check ourselves. Testing with humans: remember, nothing we're doing is useful if it's not consumed by humans, so we have to check with humans that it works. Remember, we are beautifully human, and biased, and irrational. In fact, if you're into reading, I highly recommend the Danny Kahneman book, The Undoing Project, which beautifully describes all
different biases we have, example by example, like statistical quizzes that you can attempt to solve. I failed a lot of them too, because you are just biased, even when given facts and numbers where you can calculate the answer. HCI: I think a couple of folks have talked about this already. I also cannot emphasize enough the importance of HCI. We have to think about workflow. It's not just about math and algorithms. We have to think about interfaces; we have to think about how humans respond. Human studies, human factors, psychology, cognitive science: these things all have to come together to make this work. And let's keep checking that we are going in the right direction as we go, so that in many years, hopefully not too long, we can sit somewhere and say how far we've come, because we all ran in the same direction. Thank you so much.
Thank you so much, thank you. What a great way to end. In fact, we should keep that last slide up, because you had so many beautiful conclusions. And look at Liza. You're amazing, you're drawing fast. I hope you really enjoyed the drawings made by Liza Donnelly here. Yeah, there are definitely some that I will frame and put on Instagram. It's wonderful. So, before we do the actual closing remarks here, I just want to make sure to thank people. There have been so many folks making this conference possible. I have the incredible privilege to be on stage here most of the day, but I did not organize this conference, certainly not by myself. There are so many people involved. Karen and Judy, you saw them earlier this morning; they are incredible co-directors, so a big round of applause to them now. And then all these people here on the side, behind these black curtains: there are actually people behind there, and they've done amazing work, from the live stream to the slides to the sound checks, everything. They were ready for us and running almost flawlessly today, and any mistakes that there were, were ours. So thank you so much for that. And then we have the volunteers and
the logistics folks, and the people helping us with the ambassador program and getting us speakers, et cetera, et cetera. In your book, at the end of the book, there's a whole list of everybody who really helped here. So it's been an incredible day and it's been an incredible journey. We always take a short break after our own WiDS conference, but we very quickly start with the next one. We take maybe a couple of weeks, and then we start thinking about next year: who will we invite? How are we going to improve upon the ambassador program? We get people enthusiastically sending us emails saying, we want to be an ambassador next year, how do we do that? So for us, we're constantly thinking about it. And I was just thinking about this last year leading up to the conference now, and the big changes we've seen in just five years, not just in the growth of this conference but also in the diversity of thought we've seen at this conference, also here today: the spread of data science into so many different application areas, the diversity of backgrounds in people. Honestly, 10 years ago, when I first really started working on data science and building programs around data science, most of the people in that field, and most of the people who we heard, were computer scientists, mathematicians, or statisticians. That's not the case anymore, certainly not. There are people with all sorts of different backgrounds, and I know there are many different people in this audience today with different backgrounds. And you no longer ask yourself whether or not you belong. You may still feel a little bit like an imposter, because a lot of us do, I do as well, but you don't ask yourself anymore: do I have a place in this field? It was clear to me today that there is a place for everybody. There's a place for humanists, there's a place for social scientists, there's a place for lawyers, there's a place for economists, there's a place for astrophysicists, there's a place for electrical
engineers and mathematicians and statisticians and everybody, earth scientists, you name it. So that's wonderful to see, and we're really seeing this also at WiDS conferences around the world. These WiDS conferences are becoming more and more diverse in all sorts of different ways. We heard a lot today about bias and integrity and fairness and equity and transparency, and I'm so glad we had the panel this morning to set this off as well. We heard that it is not just about efficiency and scalability. We heard that it is extremely important to have a shared understanding of value systems. We heard it's really important for us to understand the difference between what we like to see and what is really the truth, that we always have to stay critical, that we can be helped by others to stay truthful, and that it is so important for that reason to have interdisciplinary teams in all sorts of different ways. I've learned so much today. I want to thank all the speakers again for all the insights that they've given me, lots of food for thought. So a big round of applause to all the speakers again, and to you.
And I just wanted to tell you also: this is just the beginning. We've had a few WiDS conferences around the world already, and we'll have many more to come. This week alone, almost every day there are WiDS conferences elsewhere in the world, and a lot of those people are sharing talks online. So please keep an eye out for it. Join us on LinkedIn, join us on Facebook and other places to find references to those other WiDS conferences. We'll keep going until May, I think, right? I think until May we have conferences around the world. The southern hemisphere is going to join us probably a little bit later this month; a lot of them are still on vacation at the universities, so they come in later. And we have so many others coming up, in Africa, in the Middle East, in South America, you name it. Now, I was also thinking about changes, and you
know, somebody asked me: when do you know that things are really moving and that women are really finding their place? And I thought: maybe when there is a television show on female data scientists, or maybe when there are books with heroines who are data scientists. And to my big surprise, a month ago I got an email from a Dutch author called Heleen Kist, and we do now have a book about female data scientists. She wrote a book, Stay Mad, Sweetheart. You can get this on amazon.com, but let me offer you this: if you'd like to read it, send me an email and I will send you a copy. It's an enjoyable read. It's a mystery, and the three main people in this book are female data scientists. I didn't think I would see that day five years ago, but here we are. We've made it. Now they just need to make a movie of this, right? So keep an eye out for that.
So we are shortly going to be breaking for a reception. Keep in mind that we have our two wonderful artists there. As soon as we leave the room here, grab a drink and make your way to the end of the living room, where the grand piano is, because Patricia is going to give you a demonstration of her fantastic work on the interface of music and AI, and you will see the visual arts work by Andrea there as well. I think you'll be blown away by that. And in the meantime, we are soon gearing up for WiDS 2021. So here's a placeholder, put it in your book: March 8th. And thank you, thank you for joining us today.