Raj talked about data for good, and I'll talk about that a little bit: making it a better world. So what does it mean to make the world better via predictive analytics? I think that's a little more detail than what Raj was talking about. There are a couple of key things here: predictive analytics, and what it means to have a better world. I want to drill into that.

There we go. If we think about data, Raj talked about a lot of different types of data, so I'm going to start with a very simple example: a chocolate bar. If we sell chocolate bars, then data, which Raj mentioned probably at least 15 times, raw data, is just the sale of a chocolate bar. I work in a store and I sold a chocolate bar at 4 o'clock this afternoon, and I also sold a chocolate bar at 10 o'clock this morning. That's data.

Going up a level, we can think about what information is. A simple example would be: we sold 82 chocolate bars last month. That's some information. You can do some things with that, but not a lot. Going up another level, you can think about knowledge. Knowledge could be things like trend analysis: looking over the last year, we've been increasing our sales of chocolate bars by 20% a month. That's some knowledge.

But in data science we don't want just knowledge. We want predictive insight, some actionable insight. In the chocolate bar example, actionable insight would be that we have looked at patterns of how people buy chocolate bars, and we see what's in the baskets when people buy chocolate bars.
They also buy peanut butter. So maybe there'd be a good product that we should proactively create, one that combines peanut butter and chocolate, and we could think about doing that based on the data we have. Obviously, in that example somebody's already done it, because there are Reese's Peanut Butter Cups, which have that concept.

I talked about that for a couple of reasons. What we do in data science you can often do with other techniques; it just might take a long time, lots of resources, and a lot of trial and error. In data science we try to simplify the process and make it much more promising in terms of finding the actionable, useful insights. That's how we really think about it: what actions can we suggest that would change the future for the better, as opposed to just looking back in time?

If we think about data science, one thing to highlight: Raj talked about all the good things that could be done, which, maybe not surprisingly, has been borne out in industry hiring. Data science is one of the top fields from a hiring perspective. This is Glassdoor, if you can't see the slide in detail: pretty much every year it's the number one or number two job. It's stated in a lot of different ways in terms of growth and everything else, but the key is that data science skills are needed in lots of different scenarios.

So now I want to go through some examples in the real world. Can we go to the next slide? That's much faster. I'm going to go through a couple of different examples and give you a little bit of a feel for what they are. The first one I want to talk about is outbreak analytics. Outbreak analytics is what the discussion will be about tomorrow morning.
It's where data scientists work to understand infectious diseases, how they spread, and the patterns of how they spread. Unfortunately, at this point something like this is maybe more common than people would have thought two years ago. So this is about COVID and different trends, but also about different predictions about the future. I don't want to talk too much about this, because tomorrow morning you're going to hear a lot about it.

We go to the next slide. There we go. The next one I want to talk about is climate analytics. What do I mean by climate analytics? Everybody knows about climate change, and everybody can understand and think about its ramifications. So how does data science fit with this field? One example we can see on this slide, and people have heard about it: all the major world leaders were recently together to talk about this. You hear a lot about the Paris Accord, 1.5 degrees versus 2.5 degrees. That's data science being used to understand the impact of climate change on the world. What we're trying to predict is, based on what we currently know about the environment and what we do to the environment, what will that do in terms of temperature change? But more importantly than just temperature change, we then predict what it will do to different aspects of the world, right?
So the waters rise, and so you build a whole predictive model about what will happen in the future. That's the key around climate change: being able to predict the future, not just looking at the past. If it were just data analysis, we would say that we've taken a look at the temperature over the last 50 years and we can see it's increased a little bit. But that's not actionable insight. What we want is a situation where we can have actionable insight: if it rises by another 1.5 degrees, here's the impact on the world; if it rises by 2.5 degrees, here's the impact on the world; and things like that. So that's what climate analytics is.

Go to the next slide. So, health analytics. There are lots of different scenarios of how this can be used, and Raj talked about a couple. For example, one of the things he talked about is stability, and being able to understand and predict who might be unstable. Actually, even on your iPhone there's software being created and used now to help identify people at risk of falling. Again, think about that: it's predicting the future to try to make the future better than it would be without that prediction. If we can predict that you're highly likely to fall, then there are things that we can do, as a society or as your doctor, to give you some stabilization. That's one example, but there are lots of others. This is what I just talked about here, right: predicting patients for special services. You can also take a look historically.
We've given lots of different treatments, and we can understand which treatments work better in different scenarios, or which combinations of treatments work better. That's a different example of looking at data, whether it be patient data and the analytics behind that or clinical data, and combining all of it to predict the future. Another one, which I'll discuss a little bit later, is predicting whether you have cancer or not. We'll get into that one in a couple of slides. So basically this whole field is, again, not just about analyzing the data; it's about making a prediction, to understand, based on all this data we have, how it might be influential and useful in the future.

Go to the next slide. So this is what Raj talked about: data for good. You can think about everything I've talked about up until now, and even a few slides after this, as all being about data for good. There's a whole field that talks about data for good, but I like to focus on the different actual uses of data rather than on this umbrella of data for good.

If we go to the next slide, you can see public administration: different roles within government. The slide here talks about anti-corruption, but there are lots of other places in government. Raj talked about trying to predict development zones. So, houses, for example: if they go down in value some more, it doesn't just impact that one house, it impacts an entire neighborhood. That's an example of government analytics and how it can be used: if you can make that prediction, you can try to proactively stop that from happening. You can think about that.
This whole concept could be linked to cybercrime, and to trying to predict where crimes might be occurring from that perspective.

Go to the next slide. So, sports analytics. If you follow sports, pretty much any sport that you play, or certainly any you watch at a professional level, has been impacted by analytics. This is a field that's growing month by month, year to year, both from a financial perspective and in its impact on the actual sport. That's true pretty pervasively, but there are still areas where it's very nascent and up-and-coming. For example, how do you price the ticket for a sporting event, and what's the best way to do that? That's not really data for public good; it's data for good for certain people. When we think about data for good, it's about how to make the world a better place, but not all projects will make everybody better off. When you're thinking about it more from a business context or a sports context, "how do I get better than somebody else?" is perhaps one way to think about it.

We go to the next slide. So, business analytics. This is definitely driving most of the jobs and, quite honestly, even most of the press around how data science is being used. There are lots of different examples. I gave an example earlier about chocolate bars, but you can also talk about just analyzing customer behavior: taking a look at different sales trends and what people are buying and what they're not buying. You can think about trying to predict fraud. I used to work at a bank, and banks issue or service credit cards. There's lots of prediction about whether a transaction is fraudulent or not. We can try to predict future customer services and future customer products. Raj mentioned inventory management. It's not the most interesting part of data science, but it's certainly used a lot. Go to the next slide.
You'll see something that's pretty common. Hello there. So this is something you've already seen; it's the same thing as Netflix, right? When you're watching Netflix, you're just lazily sitting there and you've finished a show. Maybe you haven't done that, but I've certainly been on my couch and not wanted to do anything. This is using data science to try to predict what you would like to watch next. That's really the goal and the focus of this. So this isn't to make the world a better place, but it could make your world a better place, because it's suggesting either something to read (you can see from the suggestions that I'm into data science) or something to watch. So again, it's highly structured in terms of what that is.

If we go to the next slide, I want to think about things from a different perspective: college alumni. You can think about this as: how do we take different groups of people and create different personas? I thought I would use alumni analytics as an example. At a very simple level, you can think of three levels of engagement. This is a characterization of a group of people, so we would cluster people into alumni who are just a little bit engaged, alumni who are fairly engaged, and alumni who are really strongly engaged. Obviously there are lots of different dimensions to that, and we could think about it in multiple different ways. But that's really a way of using analytics to think about how we should be treating people, whether it's alumni or customers at a hotel. It's the same idea. For a customer at a hotel, we want to think about it in a certain way: how do we treat people who come here frequently versus people who don't? How do we keep business travelers versus people who are on vacation?
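
To make the idea of clustering alumni into three engagement levels a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the engagement scores are invented, a single score per person stands in for the "lots of different dimensions" mentioned in the talk, and the plain k-means loop is one common clustering technique, not necessarily the one behind the slide.

```python
# Hypothetical engagement scores, one per alumnus (for example, events
# attended in a year). The numbers are invented for illustration only.
scores = [0, 1, 1, 2, 5, 6, 7, 6, 14, 15, 16, 18]


def kmeans_1d(values, k=3, iterations=20):
    """Group 1-D values into k clusters with a plain k-means loop."""
    ordered = sorted(values)
    # Assumption: start the k centers at evenly spaced points of the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            # Assign each value to the closest center.
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (unchanged if the group is empty).
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers, groups


centers, groups = kmeans_1d(scores)
labels = ["a little engaged", "fairly engaged", "strongly engaged"]
for label, center, group in zip(labels, centers, groups):
    print(f"{label}: center {center:.2f}, members {group}")
```

The same loop would work for the hotel example by swapping the score for something like stays per year; with several dimensions, the distance calculation would need to change accordingly.
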
Next slide. So I could keep going, but I thought, how do I end all this? I didn't really know which topic to end on, and I thought, everybody loves dogs. Well, not everybody. I love dogs. So you can even have doggie analytics. There really is a discussion about how to think about analytics when you train dogs, right? What are they learning? What are they not learning? How do you reinforce different behaviors, and make predictions about which training methods would be better for different dogs? So pretty much anywhere you can imagine collecting some information and then making a prediction about the future is where data science can be used. And the key is that we're going to make a prediction, and it's going to be a prediction that is helpful and useful, not just interesting. That, in a nutshell, is what data science is.

If we take a look at the next slide, we're going to start to drill into deep learning. Everything I talked about up to this point was different industries or uses. Deep learning is different because it can be used across many different domains or areas, and it's a pretty pervasive way to think about what data science is. Deep learning can be used in self-driving cars. Most people have now heard about lots of different versions of self-driving cars that aren't quite totally self-driving yet, even though Tesla has something that it pretty much calls self-driving. But it's definitely getting closer. You can think about what's happening here as making predictions: this is a person, this is a car, and the self-driving car is trying to make a prediction about where best to go. That's the best way to think about deep learning. But it's not just for driving cars. Deep learning can also be used in a number of other ways. For example, "Hey Google" or Alexa uses deep learning to understand what you're saying. And cancer detection.
So I'll give that one as another example, which is that it can learn. By "it" I mean the data science machine learning algorithm: it can look at lots of different examples of blotches on skin and learn which ones are cancerous and which ones are not. Today you can use an algorithm that looks at your skin and makes predictions that are as good as or better than a doctor's.

And the next slide. This is a little slide that talks about how effective deep learning is for understanding speech. You can see that from roughly 2001 to roughly 2013 there really wasn't a huge change. People were much better than computers. Then deep learning was introduced, and over a two- or three-year period it very quickly became as good as humans. And now machine learning algorithms are actually slightly better than humans at understanding speech. So Alexa or "Hey Google" is really good, and it's better than if you just listened yourself.

Next slide. When we talk about "Hey Google" or Alexa, that's natural language processing and text mining, and I wanted to give you a simple example of how it can sometimes be challenging, so you have a little feel for what data science challenges are like. Here's a little sentence: "I saw the man on a hill with a telescope." Pretty simple sentence, not that complicated. Can we go to the next slide?
So here's one interpretation of it: I saw the man, the man was on a hill, and I was using a telescope to see the man. That's the first picture that's outlined. If you go to the next one, it's the exact same words with just different punctuation: I saw the man, the man was on the hill, and the hill had a telescope, which is the other picture that was highlighted. So the same words mean two different things based on the context and punctuation. Those are some of the challenges as people work on different aspects of data science, not from a pure predictive-numbers perspective but from a text mining and natural language processing perspective, which is what a lot of people learn. That gives you a feel for some of the challenges that need to be overcome.

We go to the next slide. Let's go on. So what are the key components of doing data science? There's a pretty common Venn diagram that tries to explain it. There are probably 20 other Venn diagrams that also try to explain it, so there's not one consistent definition of what data science is and what skills you need. But since this is probably the most common one, I thought I would start from here. Part of data science is that you need some computer science skills, such as programming in R or Python or something like that, and understanding how algorithms work. You need some math and statistics. When you're doing data science, you have to understand probability, because your model is never going to be 100% correct. So you need to understand what it means for the algorithm to be likely to be correct. What does that mean?
And then you need some domain knowledge. If you just have the computer science knowledge and the math and stats knowledge, and you don't know how to apply it, you probably won't be creating actionable insight. You'll just be creating insight that is either obvious or not useful. So you need this domain knowledge, and at the core, in the middle, is what data science is. At the iSchool, as Raj talked about earlier, we think about it as human-centered data science. We're not really focused on the extreme of building the next best machine learning algorithm; that would be more of a computer science department. We're not focused on deep math and statistical skills. We're definitely focused on how to apply data science in many different human-centered contexts: all the ones that Raj was talking about and the ones I talked about.

Next slide. When we think about a data science project, we can think about a life cycle for how we work on it. I thought this would be a good way to help you have, again, a little bit of an understanding of what data science is. We almost always start with data understanding and business understanding. We need to understand the business context of all this data we've collected. What is the problem we're trying to solve? What are we working on? What kind of prediction would be useful? What kind of prediction wouldn't be useful? Then there's data understanding. What data is available? How valid is the data? How clean is it? How much data do we have? What types of attributes do we have? How can we think about collecting data from different sources and combining them together? And combining them together is the data preparation phase. Data preparation is what I like to call data munging. In the end it's a lot of programming to get data into a common format. And then we do modeling.
So this is actually building a predictive model. Everything that Raj and I have talked about has really been focused more on the modeling part of data science, but there's definitely more. Once you build a model, you have to understand how good it is. That's what evaluation is. And once you've evaluated it, you still haven't gotten anything useful. You have to deploy it. Deploying your results might be very different based on the context. If you're Netflix, deploying a model means letting people get a prediction about what to watch next. If you're a marketing organization, it might be how to run a certain campaign, so it's only going to be used once. And if you're a researcher, deploying it might be a publication, a paper, for example. So these are the different ways you can think about it.

All these steps are done in a data science project, and it's done iteratively. It's not like we understand the business, we understand the data, then we prep the data, we build a model, we evaluate and deploy, and we're done. Data science almost never works that way. Think about it like this: we understand a little bit of the business, we understand a little bit of the data, we have a model that's not so bad, and maybe we deploy it, maybe we don't. But what do we learn from that? And then we do it all again. And again and again. Eventually you start to learn collectively, and the model starts to learn about what's useful. Maybe halfway through you go, "Oh, there's this other data that we never thought would be useful, but if we pull that in, maybe it can improve the model." That's the spirit of how data science really works.

Next slide. Oh, yeah, so this talks about it from a skills perspective. You can keep hitting next; this is going to keep going. We start by learning the application domain, and then we communicate with users. So these are
all the different skills you can think about through that life cycle. Then we start to see the big picture, which is understanding how and which predictions might be interesting. We have to think about how we're representing the data, and we have to transform the data. So there are different skills. Quality control is something really interesting. How do you know your model is correct? How do you know your prediction isn't just the result of some error somebody made when they were writing an algorithm? How do you know that your data doesn't have some bad structure to it, some bad data? Maybe it's just human error; maybe someone typed something in wrong. How do you know it's correct? You don't really know.

In the end, you have to present it in an intuitive way for whoever is going to use that algorithm or that prediction. You need to present your prediction in a way that's intuitive for somebody who doesn't know data science. That's what visualization and presentation are about, and in fact in 20 minutes Professor Hemsley, who's in the back of the room, will be talking a lot more about visualization and presentation and things like that. So we'll skip that.

If we go to the next slide: I wanted to talk for maybe two minutes about ethics and what it means. We talked about data science and human-centered data science, and I wanted to give you just a little feel for what that involves. Tomorrow, right before lunch or right after lunch, I forget which, I will talk some more about ethics. So if we go to the next slide, we'll take a look.
So first of all, there are lots of examples of models gone wrong. I won't go through these. All I'll say is that these companies did not set out to get headlines in big newspapers that were not very complimentary. But things can go wrong, and if you're not careful about how you're using data science, it's easy to fall into this trap. These are obviously intelligent companies with lots of very smart data scientists.

If we go to the next page: these risks are pretty well understood by senior leaders in industry. I've highlighted here, with the others grayed out in the background, all the issues that are related to ethics and situational challenges in using machine learning and data science. You can see that, out of all the different concerns of industry leaders, about half are related to ethics. This is from Deloitte, in case you're curious.

Next slide. As we're thinking about ethics, I wanted to end this by giving you a little bit of a view of why it's hard. Fairness is a great concept. Every data scientist that I know has integrity. That doesn't mean everyone does, but all the ones I know do. They want to do what's right. They want to do what's fair. They want to build a model that's fair. Sounds simple, right? Let's build a fair model.

If we go to the next slide, we'll see that maybe it's not so simple. Let's take the example of giving out a loan. I used to work at a bank, so I know all about giving out loans. If you want to give out loans, the question is: what's fair? What's the right way to decide who gets a loan?
One scenario that's helpful here is to ensure that we make loans available at the same rate for two different subgroups: people who have blue hats and people who have white hats. For whatever reason you have a blue hat or a white hat, and we're going to be fair because the same percentage of blue hats and white hats get loans. That's one definition of fair, and a lot of people would argue that's the right way to do it.

Another definition is that we're just going to focus on each person individually and decide who gets a loan based on their ability to pay it back. And we're going to take into account what color hat they have, because that's part of predicting whether they can pay back the loan. So fair means: if you can get the loan and repay it, we're going to give you the loan; if you can't, we're not. That's the second definition of fair.

Another definition of fair is that we're going to look at who can repay the loan, but we're going to ignore what color your hat is, because it's not appropriate to look at the color of your hat. And obviously we can swap the color of your hat for lots of other characteristics.

So those were three different definitions of fair. And whichever one you say is right, I can come up with a counterexample to show that a different one is correct. So when we think about trying to do the right thing in data science, or trying to do what's fair and unbiased, just remember that it's not just that a data scientist needs to do the right thing. Sometimes it's not clear what the right thing is. And I've had this entire conversation without ever really talking about what the data science should do; it's more conceptual: what should we be trying to do? There are other definitions, too. I just gave the top three that are intuitive to people, but I could define two or three other definitions of fair as well. So you need to think about what it means to be fair. What are the implications?
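
As a small illustration of why these definitions of fair can conflict, here is a hedged sketch in Python. Everything in it is an assumption made up for the example: the applicants, their estimated repayment probabilities, the 0.6 threshold, and the "top half of each group" rule. Rule A ignores hat color and applies one threshold to everyone; rule B approves the same share of each hat color, an idea often called demographic parity.

```python
# Hypothetical applicants: (hat color, estimated probability of repaying).
# All values are invented for illustration.
applicants = [
    ("blue", 0.9), ("blue", 0.8), ("blue", 0.7), ("blue", 0.4),
    ("white", 0.8), ("white", 0.5), ("white", 0.4), ("white", 0.3),
]


def approval_rates(decisions):
    """Share of approved applicants for each hat color."""
    rates = {}
    for color in ("blue", "white"):
        group = [ok for (hat, _), ok in zip(applicants, decisions) if hat == color]
        rates[color] = sum(group) / len(group)
    return rates


# Rule A: one repayment threshold for everyone; hat color is ignored.
threshold = 0.6  # assumed cut-off
rule_a = [p >= threshold for _, p in applicants]


def top_half_per_group():
    """Rule B: approve the same share (the top half) within each hat color."""
    decisions = [False] * len(applicants)
    for color in ("blue", "white"):
        idx = [i for i, (hat, _) in enumerate(applicants) if hat == color]
        idx.sort(key=lambda i: applicants[i][1], reverse=True)
        for i in idx[: len(idx) // 2]:
            decisions[i] = True
    return decisions


rule_b = top_half_per_group()

print("Rule A approval rates:", approval_rates(rule_a))
print("Rule B approval rates:", approval_rates(rule_b))
```

With these invented numbers, rule A approves the two hat colors at different rates, while rule B equalizes the rates but turns down a blue-hat applicant with a 0.7 estimate and approves a white-hat applicant with a 0.5 estimate. Each rule is fair under one definition and has a counterexample under another.
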
Is it okay to use attributes? Is it not okay to use attributes? What's the implication of using the attribute? What's the implication of not using it? Sometimes it's legal, sometimes it's not, so there are lots of different ways to think about that. You can think about groups. What level of groups makes sense? Is it good to create these groups by hat, or should we ignore that there are groups by hat, the second scenario? So what might seem to be simple is not so simple. And I think with that I'll turn it over, if there are any questions.

She asks: what does health analytics really entail? And then there's a follow-up: what knowledge domain would you focus on as a potential student?

So I'm going to tackle the health analytics one, and then you can ask that second one. Yeah, so, health analytics. First of all, health analytics is a huge field. We can think of health analytics as everything from the outbreak analytics that I talked about at the beginning, trying to understand infectious disease and how it spreads. You can think about health analytics being used in different deep learning applications, like the one I talked about for detecting skin cancer. You can think of health analytics in terms of looking at patient care: seeing outcomes for that care, having lots of patient records, and then trying to understand what type of care works best for what type of patient based on their condition. So those are three or four different examples that I just rattled off, and I would put them all under the umbrella of health analytics.

The second part was what background would be helpful, or something like that. What would you focus on as a potential student?
I think it's important to have enough knowledge to know how it can be useful, but I don't think you have to be a medical doctor to be somebody who can contribute to health analytics. The best way to think about it is that you have to have passion and interest in that domain, but you can learn by talking to the domain experts as well.

Another question, about a couple of specific languages. They're asking: when it comes to data visualization, is Tableau the preferred choice over Power BI?

Let me think about the shortest way to answer that. I would say that when we think about visualization and visual analytics, we should think more conceptually about the techniques that are useful. And then, yes, there are tools that are better or worse. I definitely think Tableau, for example, is a perfectly good solution, but maybe in two years there's a different, better solution. So I wouldn't equate a technique like visual analytics with tools. I would say there are different tools that can be used, and over time those tools might change.

Can we consider Gmail text recommendations as text analytics?

Yes. Gmail and lots of other tools will suggest things. Microsoft does the same thing: they'll correct spelling, but they'll also make suggestions about finishing the word for you, or even finishing a full sentence. That is definitely predictive analytics. It's making a small prediction that you can agree with or not. If you think about it, if it's always wrong, it's going to be really irritating; if it's always right, it's really helpful. So that is definitely text analytics using deep learning. That's a good example.

Could you share more about your life experience? You talked about it a little bit.

Yeah, I didn't really talk about myself and my background, so I'm more than happy to do that for maybe a minute or two without boring everybody else.
I came to Syracuse about seven years ago. Before that I worked mainly for large banks, doing predictive analytics. In one of the big projects I had, I was in charge of credit card authorizations. Anytime you use a credit card, you wait just a second and then it says "approved." I was in charge of saying yes or no based on different rules, and so you build up predictions. You would think it's very easy, and there are really two aspects to it. Maybe I'll talk about that one, because everybody can understand it.

First, there's: can you afford to make the charge? That's actually pretty easy. You have a credit limit, and we have rules and ways to think about whether you can pay even more than your credit limit, and things like that. That's easy. The much harder part is fraud, because there are what we call bad actors, which is, you know, mobs in various parts of the world that are proactively trying to outsmart you. They pretend that a charge is good, then they make all these charges, and then the company's out all this money. So you're basically trying to figure out whether you are who you say you are, and that can be a challenge. You try various techniques to build up patterns of what's acceptable and what's not acceptable. A very simple pattern that's bad: you buy something in Syracuse, New York, at an in-person location, and then 10 minutes later you buy something in, I don't know, Miami, Florida. Something's up, right? So that's an easy one.
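
The Syracuse-then-Miami pattern can be sketched as a simple rule. This is a minimal illustration in Python, not the bank's actual system: the coordinates are approximate city centers, the date and the 1000 km/h speed limit are assumptions, and the function names are hypothetical. It uses the standard haversine formula for the distance between two points on the globe.

```python
import math
from datetime import datetime

MAX_SPEED_KMH = 1000.0  # assumption: faster than this between purchases is implausible


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    earth_radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def looks_impossible(prev, curr):
    """Flag two in-person purchases that would require implausibly fast travel."""
    hours = (curr["time"] - prev["time"]).total_seconds() / 3600
    km = distance_km(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    return km > MAX_SPEED_KMH * hours


# Hypothetical purchases ten minutes apart (approximate city coordinates).
syracuse = {"time": datetime(2022, 1, 1, 12, 0), "lat": 43.05, "lon": -76.15}
miami = {"time": datetime(2022, 1, 1, 12, 10), "lat": 25.76, "lon": -80.19}
same_city = {"time": datetime(2022, 1, 1, 12, 10), "lat": 43.05, "lon": -76.15}

print(looks_impossible(syracuse, miami))      # the two cities are far too distant for 10 minutes
print(looks_impossible(syracuse, same_city))  # a second purchase in the same place is fine
```

A hand-written rule like this only catches the easy case; the harder patterns described next are where learned models come in.
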
That's very simple. But you can also see things that just don't make sense given a typical buying pattern: you typically buy this type of thing for this amount, and now we see something very different. So you look for patterns and you make a prediction about whether it's you or not. If you use credit cards, then every once in a while, and this happens to me, I do something out of the ordinary and I get a text: "Did you really do this transaction?" Yes, I really spent all this money, I'm sorry. I say yes, and it's fine. But if I didn't say yes, they would have shut down my credit card until I talked to them. So that's an example of one of the things I did in my previous life.

A question from Wendy, asking: does predictive analytics involve machine learning?

Yeah, that's a great question. There are lots of different phrases that Raj and I use, and there's not a clear, consistent definition. Machine learning, data science, and predictive analytics are three, but there are others that get used, like artificial intelligence. They all have subtly different definitions, but those subtly different definitions are not consistent. If you ask four data scientists, they might actually give you four different answers. So to keep it simple, I would say machine learning is the act of trying to make a prediction, trying to do predictive analytics. There are other ways you can make predictions besides machine learning, but it's one of the key ones used today. So while these techniques are subtly different, at a high level you can think of them all as trying to make predictions using data.

On whether it's automatic: I think we should understand that nothing's automatic. Machine learning works because the machine learning algorithm is looking at lots of data that we've given it, and then based on that it's going to make a prediction.
It's going to learn about, you know, here are blue marbles and red marbles, and how can we predict what's what; or this blotch looks like cancer and this blotch does not look like cancer. And over time we correct it, and the machine learning algorithm learns. So it's not really automatic. It learns through looking at data, typically data where we tell it correct or wrong, or through looking at a whole bunch of different cats, and then eventually the machine learning algorithm can look at an image and say, that's a cat, because I've seen lots of images that are cats and lots that aren't. So it's not really automatic, but it can seem automatic to the people who are using it.
So we have just about five minutes left of this presentation. We actually have a great question from an iSchool alum. They're working for a credit union, currently building a data warehouse, and they ask: what's the scope of integrating predictive analytics with a data warehouse in the future?
So I'm going to simplify data warehousing: it's a technique that can be used to store lots of data, which is really important for predictive analytics and building machine learning algorithms, because we need lots of data for the machine learning algorithm to work well. So in many situations, having a data warehouse or some other repository for lots of data is the first step in building a data science or machine learning or predictive analytics model, because without all the historical data you can't build a model that's accurate. So it definitely makes sense to connect the two logically: very often you first have the data, and then you can build a predictive model.
I guess we should ask if there are any questions in here, too.
We just came to listen.
Sure. So we're asking how data analytics applies to situations where back testing is not applicable, for example in climate-related situations. So that's a good question.
So let me first explain what back testing is; I'll simplify. It's one way to see how good your machine learning algorithm is. If you have lots of data, you can use, I don't know, maybe two-thirds of the data to build up your model. So you train your model, you build your model, based on two-thirds of the data, but you hold off one-third. And then you want to see how good the model is, so you pretend you just found all this data now. You didn't just find it; you held out a third, which data scientists call test data. So you test your model with data that you've previously collected but that the model doesn't know about, and you see how good it is. So there's a lot of what's known as testing and learning from that. You train and then test the model, all on historical data. That's a very traditional way to understand what's going on.
But in lots of fields, and climate change is a good example, we don't have that much data to do back testing. You can still try: you can still have a model, hold out some data, and then see how it works. But there are other techniques that you can use to try to understand how the model is behaving. And sometimes, like in the Netflix example where it's being used in real time, you can see how it does in the future. So in climate change, oftentimes there's some prediction about the future, and then a few years down the road you can see whether that's holding true or not.
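The hold-out idea described here (build the model on two-thirds of the data, then test it on the third it has never seen) can be sketched in a few lines. This is a minimal illustration under made-up assumptions: the data is synthetic, the "model" is just a single learned threshold, and all function names are hypothetical.

```python
import random


def make_toy_data(n, seed=0):
    """Synthetic rows (x, label); the label is 1 when x is above 50."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 100)
        rows.append((x, 1 if x > 50 else 0))
    return rows


def split_train_test(rows, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of the rows, then hold out the last part as test data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(rows, threshold):
    """Share of rows where 'predict 1 if x > threshold' matches the label."""
    correct = sum(1 for x, label in rows
                  if (1 if x > threshold else 0) == label)
    return correct / len(rows)


def fit_threshold(train):
    """'Train' by picking the training value that classifies training data best."""
    return max((x for x, _ in train), key=lambda t: accuracy(train, t))


data = make_toy_data(300)
train, test = split_train_test(data)   # two-thirds train, one-third test
threshold = fit_threshold(train)       # the model only ever sees `train`
print("learned threshold:", threshold)
print("accuracy on held-out test data:", accuracy(test, threshold))
```

The accuracy on `test` is the number that matters, because the model never saw those rows while it was being built.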