All right, thanks a lot, Jeff. Great, so thanks very much for being here. I'm going to dive right in. And you know what? I even have cards here. So I'm going to start off by talking about intelligence tests. This is Alfred Binet, who is often credited with coming up with what is still the most common way of measuring IQ, and he did it in the early 1900s. The idea was to create a battery of tests that he thought would help identify children in the school system who needed special attention in some way. He was contracted in France by the Ministry of Education to do this, and he was trying to answer a very specific question: which kids needed more help. So he created this idea of tests. And what's important to note here, I think, is that he went out of his way to avoid hereditarianism, the idea that intelligence, or whatever he was measuring, was something innate. Because his actual purpose was, once these children were identified, to help them in some way and address their needs so that they could catch up with the rest of the students. That's sort of a misconception anyway, by the way: just because something is innate doesn't mean it's not treatable. I wear contacts, for example; my eyesight is something I was born with, but it is treatable, and I'm able to function pretty normally with these contacts. The intelligence tests took on a life of their own. During World War I, in the late 1910s, they became incredibly important as a way of deciding where to allocate human resources during the war. Then, thanks to two people of that era, Goddard and Terman, there was a hard push for an agenda in which tests of general intelligence would be used throughout the United States, and perhaps the world, as a way of understanding human hierarchy and making society and the workplace more effective by figuring out where people belonged. I think there's a lot to learn from this, so I'm going to come back to it. The basis for what Binet came up with, and what others ran away with (in particular Spearman and Cyril Burt, who we'll talk about a little bit today), is factor analysis. Factor analysis is a technique that I'm sure many of you are familiar with in one form or another. The idea is that we have all of these data and we want to describe the variation in them, essentially by rotating the axes in some way to determine principal components: a new orthogonal basis that captures the variability in the system. A good way to think about this is bones. I could measure different parts of the body, different parts of the skeleton, throughout my life, and I'd notice that my wrist bone and my finger bones grow over time. If I abstracted that out, I might find a general trend of growth in the human body. This is probably a pretty safe setting in which to say that I've found a principal component representing a general idea of growth, one that shows up throughout my body in different ways and to different extremes: my eyes grow very little, whereas my femur might grow a lot over my life. Cyril Burt applied this kind of analysis to batteries of tests, at one point 54 of them, and he kept adding more. In fact, at one point he said, essentially, I don't even care what the tests measure; let's just do a lot of them and it'll work. Because you can run all of these tests on all of these subjects and, using factor analysis, which was invented for this purpose, find a general direction. That's Spearman's g (g for general intelligence, by the way), a measure of what we then imagine as an abstract, general common thread running through all of these tests. And then there's this other piece, what's left over after the projection, if you will, the dot product or the cosine, whatever you want to call it, which is treated as a specific intelligence for that particular test, the part not captured by the general variability. These ideas are constructs. What we're constructing here is a little like the idea of motivation in psychology, or the idea of a center of gravity in physics. These are useful hypothetical variables that allow us to understand something, but they have no observable quantity. You cannot observe a center of gravity; you cannot observe motivation. They are constructs. The danger comes when we start treating those constructs as if they were real entities in some sense. By the way, it's interesting, from my perspective, when I find a technique, its history tells me a little about where it comes from, and here I was really surprised. I use principal component analysis, and when I discovered this, I felt a little bad about using it. I was like, holy shit, this thing was created in order to do something that I don't agree with. And that actually happens a lot.
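To make that picture a little more concrete, here is a minimal sketch of the kind of decomposition being described, on purely synthetic data rather than anyone's actual test battery. The scores are made up so that one hidden factor drives everything; the projection onto the first principal component plays the role of the "g" score, and the leftover is the "specific" part. All the numbers and names here are assumptions for illustration.

```python
# A minimal sketch of the "general factor" idea using PCA on synthetic data.
# This is an illustration, not Spearman's or Burt's actual procedure; the
# scores below are invented so that a single latent factor drives everything.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_people, n_tests = 500, 6

g = rng.normal(size=n_people)                    # hidden "general" ability
loadings = rng.uniform(0.5, 1.0, size=n_tests)   # how much each test reflects it
specific = rng.normal(scale=0.5, size=(n_people, n_tests))
scores = np.outer(g, loadings) + specific        # observed test scores

pca = PCA(n_components=1).fit(scores)
print("variance explained by first component:", pca.explained_variance_ratio_[0])

# Projection onto the first component plays the role of the "g" score;
# what is left after reconstruction is the test-specific part.
g_hat = pca.transform(scores)                    # shape (n_people, 1)
reconstruction = pca.inverse_transform(g_hat)
specific_part = scores - reconstruction
```

The point of the sketch is only that the math happily produces a dominant component whenever the variables are correlated; whether that component deserves to be treated as a real thing is exactly the reification question that comes up later in this talk.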
So what is it that I do? I'm a consultant. I engage, sometimes for weeks or months at a time, with a client. We do strategic work, we do technical work, we do advising work; I say we because I sometimes work in teams. The Data Guild is a group I work with fairly frequently. They're based out of Palo Alto, and they're a data product studio, so I'll say a bit more about what I think a data product is. I also work a lot with DataKind SF, as Deb mentioned. DataKind is a group in seven locations now, including Singapore, Bangalore, Dublin, Washington DC, New York, and San Francisco, and we try to connect volunteers who have data skills with nonprofit organizations that have some cause, and to help them in some way by applying those skills. Outside of that sector, when I'm working with the Data Guild or as a freelancer, I work in energy, healthcare, and a few other industries that have been a focus of mine in the last few years. Factor analysis is important to me because I work a lot on outlier detection, or anomaly detection, problems. Briefly, for anyone not already familiar with this: the idea is that we have some normal data, but we also have data that doesn't quite fit into that normal category. Some of that is noise; some of that is anomalies. What we define as an anomaly can in many cases be very subjective, but the general idea is that if we think something is an anomaly or an outlier, we might say so because it seems to have been generated by some totally different mechanism, and we want to understand what that mechanism is. Here are some typical examples of outliers. In the first case, we have a classification problem; the eggs are perhaps also a classification problem. We have unusual data points in time series and in various other settings. I'll point out that outliers are often something that simply gets ignored.
It's actually really interesting to flip that on its head and make outliers the subject of study. Outliers are an important facet no matter what you do. This is an example of two basic plotting libraries in R, the statistical language. These are both defaults for plotting the exact same dataset, but they look quite different, and that points out that as people who design tools, you are making very subtle decisions about what is important when you consider, or do not consider, outliers. This is from Roger Peng's course on Coursera, by the way; if you haven't taken it, I would check it out. It's a great way to understand some of these concepts. Anomaly detection is applied in all sorts of different arenas. One that has been important to me is the energy space, where we're interested in smart meter data, and we can use outlier detection in this context for a variety of reasons. The one that gets the most attention and money, of course, is fraud: we're interested in when people are attempting to steal energy. This is a tapped electrical wire, I believe in Italy, where this is a fairly common practice, a way for people to try to get energy for free, and we want to be able to detect situations where that happens. But anomaly detection in this environment can be used for a variety of other purposes as well. For example, you can detect broken meters in the network, and there are all sorts of other applications still being explored of what you can do with smart meter data in this context. So part of the problem is really understanding what an outlier is, what it means, and what we can do with it. I'll use that example to point out some real-world constraints that I commonly deal with, particularly in an anomaly detection context. You often hear about supervised learning, but when you're dealing with anomaly detection, you usually do not have a fully supervised problem. A fully supervised problem is one where you have two or more categories, you have lots of labeled examples of each, and you try to find a boundary through your feature space that separates those sets. A semi-supervised problem may be a case where you're missing labels. There's the idea of a positive-unlabeled setting, where you have no labels for the thing you're trying to find: those cases sit inside the normal data set, and you're trying to figure out how to isolate them from the normal data without their ever being labeled. You can also have situations where you only have a positive class, or only a negative class, and not the other, plus various combinations of these. You also have various combinations of variables available. Even though we were looking at smart meter data, things were complicated by the fact that the real system we were building also took into account billing information, demographic information about the household or building being studied, and customer service data: how often was that site visited, for what reason, when did they start their billing service, when did they end it? These are all very different types of data. Some are categorical, some are numerical, some are state systems where you're looking at the progression of a cycle, such as a customer service repair cycle. How do you combine all of that information in one space? It's a common issue to deal with.
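To make that "one space" question concrete, here is one common, minimal starting point: scale the numeric fields and one-hot encode the categorical ones so everything lands in a single numeric matrix. The column names and values below are invented for illustration; a real smart-meter system would have much richer state and cycle information than this handles.

```python
# A minimal sketch of combining mixed data types before any anomaly scoring:
# scale numeric fields, one-hot encode categorical ones. Column names are
# hypothetical stand-ins, not fields from any real utility system.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

df = pd.DataFrame({
    "monthly_kwh":    [310.0, 12.5, 295.0, 305.5],
    "service_visits": [0, 4, 1, 0],
    "billing_status": ["active", "closed", "active", "active"],
    "building_type":  ["residential", "residential", "commercial", "residential"],
})

pre = ColumnTransformer([
    ("numeric", StandardScaler(), ["monthly_kwh", "service_visits"]),
    ("categorical", OneHotEncoder(handle_unknown="ignore"),
     ["billing_status", "building_type"]),
])

X = pre.fit_transform(df)   # a single numeric matrix (possibly sparse)
print(X.shape)
```

Representing state systems and repair cycles usually takes more thought than this, for instance summary features computed over the cycle, which is part of why representation is where so much of the design effort goes.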
Interpretability is really important. In almost every project I've worked on, we're trying to deliver something that is not just a number: you have to be able to say something about the number. Why does it exist? System feedback loops also become important in anomaly detection problems. You're identifying these anomalies for some purpose, and the idea is that something then happens with them. We want to understand whether that identification was a success or a failure, whether it led to more information, whether we can take something back into our machine learning system that improves it over time. So we're really trying to get all of the information we can, even when we're starting with a completely unlabeled case. The way we deploy models is something I'll talk about later, but it's very important to have a variety of different models that you can plug into and out of a given application or data product. Anomaly detection problems, I think it's worth pointing out, follow a lot of different approaches. Extreme value analysis is the one we're perhaps most familiar with: assume there's some distribution, like a normal distribution, for the data, and identify extreme values as those that are many standard deviations away from the mean. This is actually much more useful than it sounds. That may sound dismissive, or maybe I just made it sound dismissive, but it really does come into use, and I'll point that out. Probabilistic models are the idea that you build, for example, a mixture model, find its parameters, and then find the points that don't fit it, even when they're not extreme in the general sense. So in this example, you have a set of six numbers, and the outlier is 50. But 50 is not the extremal value; it's actually right at the mean. In this case the mean has turned out to be the outlier, and probabilistic models can identify that. Linear models are also a very familiar case: make a line of best fit and see what deviates most from that line in some way. Proximity models are essentially clustering: you can do nearest-neighbor approaches, clustering, or density-based approaches. Information-theoretic models are coming into much wider use, and I'll try to give an example of this later. The idea is that information, such as this collection of A's and B's, can be compressed in an information-theoretic way, and you can identify outliers by asking: if I remove this data point, does that dramatically increase the compressibility of the data? If it does, it's probably an outlier. So in the bottom set of letters there's a C. If I remove the C, suddenly my data compresses much more efficiently than it did before, which suggests the C is an outlier. If I remove a B or an A, the compressibility barely changes. Sequence-based analysis is based on things like Markov chains, and forecasting is the time-series version of the same idea: we use historical data to predict what will happen in the future, and if something lands dramatically outside the forecasted value, it might be an outlier. Most real systems will combine these approaches, but the general idea is the same. We find a way to describe normal data, then we compute outlier scores based on deviation from that normal pattern.
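Here is a toy, deterministic stand-in for that compressibility idea. Instead of a real compressor, it uses simple code lengths: under a frequency model, a symbol costs roughly minus log base two of its probability, in bits, so the lone C is far more expensive to describe than any individual A or B. This is only a sketch of the intuition, not how a production information-theoretic detector would be built.

```python
# Toy version of the information-theoretic idea. Under a simple frequency
# model, describing a symbol costs about -log2(p(symbol)) bits, so rare
# symbols are expensive: the lone C below costs far more bits than any single
# A or B, which is one way to make "removing it improves compressibility"
# concrete. A real system would model "normal" with something much richer
# than symbol counts.
import math
from collections import Counter

data = "ABABABABABABABABCABABABABAB"
counts = Counter(data)
n = len(data)

def bits(symbol: str) -> float:
    """Approximate code length, in bits, of one occurrence of `symbol`."""
    return -math.log2(counts[symbol] / n)

for symbol in sorted(counts):
    print(f"{symbol}: {bits(symbol):.2f} bits per occurrence ({counts[symbol]} seen)")

print("most surprising symbol:", max(counts, key=bits))
```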
Here's an example where that extreme value analysis comes in handy. We have a signal, shown at the top there, which changes mean at some point; this is obviously a toy example. The way this is actually detected is by maintaining global statistics: the mean and standard deviation are recalculated at every single new point that comes in, so it's a streaming problem. We also have a windowed set of data, maybe just the last ten points, and we compare the mean and standard deviation in the window with the global values. If there's a significant difference, that says something different is happening in that window than was happening in the global population of data, and if that's the case, there may be something going on. This is a change detection problem, which is a subset of anomaly detection. So we have this idea of extreme value analysis, or of computing summary statistics in two different scopes and comparing them: a window versus the global data, or a small window versus a big window. There's a variety of statistical tests we can use here; it's not just mean and standard deviation. There are Z-tests and t-tests and all sorts of things. Stopping rules are really important. If you notice the red line where the change was detected, or triggered, it's actually a little bit after the change occurred. That's an artifact of the fact that we need to choose how big the window is, and we need to choose how much of a change, in this case in the Z-score, the residual over there versus the global variable down here, constitutes an actual anomaly. Those are subjective decisions that the designer of the system has to make based on what they know about the system.
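A minimal sketch of that window-versus-global comparison, assuming a simple streaming setup: keep the history, keep a short recent window, and flag a change when the window mean drifts more than a chosen number of historical standard deviations away. The window size and threshold below are exactly the subjective stopping-rule choices just mentioned.

```python
# Minimal window-versus-global change detection on a synthetic stream.
# The window size and threshold are arbitrary design choices.
import numpy as np

def detect_change(stream, window_size=10, threshold=3.0):
    seen = []
    for t, x in enumerate(stream):
        seen.append(x)
        if len(seen) < 3 * window_size:
            continue                                  # wait for enough history
        history = np.asarray(seen[:-window_size])
        window = np.asarray(seen[-window_size:])
        z = abs(window.mean() - history.mean()) / (history.std() + 1e-9)
        if z > threshold:
            return t                                  # index where change is flagged
    return None

rng = np.random.default_rng(1)
signal = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(4.0, 1.0, 200)])
print("change flagged at index:", detect_change(signal))
# The flag lands a little after index 200, the same detection lag that the
# red line in the plot illustrates.
```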
K-means is an example I love to use; I think it's a technique many people are familiar with. Can I get a general raise of hands, K-means sounds familiar? Okay. A really great way to think about it is that I have a collection of data points in a feature space, and I'm going to try to find clusters by making little paper cutouts. I cut little pieces of paper; I can choose how big to cut them, small or big, but they're generally circular, and I want to cover points in the data set with these circular cutouts. This is effectively what K-means does. It actually uses something called Lloyd's algorithm behind the scenes, but this metaphor helps me remember the ways in which K-means is not ideal. It assumes the data is isotropic: the clusters are generally circular, so I'm not making ovals, I'm making circles. I need to choose ahead of time how many circles I'm going to cover the data with, since I'm making those paper cutouts in advance. It assumes the clusters are convex; let me talk about that next. It has some other properties that are not ideal, but it is very simple to use, it's very powerful, and it scales well, so we often try to use K-means when possible, even when we have to combine it with other mechanisms. Here's a non-convex data set. The idea is that the data forms a shape where a straight line drawn between two points in a cluster can leave the cluster. A banana shape is the canonical example of this, and it's the example used in scikit-learn's documentation, which is why I pulled it out; it's something you can look up pretty easily. If you try to cover these two bananas, we instinctively know as humans that one banana is one cluster and the other banana is the other cluster, but if we try to cover them with little circular paper cutouts, K-means does pretty poorly. If you choose K equals two, it divides the data left and right, roughly here, and if you use K equals three, you get these clearly-not-banana clusters. So K-means doesn't seem well suited to solve this problem, and it isn't, but we can fix that. I had to throw this slide in, by the way. It's an actual quote from Thor, and when he said the word anomalies in Thor I thought, this is so cool, I'm going to use this one day. So I did. What we've done here is find a very large number of centroids using K-means; I just picked an arbitrarily large number, and I don't even remember what it is. Then we use hierarchical clustering, in this case something called single-link agglomerative clustering. Basically, I take all of these centroids as my input data, find the two centers that are closest to each other, and combine them, and I do this again and again, always picking the two closest cluster centroids and linking them. So the clusters come together, and you'll see in this image that we do reach a point where we're able to separate out the two bananas effectively. This is an example of an ensemble method: we've used the advantages of K-means, which works very well on huge data sets, to bring the problem down to a very small data set, and then used a different technique, with fewer of those disadvantages, that can address the specific nuance of the problem we're trying to solve.
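Here is a small sketch of that ensemble on scikit-learn's two-moons ("banana") data: over-cluster with K-means, then run single-link agglomerative clustering on the centroids and carry the merged labels back to the points. The particular numbers, fifty centroids and two final clusters, are arbitrary choices for illustration.

```python
# K-means followed by single-link agglomerative clustering on the centroids,
# on scikit-learn's two-moons data. Cluster counts are arbitrary choices.
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, AgglomerativeClustering

X, _ = make_moons(n_samples=1000, noise=0.05, random_state=0)

# Step 1: many small, roughly circular "paper cutouts".
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X)

# Step 2: single-link agglomerative clustering on the 50 centroids,
# repeatedly merging the two closest clusters until two remain.
agg = AgglomerativeClustering(n_clusters=2, linkage="single")
centroid_labels = agg.fit_predict(kmeans.cluster_centers_)

# Step 3: each original point inherits the label of its centroid.
point_labels = centroid_labels[kmeans.labels_]
print(point_labels[:10])
```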
Here's a slightly more complicated example. Here we're studying EKGs: we place sensors and measure the electrical signals of the heart. The heart, excuse me, not the brain. The heart has these contractions. In the P wave, we have the initial contraction of the atria; in the QRS complex, the big spike in the middle, we have the contraction of the ventricles, which is a much bigger contraction, which is why there's a bigger electrical signal. And then there's the T wave, which takes place a little while later. That's the ions essentially recuperating, the repolarization, as it's called, the ions going back to their normal state. This is a solution that was suggested by Ted Dunning in one of his talks, I think at Strata, and I constructed it based on his suggestion. This is just a toy example; you would have to really play with it to get it to work on a larger data set. But the general idea is that we construct a dictionary by taking a large amount of our training data and breaking it up into segments. Then we find the common segments, essentially through clustering, and end up with a dictionary, a codebook, that can be used to describe a signal by stitching those segments together. So we first create this dictionary, and then for a test signal, something we're trying to judge as normal or anomalous, we reconstruct that signal as best we can using the codebook we've just built. If the reconstruction is really good, that means we're able to reconstruct the signal, which means we're seeing mostly normal stuff going on in this test signal. If there's a large residual instead, that is, we did our best to reconstruct it from our codebook but we're seeing things the codebook just can't handle, then the residual will be very large and we can give it a high score: it's more likely to be anomalous. So here's an example of a bad reconstruction. We have the original signal on the second row. We've tried to reconstruct it, and the reconstruction goes crazy because, for some reason, it doesn't know how to handle the signal, so we have a large residual and we can say something weird is going on here that was not seen in our training data. This, by the way, ties back to the anomaly detection approaches I described earlier: you can also think of it as an information-theoretic problem. If you think of these windows as a codebook, then the fewer distinct types of windows required to reconstruct a signal, the more compressible the signal is, and therefore the less likely it is to be an outlier. In a lot of these anomaly detection problems, the hard part is figuring out the design of the system and the representation of the input data, because there are many different ways to represent it. Once you figure that out, the problem becomes much easier to solve, and in most real cases there is one representation that is quite superior to the others. Figuring out which one requires subject matter experts and a lot of experience in the domain of the problem.
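Here is a toy sketch of that codebook idea, simplified well beyond anything you would run on real EKGs. A noisy sine wave stands in for the "normal" rhythm, fixed-length windows are clustered into a small dictionary, and a test signal is scored by how poorly its windows can be rebuilt from their nearest dictionary entries. The window length, dictionary size, and injected anomaly are all assumptions for illustration.

```python
# Toy codebook reconstruction scoring, loosely after Ted Dunning's suggestion:
# cluster fixed-length windows of "normal" signal into a small dictionary,
# then score a test signal by its reconstruction error. All parameters and
# the synthetic "heartbeat" are stand-ins for illustration.
import numpy as np
from sklearn.cluster import KMeans

def windows(signal, width):
    return np.array([signal[i:i + width]
                     for i in range(0, len(signal) - width + 1, width)])

def reconstruction_error(signal, codebook, width):
    segs = windows(signal, width)
    # For each segment, find the closest codeword and measure the residual.
    dists = np.linalg.norm(segs[:, None, :] - codebook[None, :, :], axis=2)
    nearest = codebook[dists.argmin(axis=1)]
    return float(np.mean((segs - nearest) ** 2))

rng = np.random.default_rng(0)
t = np.linspace(0, 40 * np.pi, 4000)
train = np.sin(t) + 0.05 * rng.normal(size=t.size)      # "normal" rhythm

width = 25
codebook = KMeans(n_clusters=8, n_init=10, random_state=0) \
    .fit(windows(train, width)).cluster_centers_

normal_test = np.sin(t[:500]) + 0.05 * rng.normal(size=500)
weird_test = np.sin(t[:500]) + 0.05 * rng.normal(size=500)
weird_test[200:260] += 2.0                               # injected anomaly

print("normal score:   ", reconstruction_error(normal_test, codebook, width))
print("anomalous score:", reconstruction_error(weird_test, codebook, width))
```

The anomalous signal's residual comes out far larger than the normal one's, which is the whole scoring idea; a real system would need much more care in how the windows are cut and aligned.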
So that brings me to this idea of a data product. When I say the Data Guild is a data product design studio, I mean that we're constructing a holistic solution to a particular problem, one that incorporates many different components, not just the machine learning design we usually think of. You don't have to read all of that, don't worry. The idea is that we have to consider all of the business aspects. We need to consider machine learning from the perspective of a single algorithm and how to optimize its parameters and hyperparameters, but we also need to consider ensemble design, how to represent the data, and how to combine different machine learning techniques: boosting, bagging. We think about the presentation layer a lot: how that information is going to be given to somebody so they can do something with it. And in many cases, the data cleaning and the presentation layer, or the representation and the presentation layer, determine most of the success of the project. There are canonical examples around how many recommendations Amazon shows you: show one recommendation for a similar product versus thirty, and you get very different behavior. So if the thing you're trying to drive is the behavior of the end user, then the presentation layer is as important, if not more important, than the algorithm used behind the scenes. And there are all these other components as well, like system feedback loops, which I don't know if it's up here, but I'll come back to that if we have time. So the real question we're trying to ask as consultants is: what is it we're actually trying to do? People will come and say, hey, I'm trying to build an app that does this, or I'm trying to build an algorithm that solves this. Well, that's not actually what you're trying to do. You're trying to solve a problem, and you think the solution is to build an app or a machine learning product, but that may be only part of the solution, or it may be the wrong solution entirely. So you really have to try, as a person coming in as an interventionist, a consultant who's going to be part of the problem for a short time, to understand the big picture while the client is trying to break off the specific piece they want you to do. That's something I'm always struggling with, and many of you are probably familiar with it too. This quote, by the way, I'm hesitant to use; it may not actually have been Henry Ford who said it, so use it with care. Constructing a data product, or being a consultant, is also very much, in my opinion, about thinking through the limits and consequences of what we're trying to build. In many cases, we're being asked to do something with insufficient or incomplete data. What can we actually see about the system, and what can we not see? What consequences will that have? People want a single number as a result, when actually we should be asking about our uncertainty: how confidently can we state that number? That's something that has to be constantly reiterated. Cognitive biases, of both the people constructing the system and the people using it after it's been built, have been a subject of conversation that many of you will be familiar with for quite some time. Daniel Kahneman's recent book, I think, is a great introduction to that subject if you haven't read it, but people have been talking about this for a long time. In fact, Francis Bacon, who someone mentioned during lunch today, first came up with this idea of the idols of the mind: the idea that in the scientific process, we should realize that we're very biased and that our minds have all sorts of vulnerabilities. Which is interesting, as an aside, because Francis Bacon is often held up as the most objective person; he came up with the scientific method, so surely he believed in pure objectivity. That's actually not quite right. He believed that people who use the scientific process, people who do science, are very biased and have all sorts of idols of the mind that need to be compensated for, and that was his big contribution. Systemic biases are slightly harder to identify than cognitive biases. Systemic biases are the idea that even as a whole, as a community, we have tendencies that may not be intentional. What are those, and what assumptions are we making in the construction of the system? I'll talk about systemic biases a little more, but I think they're worth pointing out because they're the ones people think about the least. Stephen Jay Gould is the one who came up with these categories of philosophical traditions: general tendencies that, as we move toward this Western way of thinking about data, and quantification in particular, we need to be aware of. Reification, and I'll use the IQ example from the beginning to point some of these out, is the idea that we take an abstract concept and assume it's real, something that can be observed or measured, as we did with intelligence. Reductionism is the idea that we take very complex phenomena, phenomena that usually have random components to them.
We try to take these irreducibly complex phenomena and explain them through the behavior of some constituent particles. That's not always possible, and, as you can see in the XKCD comic on the right, it often fails to take into account the many nuances of the situation. Hierarchy and ranking: in the energy problem I described earlier, we were looking at smart meter data, and ultimately what the client wanted was a ranked list of smart meters, ordered by how likely each one was to involve theft. All they care about is the number, right? All they care about is the ranking, which is fine; that's what the client wants. It's our responsibility sometimes, as the data scientists, to understand the nuance behind that ranking and be able to communicate it effectively, but also to understand that ranking and hierarchy are something we have a tendency toward as a society. What does that affect? Is that happening here? Can we really just take everything, like intelligence, and turn it into an ordered ranking? Dichotomization takes that a step further and says, well, can we then just separate everyone into categories, as Cyril Burt did with "intelligent" or "feeble-minded," and just draw that line somewhere? What does it do when we draw that line over something that complex behavior actually determines? And we give numbers, generally, a very special status, right? We assume that if someone quotes a number, quotes a statistic, it's objective, and that's not the case. In all of the examples I gave, there are all sorts of design decisions going on behind the scenes, and the objectivity is lost when we make those design decisions, or when we collect the data in the first place, or when we try to use a number to represent something. Here are some examples, because I think it's easy to say these things don't apply to me, and these are actual products. This is Grindr. If you haven't heard of Grindr, it's like, what's that dating site, Tinder? It's like Tinder for gay guys, right? But if you notice the accidental recommendation on the bottom left here, it's pretty interesting. What connection is being drawn, by some algorithm, to assume that someone who's interested in Grindr is also interested in sex offenders? What does that say about the system lying underneath? This was a huge gaffe for Google, by the way. Here's a system that was built to help people find safe routes through a neighborhood, for example when you're walking home at night. It had the unintended consequence of essentially creating a community of affluent folks routing themselves around communities of minorities, because they were the only ones using the app and the only ones putting data into it. It was an unintended consequence that the founders presumably did not hope for, but they nevertheless set up an environment where that was the likely result. Education and teachers are a pet peeve of mine. I often have a conversation with an educator, as a representative from DataKind, asking, well, can we use data in some way to help with this or that educational problem?
But the education community as a whole has now been so turned off from data, because the natural line of thinking goes: data is used for assessment, is used for evaluation, is used for funding, and therefore all data is evil. Through these efforts at quantification, we've essentially created a bias in the minds of educators that the only things data can be used for are bad things, things that reduce my complex students to a number or reduce my complex teaching environment to an evaluation metric. And this is a problem. We will be a long time recovering from what we've done in the education space. Dr. Wernick, in Chicago, created a predictive policing program with the Chicago Police Department. Minority Report, like, for real. They ranked people based on how likely they were to commit particular types of crime, mostly large-scale, terrorism-style crime. They were actually able to identify individuals, go approach them, and let them know: hey, this is something we're keeping an eye on, we've got an eye on you. When asked to defend this, he said, don't worry, we're not profiling; we're doing this in an unbiased, quantitative way. Because he's using numbers? I don't know what the inspiration behind that was, but I'm pretty sure that whatever he was doing was not unbiased simply because it was quantitative. In fact, when anybody says that about a machine learning problem, they've already lost me: no machine learning system is unbiased just because it's quantitative. And these are systems being used today. The hubris of the situation is thinking, I won't be the one to make that mistake. Cyril Burt is a case in point. He was a very successful, intelligent professional who went out of his way to be objective for most of his life. The way he studied left-handedness, the contributions he made to psychology and statistics, were all very sound, and he was very aware of his own personal biases. Yet he still fell into a trap in the later part of his life. By the end of his life, he was caught in the correlation-causation problem, assuming one was the other. He had a priori convictions: he knew ahead of time what he wanted to find with his analysis, and he found it. He fell victim to this, and it wasn't until many, many years later that we were able to point to these things and identify them. They are not always obvious, and we're all subject to them. So what can we do? Can we just not say anything? Do we just pass over this in silence; what we cannot speak about, should we pass over in silence, right, Wittgenstein? This is a quote Claude Shannon used. Shannon is the guy who came up with a lot of our modern information-theoretic concepts; he's also the guy who coined the term "bit" for a zero or one, among many other things. When he went around giving presentations, he almost always used this quote from Matthew in the Bible. As consultants, as data scientists, we cannot just classify. We need to understand that there is a context to the problem, and what we return to the person, or the people, we're building this for also requires context. We cannot simply say yea and nay. So what can we do? We have all of these processes that have been created for building machine learning systems. This is the one I've been exposed to the most; it's called CRISP-DM.
The idea is that creating these data products follows some sort of process, and there are many others: there's SEMMA, there are various machine learning workflows, all of these things. But we have to think, in some sense, about what processes we're using. Those considerations I'm trying to point out today, where in the process do they fall? Where do we actually think about the assumptions? Where do we think about the risks? If they're not explicitly defined in our diagrams, then they have to be acknowledged somehow as being part of the process, but they're usually not there. So we need to examine them and think about them in whichever process we choose to use. We can start by simply acknowledging responsibility, instead of saying these are mistakes that other people have made. Acknowledge that the algorithms I choose to use, and the design choices I make, will have some sort of unintended consequence if I don't think carefully about them. They will always have consequences, but those consequences will be unintended, or unacknowledged, unless I go out of my way to consider them. Our approach as data scientists, to collect a lot of data, quantify things, and assume we'll be able to get the right results out of it, is something we need to be very careful with, lest we go too far. And in general, our society's pull toward numbers and quantification can have long-term consequences, as it did in education; it's going to be a long time recovering from that. So what could we have done to prevent that, and what can we do to make sure it doesn't happen in other spaces? Don't silo. A lot of data scientists live in little buckets, in isolation. They're handed machine learning problems, they go off into their basement and write some formula, then they code the formula and deliver a result. But that's not actually how problems are solved. Problems are complex. There's this idea of wicked problems: a problem can be so difficult that no one person can really understand it, encapsulate it, or even state it. So we need a consortium of people. We need collective action. We need to get groups of people together. And whether the problem is wicked or not, any problem benefits from an approach where data is part of the solution, not the entirety of the solution. Let's have some trust in other people who are doing great work, who are not data scientists, and who have oftentimes been working in a space for a lot longer than we have. We can start a conversation. I was looking for what to put on this slide and found these tweets from, I think, yesterday or earlier this week. Starting a conversation is really important. So my hope in coming here and speaking about this subject is that this kind of conversation will propagate into other contexts; whether you agree or disagree with me, just having the conversation helps. Ethical frameworks are something I think about a lot, and something I think applies very strongly in the context of education. As consultants or data scientists, we often work methodologically. We don't necessarily have a vertical, though some of us in the room may, but generally we're not neuroscientists or lawyers or medical professionals. We have a method, a set of tools, that can be applied to a lot of different contexts.
And so we're often called into situations where we don't actually have that context, and we're expected to do something very specific in that context and then move on to something else. As a consultant, I'm in that situation a lot. We're like parachutists: we jump in with our laptops, we solve some problem, and we jump out. But we do so without that context, without the culture of that space. Working in the bio space is very different from working in the pharma space, or the energy space, or the finance space. Many of the other professions that do something similar, accountants, lawyers, have built into their professional standards and into their education a strong emphasis on ethics. There's the Hippocratic Oath. Lawyers are required to study ethics as part of their program. Even engineers are. We have no such thing yet: there is no ethics content in any of the data science programs that serve as our professional preparation. So where does that need to come in, and whose responsibility is it? And how do we create a culture where we, as peers, talk to other data scientists about this when they're addressing these problems? Not just should we use an SVM versus something else, but what effect does this have, and what consequences might it have? This is the slide I was most nervous about, because this is what I've been thinking about. My opinion is that I would like to use data science beyond the low-hanging fruit. Data science is being used ubiquitously now, but it's easiest to get a job, or in my case a consulting project, in advertising, in marketing, in places where it's well established and where there's a lot of structure, because there's a lot of money in it. It's proven ground. But those safe options are not necessarily the options I want to work on. I want to work in places where we can find new, untested applications for data science and figure out how to use them to reduce the divide between what Anand Giridharadas, whom I'll quote at the end here, described as the thriving world and the wilting world: this idea that we have people who are really taking advantage of our current progress, and people who are suffering because of that progress. How do we bridge that gap? Those are spaces you could call data science for social good, perhaps. But there's a long way to go before we start coming up with good ways to use data science to bridge that gap rather than exacerbate it. There's a lot of work. Earlier this week, somebody came up to me and was talking about Aging 3.0, I think they called it, and using data science for that. Instead of using data to serve people who have a lot of affluence, a lot of access to data, and who are well represented by the data, when do we start addressing the people who aren't represented? And then lastly, I want to broaden the vision of whoever I talk to, whoever is an aspiring data scientist or looking for new jobs in data science. Why don't we think outside the box about where data science can be used, and carve out a new path there? That's something I'm struggling with, and I would love to hear from anyone in the room, or anyone who's listening: where do you think data science can be used more effectively, or perhaps less effectively but with more impact, and how do we get there? I'll put this back up later, but that's all I wanted to say. I would love to hear your questions.