Okay, thanks for coming. Yeah, so this is really just an info session about a new course. When you build a new course, you first have to figure out what's in the course, and then you have to tell everybody. So now is my chance to tell everybody. We've actually completed a pilot of this course this semester with about 100 people in it, and the instructors are really excited about it. The students seem to enjoy it, so now we're scaling it up and thought we'd tell you what's going on in case you decide you want to try it out. So, despite this rather formal setting, I think you can just interrupt me at any time if you have questions. The first thing I'm gonna do is talk about some course logistics. What is this course? Who is it meant for? Who should take it? When should you take it? That kind of stuff. And then I'll tell you a little bit about the course content, which has an interesting structure: it's not just one course, but many. We have some of the instructors from the other parts of the course here as well, and you can ask them questions too. Okay, so here's the structure. It's quite unusual, but it's working wonderfully: there's a core course, and then there's a bunch of connector courses. The core course, which we call Data 8, is a four-unit course. Everybody takes the same thing. It's a lecture-based course called Foundations of Data Science. We gave it a different number this semester, so if you're trying to track its history, it's very confusing. It's currently called Stat or CS 94. 94 is Berkeley's secret code for a course that does not yet have a number. Now it does have a number, which is exciting, because we've gone through the full approval process for this course to persist semester after semester. So 8 is the permanent number, and it is a joint offering between multiple departments. It's offered in Statistics, CS, and Information, and it doesn't matter which one you enroll in.
You're enrolling in the same course either way; it's just cross-listed to show that we're really all working on this together. There is one primary instructor. I'll give most of the lectures, but the curriculum was developed by a whole team, and occasionally members of that team will come in and give guest lectures. It's really a fantastic group of folks, so that should be pretty fun. The lectures are held at 10 a.m. Monday, Wednesday, and Friday in 155 Dwinelle. There's also a two-hour lab section, which is a blend of a discussion, where one of the GSIs gives you her or his take on what's going on in the class, and a practical period where you're sitting in front of a computer with a partner, trying to solve problems or explore some data sets. That's why it's called a lab. Right now, enrollment is restricted to freshmen and sophomores, but at the end of phase two, it'll open up. We set the enrollment cap very high so that anyone who wants to take this course can; currently the maximum enrollment is 500. So I think there's a place for you, even if you're currently on the wait list because of enrollment restrictions, okay? So that's the four-unit core course. You can take it without any prior experience. This course was designed for people who don't already know statistics or computer science, and it's meant to fit into the lives of all Berkeley students. That means not only that it has no prerequisites, but also that it's not designed to consume your entire life like some other courses do, including one that I have taught many times in the past. It's a course designed to fit into a schedule where people are studying other things at the same time, because we really feel like this is a foundation that everybody needs.
Now, if you are focused on statistics or computer science, it's still a good class for you, but it's designed around the core subset of those disciplines that's important to everybody. Okay, so that's the core course. There are also what are called connector courses. Each one is two units, and they're offered in different departments from around campus, each run by an individual faculty member or sometimes a team. They're designed to be taken concurrently with Data 8. Most of them meet just once a week for two hours in a lecture-slash-lab meeting where you learn some stuff and then you try some stuff. That's kind of the best way to learn. All those sessions are led by faculty, and they typically have restricted enrollment of 25 to 30 students. So this is an excellent chance to get direct exposure to working with someone from the Berkeley faculty in a small-group setting. You don't have to sign up for a connector, but you're crazy not to. You should take Data 8, and then you should also take the connector that looks most interesting to you. We have 10 of them lined up for the spring, and some of the faculty are here today. We'll give you a preview of a couple of them so you know what they're like, and you can ask questions about the other ones if their faculty are here. Okay, I'll pause for a moment. Any questions so far? Yes. Can you just take the connector? All the connectors list the core as a prerequisite or corequisite, but anything can be overridden by consent of the instructor. So you would have to talk to the instructors individually about whether that really makes sense. It's true that all the connectors rely on you having learned stuff from the core in order to understand what's going on in the connector; that's how they can get through really interesting material even though they only meet once a week.
The core course won't change much from this fall semester, where it's being taught now, through the spring and next fall, but the connector courses do rotate. Some of them stay the same and some of them are different. For some of these connector courses, this might be the only time you ever get a chance to take them. So far, though, we've seen great retention: some of the faculty who are teaching connectors this semester decided to stick around and teach them again because they had a good time, and some of them have even grown enrollment a little bit. That's why we have more connectors now than we did in the fall. So I'm hopeful that they'll stay, but we don't make any guarantees. Yeah, so once you've taken the core, you can take connectors either concurrently or afterwards. We haven't gotten to the point where we're putting restrictions on how many connectors you can take. It's plausible that that would happen later on, but if you get in early, maybe you'll work it out. So yeah, that's true, right? Yeah, so it's possible you could take another connector later. Great. A question that comes up a lot when I talk to students about these courses is: should I take it in addition to or instead of other courses? What is it gonna teach me that I already know, and what is it gonna teach me that's new? So let me walk through a little bit of that, and see if you have questions, by comparing Data 8 to some existing courses and seeing what makes sense to take at the same time, before, or afterwards. So this is an introductory course.
It was designed for freshmen, but if you've never worked on this kind of stuff before, then you're coming in fresh as well. It's a combination of introductory computer science and introductory statistics plus a bunch of other stuff, including information visualization and a lot about the social implications of what these data mean for the world around us. So you can think of it as doing both of these things at the same time. If you compare the curriculum for Data 8 with the curriculum for the introductory statistics course, Stat 2, Data 8 really covers what's in Stat 2, except that it does it in a different way. It uses a set of techniques that assume you have a computer instead of assuming you don't. It turns out you have a computer. One of the motivations for building this course was that the statistics department was excited about what they were teaching, but they knew they weren't teaching the state-of-the-art techniques they could teach if students could program at the same time. So we're teaching you to program in order to learn some of those concepts through a new lens: computation. That means different techniques that are more widely applicable. They also happen to be easier to understand, and when you let the computer do the work for you, your life is good: you can focus on the real problem. Relative to CS 61A, which I typically teach, I'd say that about a quarter of the content from that course is in this course. It's a really important subset of 61A: exactly what you'd need in order to focus on data processing, data manipulation, data visualization, that kind of application. It's a lot about programming for the sake of getting something done and learning about the world, instead of programming for the sake of programming, which is really what 61A is about. So they're not the same course. But it's a focused subset, and you certainly learn to program in Data 8.
You don't learn as many techniques, but the ones you learn, you get to keep exercising over and over again on different applications, which means you really build up some great core skills. And there's new stuff. It's not just Stat 2 plus 61A glued together and called a day. No, we talked to a lot of people from around campus about what was important to their disciplines and what everybody should know, and here's what we came up with. There should be a lot of data visualization. There should be simulation, where you can see what happens if you try different processes. And we should explore some of the social implications around privacy, around art, this kind of stuff. Okay, questions? Anyone? No questions. Okay. Okay, so this is Ani. She typically teaches Stat 2, and I'll say loudly what she just said quietly to me, which is that yes, Data 8 covers what's in Stat 2, but Data 8 covers a ton more stuff. Is that accurate? Yeah, okay. So you just learn more. That's not bad. Okay, so what about paths through life? And by life, I mean Cal. I would suggest taking Data 8 if you're thinking about taking Stat 2. If you're thinking about taking Stat 20, there's a way to learn everything that's in Stat 20, but it's not all in Data 8. Including all of it was too much of a constraint and would have made us leave out some things we wanted to keep in. So Ani has built a connector called Stat 88. She'll tell you about it in a few minutes. That is something you can take in conjunction with Data 8 instead of Stat 20. Now, what if you want to learn some computer science? Well, I think everyone should learn some computer science. It's up to you how much, but a little bit is a good idea.
We have a great course for non-majors called CS 10, and I recommend it if what you really want to learn about is computer science. Or you could think about taking Data 8 if what you really want to learn about is data science: how to manipulate data. There's nothing preventing you from taking both of these courses, but I just wanted to give you a place to start if you're considering both and you're not sure which. What about 61A, the computer science introduction for majors? Well, I think Data 8 makes sense as an alternative for students who want to program for the sake of understanding the world instead of programming for the sake of building programs, which is kind of what 61A is about. So if you think of yourself as someone who's learning to program because you want to do stuff, understand the world, run experiments, gather information, and manipulate that information, then Data 8 is probably the right core material for you. Or you could take Data 8 before or concurrently with CS 61A. I think that makes a lot of sense. We have students in both courses right now, and they seem to think the courses are complementary without being bored out of their minds, because there's enough difference. So you could try that. I think taking it before makes more sense than taking it at the same time, because otherwise there's a lot of stuff all at once. And it's a great way to build up experience before you take CS 61A. A lot of students take CS 61A these days, but people who come in with no prior computing experience really have to work hard. Now you can get your prior computing experience first, and then you might be able to enjoy it a little bit more. Or you could take Data 8 after CS 61A. There's nothing wrong with that.
If you want to apply the computing knowledge you've gained in 61A to some new domains and learn how data processing works using some of the tools you've already acquired, I think there are good synergies there. Or you could just ignore all this stuff and take Data 8 and a connector that relates to whatever you're already planning. When we designed this course, we were focused on everybody: what is it about data manipulation, processing, and analysis that everyone from every discipline on this campus probably needs to know? That was the motivation in the first place, and we've worked out a way for it to mesh well with all the existing statistics and computer science courses. But that doesn't mean you should ignore this course if you weren't already planning to study statistics or computer science. Instead, this is a great introduction for people who aren't planning to take a lot of stats or CS. They just wanna learn what they really need to know to make progress. Now, questions? Yes. Are there follow-up courses? Yes. Courses take a while to design, and a while means people have to think about it for a semester and then teach it. So there's no way there'll be a follow-on course in the spring. It's plausible that we'll have a course in place by next fall. Certainly if you're early in your Cal career and you're hoping to take more data science courses, we would very much like to have those ready for you, and they're meant to be fairly direct continuations of Data 8. But there's a lot of work before that happens. Yeah, so that's a good question: what's the relationship between data science and data structures? CS 61B, which teaches data structures, is about computing techniques for solving an array of problems you encounter when you're building large programs.
Most of data science is focused on building smaller, isolated programs where the goal is not to build an application that a million people use, but instead to gather some insight about the world. So it typically doesn't require the kind of data structures knowledge that you need in order to build a search engine or something like that. Instead, what you need to learn is how to figure out what questions you're supposed to be asking, how to answer them, and what kinds of simulations will give you a useful answer. So it's basically completely orthogonal material. The only relationship is that they both have the word "data" in them. Just highly confusing. Yes. You should attend the lab you signed up for. Yeah, let's minimize the chaos the first time around. But you can always send me an email if you have a real constraint. Yep, in the back. Oh, good question. If you've already taken a bunch of CS and some statistics, then I think this course will move slowly for you, and you probably won't enjoy it. You might. You can come in and shop it for a couple of weeks, but it wasn't designed for CS majors who already took some stats and now want to see how they combine. You can already figure out how they combine. This course is a more efficient path to a level of understanding that might otherwise have required several courses. I would strongly recommend this class for someone who has no stats experience and virtually no computer science experience. That's the whole idea. You're starting from scratch. We're building support structures just for you, and we're trying to pace it so that it's comfortable. I mean, it won't be boring, and it won't be easy, but it's not meant to push people so hard that they feel like they can't do it. Instead, we're trying to build it up and scaffold it so that it's the right way to learn statistics and computer science from scratch at the same time.
Yep, great question. Can you take Data 8 instead of Stat 2 and have it fulfill the requirements for your major? Depends when you're graduating. If you're graduating this semester, I don't know. If you're graduating any time further in the future, I'm quite confident, but not certain, that this will be the case, because the statistics department has already recommended it, and it just takes a while for this information to propagate. So it's an awfully safe bet, but I can't guarantee that it will all work out. Does anyone else with more knowledge want to say anything on that topic? The question was: does Data 8 replace Stat 2 as a major requirement? And the answer is that it depends on the major. I'm going to show you a demo in two seconds about how we've investigated that issue. Yeah, oh, if you already know a lot of CS and a lot of stats, there are upper-division courses. For instance, there's a data science course, CS 194, that you could take, which expects you to already know how to program. It might be a better fit for you. So the question is: if you're a CS major, what benefits do you get from taking this course? Most people who study computer science are interested in lots of things, myself included. Whenever you start running experiments in computer science research, or whenever you try to apply computer science ideas in cross-disciplinary ways, you find yourself running experiments, looking at data, and trying to understand patterns in those data. So it's immediately applicable there. It's also the earliest introduction to machine learning you can really get. Normally in the computer science curriculum you have to wait quite a while to take a machine learning course. Here you can take it very early on if you want to see what it means to make predictions automatically. In the back. There are two exams in the lecture course: a midterm and a final.
The midterm is in class. The final is a final. The connector courses are free to examine you however much they want, but in practice they haven't done a whole lot of examining, I think. They've done some. Just the right amount. Always. Yeah, I mean, this is a standard-format course: you do take exams, you do have weekly homeworks, and there are a couple of projects. So it will take a fair amount of your time. But there are some courses on this campus that take more time than would be justified by the number of units they're allocated, and those are often in the computer science department. This is not one of those. This one really is scoped so that it fits into your life while you're taking other courses at the same time. Yep. So the question is: should you take Stat 133 and Data 8 at the same time? And the answer is? Probably not. You should probably just take Data 8, and then if you want more content, you can take Stat 133. So the question is: how does this relate to research methods in psychology? There are close connections there. Often when you're performing a psychological experiment, you collect some information from trials, and then you want to draw conclusions about how the whole population of people works based on only the ones you studied. That involves coming up with hypotheses and then testing those hypotheses statistically. That's a major part of this course. It also involves building confidence intervals for what you think is true about the population, given what you've observed. That's part of this course as well. So the core techniques used in psychology appear here too. We do them in a slightly different way than most psychologists do, because, like statistics, most of psychology was built up assuming you didn't have a computer, since it's an older discipline than computer science.
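As an illustration of what "a slightly different way" can look like, here is a minimal sketch of one computer-based approach to a confidence interval, the percentile bootstrap: resample the observed data with replacement many times and read the interval off the resampled means, rather than plugging into a formula. This is an assumed example, not the course's own code; the helper name `bootstrap_mean_ci` is hypothetical, and the reaction times are invented.

```python
import random
import statistics

def bootstrap_mean_ci(sample, repetitions=10_000, level=0.95, seed=0):
    """Percentile bootstrap interval for the population mean (hypothetical helper).

    Draws `repetitions` resamples of the same size as `sample`, with replacement,
    records each resample's mean, and returns the interval holding the middle
    `level` fraction of those means.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same interval
    n = len(sample)
    means = sorted(statistics.mean(rng.choices(sample, k=n)) for _ in range(repetitions))
    tail = (1 - level) / 2
    lower = means[round(tail * repetitions)]
    upper = means[round((1 - tail) * repetitions) - 1]
    return lower, upper

# Invented reaction times (milliseconds) from a hypothetical psychology experiment.
reaction_times = [312, 295, 330, 301, 287, 345, 318, 299, 326, 308]
low, high = bootstrap_mean_ci(reaction_times)
print(f"95% bootstrap interval for the mean: {low:.1f} to {high:.1f} ms")
```

The design choice here is to let simulation stand in for the sampling-distribution formula; the same resample-and-summarize loop works for a median or any other statistic by swapping out `statistics.mean`.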
But we'll make those relationships quite explicit, so that you know when we're doing something a little bit different from what a psychologist might do, and why it gives you the same answer even though it's a different technique. Okay. So what do we teach in this course? Well, I told you: computer science plus statistics plus visualization plus social implications. What are you supposed to be able to do at the end of the day? I can show you the syllabus and read it, but it's pretty long and in a small font. There it is. If you wanna engage with that, you should ask questions instead of just seeing all these buzzwords and getting scared off, because in fact we're trying to make this understandable. So instead of reading you the syllabus, I thought I'd show you the kinds of things you're supposed to be able to do after you take a class like this. They're about going and finding information and then figuring out what it says. I had to do this recently because we were trying to figure out how many different departments we'd have to talk to in order to get this course to count for Stat 2. So we had to figure out all the departments out there, what degrees they conferred, and how many of each degree they conferred. The nice thing is that Berkeley provides a portal for everyone in the Berkeley community to download data about what degrees are conferred. So I did that already, and I'll read that in. Oh, it's too small, isn't it? Now it's too big, isn't it? Try that. So degrees is a table that says that in a particular year, some major issued this many bachelor's degrees. For example, there were 17 degrees in African American Studies in the 2010-2011 academic year. It's easy to find tables like this about anything you want. The question is whether you know what to do with such a table when you encounter it.
So part of this course, the beginning part, is to build up basic programming skills by tackling the problem of manipulating tables like these. What did I want to do? Oh, I wanted to look at now. Well, it's actually not now, it's last year: the degrees where the year, which is a column in this table, was 2014-2015. The first thing we did was wonder what the big degrees on campus are, so we can sort this by the number of degrees conferred, in descending order. And then what is last year? Well, that's a new table, based on the table I had before, that only covers 2014-2015 and tells us what bachelor's degrees were earned at Cal: 631 economics degrees, a whole bunch of integrative biology, some psychology, political science, electrical engineering, business administration, et cetera. Now, I just wrote a program. And this is the kind of program I want you to be able to write comfortably. It strings together different expressions that take a piece of information, a table full of data, and manipulate it to get you one that is more useful than what you had before. But reading a column of numbers is pretty hard. And by the way, this is only the first 10. What about the other 110? Well, the right way to explore a data set like this is to draw it, to visualize it. To say: I don't care about the year anymore, because I've restricted it to one year, but please draw me a bar chart of the number of degrees conferred for each major. Computers do this for you quickly and in a useful way now, so you can see not only which majors are the top 10 but also their relative sizes, where each vertical hash mark here is another 100. So we can see, oh yes, economics and integrative biology are quite large, but there are other large majors as well.
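The demo uses the course's own table tools, which aren't shown here, so rather than guess at that API, here is a minimal plain-Python sketch of the same three steps: keep one year, sort by degree count in descending order, and draw a crude text bar chart with one `#` per 100 degrees. The rows are invented toy numbers, not Berkeley's real data, and the helper names `where_year`, `sort_descending`, and `text_bar` are hypothetical.

```python
# Hypothetical toy rows: (academic year, major, bachelor's degrees conferred).
# These numbers are made up for illustration; they are not Berkeley's real data.
degrees = [
    ("2013-2014", "Economics", 600),
    ("2014-2015", "Economics", 640),
    ("2014-2015", "Psychology", 420),
    ("2014-2015", "Chemical Engineering", 105),
    ("2014-2015", "Bioengineering", 95),
    ("2014-2015", "African American Studies", 20),
]

def where_year(rows, year):
    """Keep only the rows whose year column equals `year`."""
    return [row for row in rows if row[0] == year]

def sort_descending(rows):
    """Sort rows by the degree-count column, largest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)

def text_bar(count, per_mark=100):
    """One '#' per `per_mark` degrees (rounded down), like tick marks on a bar chart."""
    return "#" * (count // per_mark)

# Chain the steps: filter to one year, then sort, then draw.
last_year = sort_descending(where_year(degrees, "2014-2015"))
for _, major, count in last_year:
    print(f"{major:<26} {count:>4} {text_bar(count)}")
```

Each step returns a new list and leaves `degrees` unchanged, which mirrors the "new table based on the table I had before" idea in the demo.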
And if we scroll down for a while, we see there's a threshold below which fewer than 100 degrees are conferred, and we reach that threshold right between chemical engineering and bioengineering. So if we wanted to figure out all the big majors on campus and ask them about their requirements first, we now know that we only have to deal with that much of the chart. How much of the chart is that? Well, maybe I wanna know how many different majors have about 100 people in them, or about 200. That's another classic case of information visualization, where we select the degree counts we want to visualize and draw what's called a histogram. The number of degrees is what's on the horizontal axis. On the y-axis is the number of different majors that conferred that many degrees. And we can decide exactly how we wanna bucket it. We can say I wanna start at zero, go all the way up to 1,000, and have buckets of size 100. Now it's telling me that most majors had between zero and 100 degrees conferred in 2014-2015. There were about 95 of those, and fewer of everything else. That's the kind of thing that, if we didn't have a computer and access to this information, might take a day of emailing somebody to figure out, and now we can figure it out in a few minutes. And that's powerful. What's powerful about doing it in a programming language is that you can precisely parameterize what you're asking for, quite quickly. Like I could say, okay, I'm gonna do something crazy. I'll worry about all these majors that have fewer than 100 people later, and focus on the ones that have at least 100 people in them. I just changed my bins to zoom in on the part of the chart that starts with majors that have between 100 and 200 people.
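Behind the histogram is a simple counting step: given bin edges, count how many majors fall into each bucket. This plain-Python sketch (the helper `histogram_counts` is hypothetical, and the degree counts are invented) shows both the wide buckets of size 100 from 0 to 1,000 and the zoomed-in bins that start at 100, which is all that "changing my bins" amounts to.

```python
def histogram_counts(values, bins):
    """Count how many values fall in each bin [bins[i], bins[i+1]).

    Bins are half-open: a value equal to a right edge goes in the next bin,
    and values outside [bins[0], bins[-1]) are ignored.
    """
    counts = [0] * (len(bins) - 1)
    for value in values:
        for i in range(len(bins) - 1):
            if bins[i] <= value < bins[i + 1]:
                counts[i] += 1
                break
    return counts

# Hypothetical degree counts per major (invented, not the real 2014-2015 figures).
degree_counts = [12, 40, 75, 95, 130, 150, 180, 210, 260, 330, 420, 640]

wide_bins = list(range(0, 1001, 100))     # 0, 100, ..., 1000: buckets of size 100
zoomed_bins = list(range(100, 401, 100))  # 100, 200, 300, 400: skip the small majors

for name, edges in [("wide", wide_bins), ("zoomed", zoomed_bins)]:
    counts = histogram_counts(degree_counts, edges)
    for left, right, count in zip(edges, edges[1:], counts):
        print(f"{name:>6} [{left}, {right}): {count}")
```

One caveat on the design choice: this sketch treats every bin as half-open, whereas `numpy.histogram` (and plotting tools built on it) closes the last bin on the right, so a value exactly equal to the final edge would be counted there but dropped here.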
So there are 14 of those that I have to call up and ask about their Stat 2 requirement, another five between 200 and 300, another five between 300 and 400, and then there's integrative biology and economics. So that's the kind of program I want you to get used to writing. I'm not saying you know how to write it now just because I've typed it out for you, but I'd like to communicate that the amount of code you sit there and work with is not volumes. Instead, it's just a few expressions that let you say what you want in order to understand some descriptive property of the world. Once you've learned how to visualize information, you can start asking more precise questions. You can ask: has it always been this way? Based on how many degrees are being conferred this year and how many people are interested in those majors, can I predict what the degree levels will be three years from now? That's a machine learning, or prediction, problem that we do work on in the class. You can also ask: if this year is different from the year before, is that just random variation, or is there a real trend going on? The same questions come up in data science regardless of what you're asking about. Whether you're doing a scientific experiment, working in social science, or trying to run a business, you're always asking: is there a real trend here? What is the trend, and how can I make predictions? But first you ask: what do my data look like? What's going on in this big pool of numbers? And you do that by putting it in a chart. But we don't only put data in charts; we also work with text data. We look at song lyrics to try to understand how text behaves if you think about it quantitatively. We also look at geographic information: in one of the projects this semester, we're trying to understand how California uses water.
So we look at different water districts and their water usage. You can take information from basically any area of the world and start to visualize it and understand it. And from there, you start asking these statistical questions. Okay, so that's a flavor of the sort of stuff we're gonna cover. Anyone have questions about that? That's a good question, one that I don't have a great answer to. Oh, you don't? Oh, Ryan thinks probably not. Oh, so a lot of the same techniques will appear, so you'll have seen them, but if you've never learned how to program and execute those techniques with a program, then maybe you'll learn something new. I don't know the answer, but it sounds like there's probably a lot of overlap between what you learn in econometrics and what you learn here. So we have another point of view, which is that it would complement econometrics pretty well, because this is far more applied in nature. Yeah, I mean, we really do look at actual data sets. We download them and see what happens. We see where the assumptions hold and where they fail. So this is not just learning the theory. I mean, we do learn the theory, but it's also learning whether the theory really applies, how to apply it, and how to make corrections when it doesn't. The other thing I'd say is that even if you've seen some statistical concepts before, it takes a lot of experience to learn how to apply them right: how to correctly formulate your null hypothesis and how to correctly test it. So it's not such a bad thing to take multiple courses where these ideas appear, just so you can build up some experience. But I can't tell you for sure; you gotta shop it for a while. Okay, I'm now gonna let a couple of connector instructors tell you about their connectors, just so you get a flavor of what those look like. All 10 of them are different.
But I can't do a parade of all 10, because that's too much. So instead we've picked a couple of instructors who are willing to come talk to you for a little while. We'll start with Ryan Edwards, who's gonna tell you about the health connector. One of the themes of the meeting today, though not the only one, is: shop it, and come try us out. So if you're not sure whether you'll learn things that are new relative to an econometrics course you may have already taken, come and check us out. I think we will, especially in the connectors, adapt to your needs and desires. The beauty of this so far is not only that our class sizes are fairly small, so we can adapt to the needs of the students, but also that you're tapping into an amazing interdisciplinary group of people. I mean, super interdisciplinary. We're talking about electrical engineers, social scientists (I'm an economist), a history professor, people from all across the spectrum. Can I actually get a show of hands? I think a lot of the questions were from folks interested in computer science majors. So how many people are thinking about majoring in computer science? A large number, that's great. How about statistics? Super. Social science? A number, okay, fabulous. So this is a great heterogeneous crowd. I would also exhort you to find your friends who aren't sure what to do about the quantitative skills requirement for their major and suggest that they take data science. I wish that 25 years ago I had been sitting where you're sitting now. When I went to university, I took a computer science course, having some background in physics and math but ultimately being a social scientist at heart; I'm an economist. I'm visiting from Queens College in the City University of New York, and I'm a Cal graduate myself. So I learned from the very best in computer science, but at the time it was geared toward computer scientists.
This is a course that is broadly geared, and I couldn't be more excited to pitch it to you. It's a great way of learning applied statistics. It's not as focused as, say, an econometrics course or a sociology stats requirement, but it could be if you show up in the connectors and ask us to make it so. So let me say a few things in particular about my connector, which I'll be offering again in the spring. When I was thinking about what to say, this is what I came up with. There's data science in the main class, and ultimately I felt like I was teaching a bit of data art: how to think about what you're looking at, not necessarily in a numbers sense but in an artistic way. What are you trying to tease out of these numbers? There's an art to that, especially in social science, where underlying everything is the behavior of people, which can be rather, well, unexpected. Ronald Lee, a professor emeritus of economics and demography here at Cal, said something several years ago that I thought was really astute. I'm paraphrasing here, but he said that sometimes your eye is the best statistical tool. That's absolutely right, except when it isn't. So that's a very good baseline to adhere to, especially in the social sciences, where you're thinking about the effect of a certain policy on some outcome. If you've run a randomized experiment, you might know the answer to that, at least in terms of the internal validity of that particular study, and then there's a question of whether the result is valid in other situations. But you might not know the answer at all if all you've got is observational data, and that's kind of the point.
So in health economics, here's a real standard example: how do we know whether a new drug, longiviratrol, lengthens life? (Longiviratrol, I guess; I just came up with a silly name, of course, and I can't pronounce it anyway.) Well, think of the x-axis here as your treatment variable. It's a zero-one variable: you've got a discrete pile of mice who are not treated, the placebo group (if you were concerned that the mice might suspect they're in the control group, you'd give them a little fake pill; that's of course what we do for humans, mice not so much), and you treat some mice with the drug so they get longiviratrol. It sort of sounds like a pet food, doesn't it? Then you compare their outcomes across these two groups, treatment and control. Let's see if this is visible; well, somewhat. So there's a bunch of little mice, and the y-axis is your outcome: the length of life in years for the mice. They live about two years, so there's a spread of lifespans for the placebo group, the control, and a spread of lifespans for the longiviratrol group, the treatment. Part of data science is thinking about there being some kind of relationship here: if you get the treatment, longiviratrol, how much does that gain you in length of life? And part of this is figuring out whether the data as you see them are sufficient evidence for you to conclude that longiviratrol actually works. That was discussed rather aptly already by John, and it's a big part of everything. The extra part of the story that I bring to the game, though, is that there are many influences on health, like behaviors, where you can't actually run randomized controlled trials, and mice obviously cannot smoke cigarettes.
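To make the "is this sufficient evidence?" question concrete, here is a minimal sketch of one standard approach, a permutation test on the difference in mean lifespans. The lifespans are invented for illustration, not data from any study, and the helper names (`difference_in_means`, `permutation_p_value`) are hypothetical; the sketch uses only the Python standard library.

```python
import random

# Hypothetical lifespans in years, invented for illustration only.
placebo = [1.8, 2.1, 1.9, 2.0, 2.2, 1.7, 2.0, 1.9]
treated = [2.3, 2.1, 2.4, 2.0, 2.5, 2.2, 2.3, 2.6]


def difference_in_means(treatment_group, control_group):
    """Test statistic: mean lifespan of treated mice minus mean of controls."""
    return (sum(treatment_group) / len(treatment_group)
            - sum(control_group) / len(control_group))


def permutation_p_value(treatment_group, control_group, repetitions=10_000, seed=0):
    """One-sided permutation test of 'the treatment lengthens life'.

    Under the null hypothesis the group labels are arbitrary, so we shuffle
    them many times and count how often the shuffled difference is at least
    as large as the observed one.
    """
    rng = random.Random(seed)
    observed = difference_in_means(treatment_group, control_group)
    pooled = list(treatment_group) + list(control_group)
    n_treated = len(treatment_group)
    at_least_as_large = 0
    for _ in range(repetitions):
        rng.shuffle(pooled)  # a random relabeling of all the mice
        simulated = difference_in_means(pooled[:n_treated], pooled[n_treated:])
        # Small tolerance guards against float rounding when sums run in a different order.
        if simulated >= observed - 1e-12:
            at_least_as_large += 1
    return observed, at_least_as_large / repetitions


observed, p_value = permutation_p_value(treated, placebo)
print(f"observed difference: {observed:.2f} years, p-value: {p_value:.4f}")
```

A small p-value only says that a gap this large would rarely arise from random labeling alone. It is the randomized assignment of the drug that lets you read the gap as an effect of the drug, which is exactly what is missing when you can't run the experiment.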
I suppose if you gave them an injection of nicotine, you could think about that sort of thing. Broadly speaking, we're interested in a lot of influences on well-being where it would probably be unethical to have a treatment group if the thing is bad, or possibly unethical to have a control group if the thing is good, and of course there are gray areas here. There was a study several months ago that was halted because it revealed significant benefits of blood pressure control. The way this works is that the control group doesn't get the benefit of the treatment. The treatment group did, and they really benefited, so the study was halted because it's not ethical to continue when the effects are that big. So when you're thinking broadly about connecting outcomes and treatments in the social sciences, a lot of times we don't have the ability to give our mice cigarettes and see what happens. One example of this, which leads to a very pretty picture and is why I wanted to show it to you, is the following: what is the effect of the minimum legal drinking age on our health? The typical problem with this sort of thing depends on how you measure it and how you would test it. People who like risk might both drink and crash cars, so it's not obvious that drinking, or the legal drinking age, actually affects whether they crash cars and cause harm. So rather than compare drinkers to non-drinkers, which is what you could do if you had data on this kind of thing, let's be a little more clever. Let's use art to think about how to do this. The art is figuring out which comparison will remove this problem. Well, let's compare everybody just under 21 to everyone just over 21. That way we're combining risky and not-risky people and mushing them all together on both sides.
There's no reason to think that these characteristics differ across age on average. Well, of course, some cultures believe that when you're born sort of determines your characteristics, but the people born within a calendar year are gonna be all types, so the average person just under 21 is probably not gonna be significantly different from the average person just over 21. So let's look at the data. Suppose you plot the death rate from motor vehicle accidents along the y-axis and age along the x-axis. Here's the cloud you would see; we're measuring age in years and months, and that's why we get so many of these little dollops all over the place. Looking at this and trying to make sense of it is a core element of DS8. One of the core elements of my connector course is looking at it with some art in mind. Where do we draw the lines? The eye might be drawn to one relationship here if you take the data as a whole. But if I say, you know, I think age 21, which is the middle of this graph, is rather interesting, I might draw these two lines. Those are in fact the trend lines if you restrict the sample to under 21 and to over 21, and they show a shockingly large effect of turning 21 on motor vehicle accident deaths. So there's some art to the science of looking at the data and seeing where your eye leads you. So don't drink and drive, don't always believe what your eyes see (except when you should), and do enroll in my course, which is L&S 88, Letters and Science. It will be fun and not too hard. Okay, thanks, that's all I've got. So I'm just throwing in an extra item, because I was asked by one of the organizers of these events to mention another direction that you may want to go.
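The "two lines" idea can be sketched in a few lines of code: fit one least-squares line to the points below 21 and another to the points at or above 21, then compare the two fitted values exactly at age 21 (economists call this kind of comparison a regression discontinuity design). This is a minimal sketch, not the analysis behind the plot described above: the monthly ages and rates below are synthetic, generated with a built-in jump of 4.0 so you can see the estimate recover it, and `fit_line` and `jump_at_cutoff` are hypothetical helper names.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def jump_at_cutoff(ages, rates, cutoff=21.0):
    """Fit separate trend lines below and at/above the cutoff, then return
    the gap between the two fitted values at the cutoff itself."""
    below = [(a, r) for a, r in zip(ages, rates) if a < cutoff]
    above = [(a, r) for a, r in zip(ages, rates) if a >= cutoff]
    slope_lo, icpt_lo = fit_line(*zip(*below))
    slope_hi, icpt_hi = fit_line(*zip(*above))
    return (slope_hi * cutoff + icpt_hi) - (slope_lo * cutoff + icpt_lo)


# Synthetic data (hypothetical): ages in monthly steps from 19 to just under 23,
# a gentle common trend, a jump of 4.0 at age 21, and a little bounded noise.
rng = random.Random(42)
ages = [19 + k / 12 for k in range(48)]
rates = [
    (30.0 if age < 21 else 34.0) + 0.5 * (age - 21) + rng.uniform(-0.3, 0.3)
    for age in ages
]
print(f"estimated jump at 21: {jump_at_cutoff(ages, rates):.2f}")
```

The code only measures the gap between the two lines at 21. Reading that gap as the effect of legal drinking rests on the assumption stated above, that people just under and just over 21 are otherwise alike on average.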
If you are in fact taking Data 8 this coming semester, there's a complementary course, and we've been thinking a lot about how to tie the two together. If you take both this coming spring, for example, we'll definitely be looking for students who would like to help develop a connector-course version of it for the following years. It's one of the Letters and Science Big Ideas courses, which I've been teaching. I'm Saul Perlmutter; I'm in the physics department, and I'm also the director here at BIDS. I'm working with professors in the public policy school and the philosophy department, and also a cognitive science professor. The course brings these perspectives together and asks questions about scientific-style critical thinking. So these are complementary questions that go along with the data science questions, which play a big role in a lot of what we do in the course, but it's much more from the point of view of where you want your decisions to be made in a democratic society: when would you trust the scientists, and when would you want to go to the population to ask a question? Within that question, we come up with a wide variety of concepts that, once you understand them, let you follow what is going on in a scientific discussion about a topic. Typically these are ways in which we as scientists have learned over the years that humans fool themselves. So mostly, the course consists of one technique after another for catching yourself fooling yourself in this way or in that way. The course is taught with short bits of lecturing but lots of group activities and group discussions, so that you really feel all these different ways in which you might fool yourself and then don't fall into those traps in the future.
And then when you get together with groups in the future, you'll perhaps be able to avoid those traps together. So we think that in the long run, this is something we very much want to do as a connector course topic with the data science world, because it would be a great way to show all the examples worked out with actual data. And we are looking for students who would enjoy trying to figure out how we tie these two together, and perhaps helping develop the course for the future. So let me stop there and pass it back. And by the way, that course is called Sense and Sensibility and Science. It's Letters and Science 70, I believe; you will have to check the number. No, I'm sorry, I think it's Letters and Science 22. But look for Sense and Sensibility and Science if you want to find it in the spring. Hi, everyone. Thanks for coming on this rainy day. I believe I am unique among the faculty here in that I'm the only person who has taught both the main course and a connector, and I'm standing up in front of you now to talk about the connector. The connector that I'm teaching is called Stat 88, Probability and Mathematical Statistics for Data Science. The reason I have been asked to talk about it is that it's a little different from the others. The other connectors go deeply into one area of application, using the methods that are developed in the main course. This connector, Probability and Mathematical Statistics, has a different purpose. The ideas that you will learn in the foundational course all have a clear mathematical basis. However, the main course has no mathematical prerequisite, because those ideas can be talked about without going into all of the details of the mathematics. So it is intended for people who come in with really no detailed quantitative training whatsoever.
But there are many people in the class who would like to know the mathematical basis for the results, the reasons why the results are the way they are, as far as that's possible with the mathematics that students have. Because of that, the Probability and Mathematical Statistics connector has a semester of calculus as a prerequisite. So if you take C8, the main course, plus the two-unit Stat 88, then the combination dominates Stat 20 and 21; that is, it contains more than Stat 20 and 21 do. So once we are done with this process, it ought to satisfy requirements for economics and business. Catherine, are you still here? Do we know the result for econ? Main course plus Stat 88: are we through with the process for econ? Okay, so I can now announce formally that if you take C8 plus Stat 88, that does fulfill the statistics requirement for economics. By next term, we hope the process will go through for the other majors that require Stat 20 as well. So what I'll do now is stop and take questions, not only about the connector but about the main course as well. Anybody, anything? Yes. So would it make sense to take C8, Stat 88, and also another connector? It would be fantastic, and in fact people are doing it now. You'd get both the mathematical basis and an in-depth look at some area of application. Yes, yes. Yeah, that's a great question. So this course is really about data science in the mind: it's about the workings of the mind, and we explore that question by looking at different kinds of cognitive data. Let me give you one example that I'll be teaching in the course, so you get a flavor of it. Say I have a few concepts: penguin, sparrow, and, let's say, robin, okay? All of these are names of birds, right? But if I ask you which of them is the worst example of a bird, which would you say? Probably penguin; mostly agree, yeah?
So the kind of phenomenon that we're interested in is why the mind interprets these objects differently, even though they're in the same category. If you just gave these concepts to a computer, it would probably return the same kind of answer for all of them, unless you encode some kind of model of the mind so that the computer churns out ratings similar to how humans behave. So that's the kind of thing you wind up doing: modeling the human mind using behavioral data and other kinds of data. Does that address your question? Okay. Right, anybody else? Okay, I will finish off with an additional answer to a question that you raised some time ago, which is: are there any follow-on courses? What I'd like to say is that there have been follow-on courses for quite some time. Data science isn't new. We've been teaching data science; it just hasn't been called data science. So once you take the main foundations course, you should be taking some programming. You should be taking some probability theory, CS 70, Stat 134, and get plenty of tools going. There are machine learning courses in computer science as well as in statistics, and there are courses in the School of Information. There are many. There will also be courses developed directly as follow-ons, but there's already plenty for you to do to become a data scientist. John, last words? I think we're done. Right, no more questions, then we're done. Thank you so much for coming, and we look forward to seeing you next time. We will all stick around for a minute.