So what's the mindset to really be a data scientist? What are you thinking about? I mean, there's no real manual. Most people come from math, economics, these kinds of disciplines you mentioned. How should someone prepare themselves? How do they approach it? How does someone say, hey, I want to hire a data scientist, how do I fill out the req form? These kinds of things.

Well, I played a lot of sports growing up, and there's this phrase of being a gym rat: someone who's always in the gym, practicing whatever sport it is that they love. And I find that most data scientists are sort of data rats. They're always going out and grabbing new data. They hear, oh, Yandex has a new data set and they've got a competition for it, so they immediately go download that data set, pull it into pandas in Python, and just manipulate it, check out what's going on. There's a genuine curiosity about seeing what's happening in data that you really can't teach. (A minimal sketch of that kind of first pass appears below, after the course components.)

But in terms of the skills that are required, I didn't find any one background to be perfect. So I actually put together a course at the University of California, Berkeley, called Introduction to Data Science, and taught it this spring. I'm teaching it again this coming spring, and they're actually going to put it into the core curriculum for computer science in the fall of next year. What I really try to do is break down the things I see people use frequently in practice which are not taught well in the undergraduate curriculum.

The five components of that Introduction to Data Science course are, number one, data collection and integration. Oftentimes in a machine learning or statistics class, you're handed a perfectly cleansed data set. You're not actually asked to go out and acquire that data set and integrate it with your existing data. I find that's a core skill that isn't taught well.

The second component is visualization design, particularly dashboard design. Once you've collected or integrated a data set, the first thing you want to do is see what's going on in there. Visualization is still remarkably under-taught at a lot of universities, and even when it is taught, it's often just chart design. So we're trying to go beyond chart design and into dashboard design. Guys like Stephen Few make a lot of money teaching people how to do this in seminars, and I think integrating it into the undergraduate curriculum makes sense.

The third component of the course is large-scale experimentation. Most large web properties have a sophisticated A/B testing infrastructure: they're able to rapidly design new features and deploy them to be tested. They define certain objective functions, measure how feature A performs against feature B on those functions, and make a decision based on the data about what to deploy. So we talk about what that looks like in practice.

The fourth...

With simulations and stuff? Whatnot?

You know, standard hypothesis testing, which is often not taught as much in the undergraduate statistics curriculum as, sort of, distribution design. Things like t-tests are taught, but putting a t-test in context, what does it look like to actually deploy that? (A sketch of that basic two-sample comparison also follows below.)
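To make the "data rat" first pass concrete, here's a minimal sketch of the pandas workflow described above. The file name competition_data.csv is invented for illustration; any freshly downloaded data set would do.

```python
# A minimal first-pass exploration of a newly downloaded data set.
# "competition_data.csv" is a hypothetical file name.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("competition_data.csv")

print(df.shape)          # how many rows and columns did we actually get?
print(df.dtypes)         # which columns parsed as numbers vs. strings?
print(df.isna().mean())  # fraction of missing values per column
print(df.describe())     # quick summary statistics for numeric columns

# A first look at the distributions: histograms of every numeric column.
df.hist(figsize=(10, 8))
plt.tight_layout()
plt.show()
```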
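And here's a toy sketch of the two-sample hypothesis test underlying the simplest A/B comparison. The metric values are simulated; in practice they would come from the experiment's logging.

```python
# Welch's two-sample t-test on a simulated A/B experiment.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-user values of some objective function (say, minutes on
# site) for users assigned to feature A vs. feature B.
metric_a = rng.normal(loc=10.0, scale=2.0, size=5000)
metric_b = rng.normal(loc=10.2, scale=2.0, size=5000)

# Does feature B move the metric relative to feature A?
t_stat, p_value = stats.ttest_ind(metric_b, metric_a, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Make the deployment decision based on the data.
if p_value < 0.05 and metric_b.mean() > metric_a.mean():
    print("deploy feature B")
else:
    print("keep feature A")
```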
The fourth component of the course is on causal inference and observational studies. So, the majority of data... you know, I had a nonlinear dynamics professor who started off a college course once by saying that dividing the world into linear and nonlinear dynamics is like dividing the world into bananas and not-bananas. And that's kind of how I feel about experimentation versus observational studies: everything is an observational study. It's very rare that you get to control the assignment of treatments to subjects. You're often essentially handed those assignments and forced to do as much causal inference as possible. And I've often found that when people say, "We found nuggets in this data," what they actually mean is they've performed some form of causal inference: they're able to say that if we do X, then Y will happen. So I try to teach sort of emerging techniques. Guys like Judea Pearl in the late '80s and early '90s made huge strides in how to do causal inference in observational studies, and those methods are just now finding their way into everyday social science. So I try to teach that to the folks. (A toy example of the basic adjustment idea appears at the end of this section.)

And then the fifth and last component of the course is on data products. Oftentimes people know how to fit machine-learned models, but once you have that model, how do you deploy it into production? How do you set up a regular refresh cycle? And how do you evaluate the performance of that model once it's in production? So, things like People You May Know. (A rough sketch of that refresh-and-evaluate loop also follows below.)

This is the classic cross-disciplinary trend that's required now. It used to be: you're great at math, you sit in this chair, you perform these functions, great. Now you really need to be cross-disciplinary, especially with CS too, you know.

Yeah, there's really a focus, I think, on hewing close to reality, staying close to the data. When I first went down to Wall Street and worked as a quant, not that far from here, back in 2005, my boss sat in a room with a whiteboard and a drawer full of papers, and that's how he did his job. Whereas today, I think the people who are really driving innovation on Wall Street are doing their jobs by gathering data sets and interacting with them in an iterative fashion, using tools like R. So I think that we really over-rotated on complex modeling and under-rotated on, you know, data munging and data cleaning.
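As a toy illustration of the causal-inference point above: the sketch below simulates an observational study with a single confounder and shows how stratifying on it, the simplest form of back-door adjustment in the spirit of Pearl's work, recovers the true treatment effect where the naive comparison does not. The data and effect sizes are entirely made up.

```python
# Simulated observational study: a confounder z drives both treatment
# assignment x and the outcome y, so the naive comparison is biased.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 100_000

z = rng.integers(0, 2, size=n)                                # e.g., heavy vs. light user
x = (rng.random(n) < np.where(z == 1, 0.7, 0.3)).astype(int)  # treated more often if z=1
y = 2.0 * x + 3.0 * z + rng.normal(size=n)                    # true treatment effect: 2.0

df = pd.DataFrame({"z": z, "x": x, "y": y})

# Naive difference in means is biased upward: treated units skew toward z=1.
naive = df.loc[df.x == 1, "y"].mean() - df.loc[df.x == 0, "y"].mean()

# Back-door adjustment: compare within each stratum of z, then average the
# per-stratum effects weighted by P(z).
p_z = df["z"].value_counts(normalize=True)
adjusted = 0.0
for z_val, group in df.groupby("z"):
    effect = group.loc[group.x == 1, "y"].mean() - group.loc[group.x == 0, "y"].mean()
    adjusted += p_z[z_val] * effect

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")  # roughly 3.2 vs. 2.0
```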
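And a rough sketch of the data-product refresh cycle described above, using scikit-learn. The synthetic data stands in for whatever production logs would feed the retrain, and the artifact path is an assumption for illustration.

```python
# Retrain on fresh data, evaluate on a holdout, and overwrite the serving
# artifact -- the skeleton of a regular model refresh cycle.
import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def refresh_model(X, y, path="model.joblib"):
    X_train, X_holdout, y_train, y_holdout = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Check performance on held-out data before promoting the new model.
    auc = roc_auc_score(y_holdout, model.predict_proba(X_holdout)[:, 1])
    print(f"holdout AUC: {auc:.3f}")

    joblib.dump(model, path)  # the artifact a serving layer would load
    return model

# Stand-in for "this week's" production data.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
refresh_model(X, y)
```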