Good morning, everyone. It's very much a pleasure to be here. I arrived about 5 AM last night, so if I start to fall asleep while giving this talk, you know exactly why. So today's goal is to define artificial intelligence. What is AI? What is machine learning? What is data science? If that's even possible to do, since there are a lot of words flying around these days. There's going to be a little bit of code and a lot of ideas. It's going to be very important to disambiguate the terms across different fields, because depending on what field a paper's author comes from, they use very different terms. And that can be very confusing for people coming from one field, say statistics, versus another, say computer engineering. That doesn't really help with efficiency. And like I said, we'll see some code, some simple AI. Code is always fun, so we'll keep that in there. So let's start with a story. It takes place in 2010. I was at an entrepreneurship event, and I got asked the question, so what do you do? And I wanted to sound sophisticated, so I answered, machine learning. I thought it showed gravitas, that I was serious. Well, invariably the response would be, what's that? They had no clue what it was, and it really wasn't helping my cause. So, 2010 plus 20 minutes later, I kept getting asked the same question: so what do you do? Now I responded, data science. And they'd go, I need a data scientist. And I got a lot of business cards that way. It was the same thing, different words. A lot of confusion is created by having different words for the same thing. I was just pitching it differently. So we fast forward to 2019. I'm still going to these networking events, and the conversation still comes up: so what do you do? They're always asking that. Now I say, AI. I've changed — or not. And they go, OMG, yes. People love hearing AI. And you get paid more if you say AI, and that's important. It's the same thing, different words. Half of it is marketing. And I think it's illustrated very well by this tweet, from the 10-year challenge that was popular at the beginning of the year. It really speaks to me. We've been doing ML for hundreds of years; we're just doing it faster and bigger now. Because once again, it's the same thing with different words. You'll notice this is a recurring theme. So that brings us back to something that has to be seen at least twice at every data science conference: the Venn diagram. I know at some places they play data science bingo, so this is a free square for you. You just got it at the beginning of the conference, and everyone else doesn't have to show the slide anymore. We all know this great Venn diagram made by Drew Conway, and there's a little bit of a backstory to it. He was finishing his PhD, he was bored, he had a beer, and he figured, I'll make a Venn diagram. And then this thing went viral and took over the whole industry. We all know what it says: the hacking skills, the math skills, the domain expertise. But there's a new Venn diagram, the AI Venn diagram. Now, this diagram actually comes from a book I really, really like: the deep learning book. This is the Bengio book, not the Barto book, that's the other one. It's a really, really excellent book. And they talk about how AI is this encompassing field, how machine learning is a subset of that, and within machine learning is representation learning, which contains deep learning. People will disagree about whether these are really distinct or not.
But this is definitely defining some of the new conversations. So let's do our best to define AI. Let's do what we can, see what it means to us. This was represented in another great tweet, and I think we've all experienced this in some form or another: there are linguistic implications of what you're saying. So let's try this again. Let's see if this is a better definition. I was speaking on a panel at Lincoln Center, and I heard this definition earlier that day, and I really liked it: what humans find easy but computers find hard. Now this is very broad, and it changes over time. At one point it meant chess. Beating a human at chess was state of the art. But now AI is in every app, and your phone could even beat Garry Kasparov. So that's solved. Is that AI anymore? Is playing chess against the computer on your phone still AI? It was. Maybe it's not anymore. It moved on to beating humans at Go. The definition of AI is fluid. Now they're talking about playing video games, beating Mario. There's some other big game they just completed recently, Pac-Man. But there's a theme: it's always about beating humans, which can get a little scary. I was at that panel I mentioned, and people were very worried about Terminator or Matrix style stuff, because the benchmark is always beating a human at something. And AI is spreading. The government is getting involved; they're trying to regulate AI. The problem is, at least in the States, the people writing the laws don't know what AI is. They're getting very scared about this whole idea of what AI is, and they're creating regulations without necessarily any knowledge. I was at a dinner with a number of ambassadors and congresspeople trying to figure out what they have to do about AI. And there's something I really wanted to say, but fortunately someone else said it for me: they want to regulate AI, but no one cared when it was called logistic regression. AI is scary; stats isn't scary. So we've got to try to make this less scary and remind people of that tweet: we haven't changed what we're doing, the marketing has changed. We're still doing the same stuff. So we're still experiencing the same thing, but different words. There are different implications depending on the words you choose. So let's disambiguate some of those words, not from a marketing point of view, but from a technical point of view. Let's start with the inspiration for that tweet about ML: regression. We've all seen it as the foundational learner. You want to call this statistical learning, you want to call it machine learning, data science, statistics — you're fitting a straight line. That's all you're doing. And that's represented by this equation, y = a + bx. We've all seen this. It's slope-intercept form from middle school. I call it middle school; I'm not sure what it's called here, but when you're young, it's called slope-intercept form. So let's talk about the a first, the a in that equation, y = a + bx. Well, if you're coming from the traditional fields, you call that an intercept. In newer fields, they call it a bias. The problem is, in statistics, bias means something completely different, and that's very confusing. Someone entering the field goes, bias? Wait, why do they call that a bias? That's an intercept. And that creates a lot of confusion. Now let's talk about the b in the equation. You can call it a coefficient, or you can call it a weight. Now, to some people, weight might make more sense.
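Just to make that concrete, here's a minimal sketch in R with made-up toy data (nothing from the slides); the a comes back labeled (Intercept) and the b is the coefficient on x:

    # toy data: y = a + b*x plus a little noise
    x <- 1:10
    y <- 3 + 2 * x + rnorm(10, sd = 0.5)
    fit <- lm(y ~ x)
    coef(fit)   # "(Intercept)" is the a, "x" is the b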
But the problem is, in traditional fields, weights have to do with how important each row of your data set is, not the impact of your columns. So you have these two fields coming at it from different angles — sorry, talking about completely the same thing with different words, and reusing words that already have definitions. So let's look at this. This is a binary scaling curve. It maps any number to fit between 0 and 1. I very purposely call it a binary scaling curve, because there are multiple words for this. It's defined by this equation, 1 / (1 + e^-x), or equivalently e^x / (e^x + 1), choose the form you like. And it's used for logistic regression. So again, depending on the field you're in, you either call it an inverse logit or a sigmoid. But you really have to ask, which word is less informative? If you tell me those words, I have no way of knowing they have anything to do with a binary scaling curve. But apparently those words work for people. So now let's talk about when you get new data and you want to use it, and you're getting a point estimate for this new data. You're scoring it. Mathematically, y-hat equals f-hat of x-tilde: your estimate for y is your estimated function applied to your new data. All machine learning is really doing is estimating functions. Well, you can either call this prediction or inference. Now, this one really, really sets me off, because prediction in statistics means you're predicting new values, and inference means understanding what's happening — it's practically the opposite of prediction. So now you get a new field calling it inference, using this very important word from statistics when the new field often can't even do inference. And that's really, really frustrating. And again, that brings us back to: same thing, different words. It's almost as if the academics in these fields don't bother reading the papers from other fields. And I've read some of these papers — look at this new thing I invented — and I'm like, that was invented 30 years ago. But at least people are trying to improve the field. They're trying to add new stuff, and that's always a good thing. It would just be nice if there were better cross-pollination. So let's see some code in action now. People who know me know I am obsessed with R. It is my tool of choice. R is built from the ground up to do data, to do ML. It's been around for decades. So let's see perhaps the most common AI you're ever going to come across: an if-else statement. If the price is greater than 100, sell; otherwise buy. Let's be honest, most AIs are rules engines where someone sat there and said, hey, here's the cutoff. And for a lot of projects, that's all you need. That makes it AI, and that's good enough for people. But I have a feeling you're not here to learn about an if-else statement, so let's get a little bit more advanced. For a series of examples, we are going to talk about logistic regression. Now, technically it's binary regression, because logistic is a special case of binary regression. And it's used for classification, two-class classification. It's been around for decades; you've been able to fit it for a long time using the Newton-Raphson method. And whether you want to call this a sigmoid or an inverse logit, it's the same thing. Our goal is to optimize this equation. All machine learning is really just an optimization problem, usually quadratic optimization.
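Before moving on, here's roughly what those two tiny examples look like as actual R code — a sketch, not anything lifted from the slides:

    # the most common "AI": a rules engine
    price <- 105
    if (price > 100) "sell" else "buy"

    # the binary scaling curve, whatever you want to call it
    sigmoid <- function(x) 1 / (1 + exp(-x))   # a.k.a. inverse logit
    sigmoid(c(-3, 0, 3))                       # maps any number into (0, 1)

Okay, back to the optimization story.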
That's all machine learning is. You have this cost function, or an objective function, or a loss function, and your job is to find a way to minimize it. So we are going to minimize this expression. The first set of square brackets is the logistic regression part. You are taking your true values minus your predicted values: the true values enter as y_i times x_i beta, minus your predicted values, which enter as the log of 1 plus e to the x_i beta. That's what takes care of it being a generalized linear model: a linear regression that's been scaled onto the 0-to-1 curve. Now, we're not just doing logistic regression; we're minimizing the objective function plus a penalty term. So the second set of square brackets is the elastic net penalty term. Again, different words. It could be the lasso for the L1 penalty, which is the term on the far right, the L1 norm of beta. It could be the ridge for the L2 penalty. L1, L2, lasso, ridge: different words, same things. And combining them gives you the elastic net, put forth by Hastie, Tibshirani, and Friedman, maybe two decades ago. Having this penalty term there prevents overfitting, does automatic variable selection, and shrinks down your coefficients, so you get a better-fit model and improved predictions. Now, this is popularized in an R package called glmnet. A lot of you might have seen this package. You might have pronounced it G-L-M-net, and I pronounced it that way for years, until I got lunch with one of the authors and he told me it was pronounced "glimnet," and a piece of my soul died. But, glimnet. So first up, we have to prepare the data. We have to take steps to prep the data. And we all know that prepping the data takes 80% of the effort, and the other 20% of the effort is spent complaining about prepping the data. So let's look at some new functionality from a package called recipes. I'm sure a lot of you have heard of the caret package, which was the unified interface in R for doing machine learning. It's been around for well over a decade, long before scikit-learn, actually. And the author of that, Max Kuhn, has decided to take all the hard work he built into caret and redo it for the tidyverse. The tidyverse, if anyone doesn't know, is this movement — well, I shouldn't say created by him — a movement championed by Hadley Wickham to get people using R in a consistent, clean way. And there's a whole set of packages called tidymodels, which are meant to do machine learning in a tidy way, using the principles put forth by Hadley Wickham. And instead of having one monolithic package like caret, Max has split up his work into many smaller packages. The preprocessing package — otherwise known as the feature engineering package, or the data transformations package — is recipes, like out of a cookbook. So you can see, first we start with a recipe. And our data, by the way, is credit data. Our outcome variable is whether you have or have not defaulted, and our input variables are information about your job, the type of loan you want, all this other information that can indicate whether you have good credit. And our job is to predict whether someone has good credit. So we start the recipe by saying credit is our outcome variable — our response variable, our y variable, our target, our label; see all the many different words we could use to say this? Tilde dot, meaning, for this data set, I'm going to regress our credit outcome variable against every other variable in the data set. It's a little shortcut, a little cheater.
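In code, that opening line looks roughly like this — a sketch, where names like credit and train are just whatever your outcome column and data set happen to be called:

    library(recipes)
    rec <- recipe(credit ~ ., data = train)   # credit is the outcome, "." means every other column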
This way I don't have to type it all in by hand. And the data comes from an appropriately named data set called train. Hopefully your data sets are better named, but this keeps the presentation as generic as possible. And recipes introduces a series of steps. These are steps that you will take on the data — they're not being taken right now; you're defining steps to be taken. Such as step_other on all the nominal variables: this says, if you have categorical variables with many different possible unique values, take the levels that occur less frequently and bin them together into an "other" level. step_zv: remove any variables with zero variance. If a variable has no variance, it's useless for fitting a model, so get rid of it. step_center and step_scale: subtract the mean, divide by the standard deviation. This puts all the variables on the same level playing field. I have heard that since then there's a new function, step_normalize, that does both in one go, which I would use instead, but I didn't know about it until two weeks ago, let's say that. step_dummy: this turns categorical variables into 1/0 indicator variables. step_intercept: make sure there's an intercept in the model, because not all models include an intercept. So after you've defined all of your steps, you prep your recipe. Prepping does the calculations: it computes the means, it computes the standard deviations, it figures out what dummy variables are needed. It does all the calculations. And then, after you've defined your recipe and you've prepped it, you juice it to do the processing. There's another function called bake. So yes, you are using functions called prep, juice, and bake. There's plenty of room for whimsy. So now we have all of our data defined and all of our preprocessing done — and by the way, there are about 30 different steps you can use, and you can define your own steps. You want to take a Box-Cox transformation, you want to take a log transformation: they're available as preprocessing steps. I'm just showing you a few of them. So now that our data is ready, let's define the model. Now, we could use glmnet directly, but let's be honest, glmnet has a really ugly interface. It's not exactly an attractive thing you want to use that often. And other tools aren't much better; XGBoost is probably an even uglier interface. So we are going to use a new package called parsnip. Yes, the person who brought us caret now wrote a new package called parsnip. And you define the model with two functions. The first function says we are doing a logistic regression: logistic_reg(). That says we are going to do a regression model, a GLM, where the outcome is a 1/0 binary outcome. Now, there are many ways to do a logistic regression. I'm not even talking about trees or random forests; I'm talking about logistic regression itself. There's glm() in base R, there's glmnet, XGBoost itself can actually do logistic regression, and there are multiple other functions. So logistic_reg() just says we're doing a logistic regression. And then we set the engine. set_engine() says, which R package do I want to actually do the computations? Because you want choice: you might want to use glm, you might want to use glmnet. You have many different options. So after you have defined the model — and that's all it takes, by the way: say what type of model you want, say the engine — you then fit the model. We use the fit() function. We specify, hey, credit is still our outcome variable.
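Putting the pieces together, the whole thing looks roughly like this — a sketch using my own stand-in object names (rec, train_prepped), not a literal copy of the slides:

    library(dplyr)     # for the pipe
    library(recipes)
    library(parsnip)

    rec_prepped <- rec %>%
      step_other(all_nominal()) %>%                     # lump rare categorical levels into "other"
      step_zv(all_predictors()) %>%                     # drop zero-variance variables
      step_center(all_numeric()) %>%                    # subtract the mean
      step_scale(all_numeric()) %>%                     # divide by the standard deviation
      step_dummy(all_nominal(), -all_outcomes()) %>%    # 1/0 indicator variables
      step_intercept() %>%
      prep()                                            # do the calculations

    train_prepped <- juice(rec_prepped)                 # apply the steps to the training data

    fit_glmnet <- logistic_reg(penalty = 0.0017, mixture = 1) %>%  # newer parsnip wants a single penalty up front;
      set_engine("glmnet") %>%                                     # glmnet still fits the whole path underneath
      fit(credit ~ ., data = train_prepped)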
We are fitting it against all the other variables in our prepped data. Notice it's the prepped training data, and it goes and fits the model. That's all there is to it. We're using the defaults to keep it simple. This is a machine learning model, and this pattern will be repeatable in many other ways. And notice how we make use of the pipe. If you're not familiar with pipes in R, that %>% symbol allows you to flow one function into the next into the next, like in bash. It's a really great way of writing code if you haven't done it. So you go ahead, you fit your model, and since we used glmnet, you get something like this. Everyone can easily interpret that, right? I don't need to explain, I'll just move on. Okay: each line represents a different coefficient. If you want to call them weights, fine, call them weights; I'm going to call them coefficients. And it's not per variable, it's per coefficient, because some variables, if they're categorical, have multiple coefficients. So the y-axis is the value of your coefficients given where you are on the x-axis. The x-axis is log lambda. If you recall the equation we're trying to minimize, there's a lambda term: that's a hyperparameter for how much penalty you're going to apply. What glmnet does by default is it doesn't fit one model, it fits roughly 100 models, each one with a different value of the penalty. And then, for a given value of the penalty, you can see your coefficients. So the left of the graph is very little penalty — and it's shown on the log scale because the values jump by orders of magnitude. Now, if you had a penalty of zero, it would be a regular, full-blown GLM. If you went to the far right of the scale and had infinite penalty, you'd be left with a null model, just an intercept. You most likely don't want zero and you don't want infinity; you want somewhere in between. And how big a penalty you use determines how big your coefficients are, because the elastic net is a biased estimator that shrinks coefficients towards zero, making Bayesians happy. Not really — Bayesians don't like the Bayesian lasso, they prefer a horseshoe prior, but let's skip that conversation for now. So as we increase our penalty, going to the right, the coefficients get shrunk down towards zero, and some of them become exactly zero. That means you're doing variable selection: if the coefficient for a variable is zero, that variable doesn't matter for the model. You've performed automated variable selection in a way that is much safer than stepwise and more computationally efficient than stepwise. So you are free to choose a value of lambda, draw a vertical line, and use that model. What value of lambda should you choose? Which is the best lambda? And notice I said best, not right. You don't want to say it's right; you can say which one works best. So what you could do is pick a value of lambda — let's choose one that's 0.0017 — and this time let's look at a coefficient plot. There's a package called coefplot, which has a same-named function, coefplot(). You give it a model, and it works on lm, glm, and a few other types of models, and here it will show you the coefficients for a specific value of lambda. There are a lot of variables in here. It looks like it didn't do very much selection, so we could be overfitting the data with this complex model.
In fact, there are so many variables in there it's hard to read; it's overplotting a little bit. So let's try another value of lambda. Let's try a bigger penalty, which gives us fewer variables. It's still a somewhat complex model, but it's less complex than before. So let's use an even bigger value of lambda — a bigger penalty — which gives us an even smaller model. So depending on the value of lambda, the amount of penalty, you get a drastically different model. Before we were overfitting; we could be underfitting now; and we want to get it just right. And that's all controlled by lambda, which is just a hyperparameter. And different hyperparameter values can have a big impact on the model. So let's change one and look at what happens. We return to the code for parsnip, and we see that our equation is exactly the same — sorry, our code is exactly the same. We have simply changed one line: we've added a parameter called mixture and set it equal to zero. We didn't really talk about this, but when you have the elastic net, you have a little bit of lasso and a little bit of ridge — how much of each? By default, it's 100% lasso and zero ridge. By changing this one hyperparameter, we have flipped it: it's no longer lasso, it's all ridge. And when you change just the mixture hyperparameter from one to zero, it drastically changes your coefficient path, as visualized here. Notice the very different shape. Before, the coefficient curves were diving to zero after a certain threshold; now they're asymptotically approaching zero, and all we've done is change one hyperparameter. That's all we've done. So machine learning is really just brute-forcing your way through hyperparameters. You're just tuning knobs, trying to find a sweet spot — not too much, not too little. It's really amazing that that's what machine learning boils down to today: doing the same thing again and again and again, trying different hyperparameters and seeing how it changes things. Of course, you can get fancy and build a machine learning model on your hyperparameters to see which one's best, or you can treat it as an optimization problem. Let's not spend our time doing hyperparameter tuning here; that's what massively parallel computer systems are for. Or grad students. It's always good to have a grad student to do all that stuff. So let's talk about the different engines we have available to us in parsnip to fit a penalized logistic regression, because we could fit the same model in many different ways. If we want un-penalized regression, we could use glm, and all we have to do is change the one line of code where we set the engine to glm. Now, in this case there's no penalization, but if you have a simple model, you might not need penalization. So we could go with glm. If there are any Bayesians in the room, we can use Stan to fit the model. All we have changed here is the engine, yet again, because parsnip does all the work. Now, if anyone doesn't know, Stan is a language put out by, actually, my mentor Andrew Gelman and his group to do Bayesian MCMC. It's meant to replace BUGS and JAGS, and it's a great implementation. And we could have written the Stan code ourselves — I've done it plenty of times — or we could have used the package rstanarm and learned its interface, which is very similar to glm and lme4. Or we could just use parsnip, change one line of code, and fit a powerful Stan model.
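Concretely, continuing the earlier sketch, the engine swaps look roughly like this — same formula, same prepped data, only the engine line changes:

    # un-penalized logistic regression with base R's glm
    logistic_reg() %>%
      set_engine("glm") %>%
      fit(credit ~ ., data = train_prepped)

    # the same model, fit with Bayesian MCMC via Stan (rstanarm underneath)
    logistic_reg() %>%
      set_engine("stan") %>%
      fit(credit ~ ., data = train_prepped)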
Now, by default, parsnip can't do penalization with Stan, but I put in a feature request two weeks ago for them to add the horseshoe prior. Hopefully they'll implement that soon. But let's say you're not a Bayesian. Let's say you're a big data person, you use Spark, and you want to use Spark MLlib. Well, you change that one line of code — the engine — to spark, and now you can use Spark MLlib, which does do penalization, to fit your model. Now, this is great if your data is already in Spark: why move it out? Use Spark to do it. You didn't need to learn any Spark, you didn't need to learn sparklyr, you didn't need to learn that language based on Java that you used to have to use for Spark. You don't have to learn any of this. Now let's say you want to be cutting edge because you want some attention. You could fit a logistic regression model using Keras. Right here, set the engine to keras and you just did deep learning. Who says it can't be simple? Now let's say you didn't want to make it simple on yourself, and you wanted to fit a deep, deep logistic regression. Well, let's look at some Keras code. Which one would you rather write, this code or the previous slide? And here we're doing, I believe, three layers. We start with our sequential Keras model. We pipe it into a dense layer and give it a ReLU activation, then we pipe that into batch normalization to prevent overfitting, we do dropout to prevent overfitting, we do another dense layer, another batch normalization, another dropout, then another dense, another batch norm, another dropout. So finally we get to our bottom layer, where it's just the outcome. And look, it uses a sigmoid — not an inverse logit, because that's not a thing in deep learning; it's a sigmoid. So that sigmoid brings us back, again, to the same thing with different words. All of these — whether it's Spark, Stan, Keras, glmnet, or glm — are doing the same thing, just with different implementations. So let's see something different now, but still the same idea. Let's talk about decision trees. What we're going to do now is essentially a non-linear logistic regression; it's going to capture more complicated relationships. We will still use parsnip to do this, rather than learning the interface for XGBoost, or LightGBM, or CatBoost, or gbm, or anything else. We're just going to use parsnip. And all we had to change was the linear_reg — sorry, the logistic_reg() — to decision_tree(), tell it we're doing classification, and set the engine. And we're using rpart, which has been built into R for I don't know how many decades. It stands for recursive partitioning, and that's all a decision tree actually is: you're recursively partitioning your input space. You fit this automatically with parsnip, and you get back this beautiful visualization using rpart.plot. The nice thing about decision trees is that if they're simple, they're understandable. You're asking a series of yes-or-no questions and branching off in different directions depending on the answers. You're essentially creating a bunch of buckets. It can be understandable and it gives good predictions, but the problem is it has high variance. So let's try to solve that with more of the same stuff. Let's see more trees. We could use C5.0, which is a different engine for fitting classification trees, and we're more than happy to do that. Or, if your data is already in Spark, hey, we could use Spark to do this. Why not just use that instead?
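And for the trees, the pattern is the same couple of lines changed — again a sketch continuing the earlier stand-in names, with rpart.plot drawing the underlying rpart object:

    library(rpart.plot)

    fit_tree <- decision_tree(mode = "classification") %>%
      set_engine("rpart") %>%
      fit(credit ~ ., data = train_prepped)

    rpart.plot(fit_tree$fit)   # the parsnip object keeps the rpart fit in $fit

    # same model type, different engine
    decision_tree(mode = "classification") %>%
      set_engine("C5.0") %>%
      fit(credit ~ ., data = train_prepped)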
Look how simple it is with parsnip to fit the same kind of model with different tools, which can result in very different outcomes. Now, we still have this issue of high variance, and we used to solve that with random forests. But they're old hat now, right? We want to do boosting: the machine fits a model repeatedly and gets better each time. Sort of like Doomsday fighting Superman — each time it got defeated, it got better. So let's go ahead and fit a boosted model using our favorite new tool, parsnip. We change the type of model we're fitting to boost_tree() and we set the engine to xgboost. It's so simple; all we do is change two lines. When you fit a boosted tree, you lose a lot of your explainability, but luckily we can still get a variable importance plot. That is, we can see how important a variable was in defining the model. It doesn't tell you which direction it was important in — you don't know if it's a negative impact or a positive impact, or how great the impact is. It just tells you how important it was. So let's see other ways to boost, because that's the theme. We can use C5.0 to do boosting. If you don't know, C5.0 is one of the early tree and boosting algorithms; it came out decades ago and has been in R for a very long time. Or, once again, if you're into big data, whatever that means, you could use Spark and just change the engine to spark from xgboost or C5.0. One little line of code and everything completely changes. How awesome is that? So there are many ways to accomplish our goals, in this case classification, and fortunately we have that great uniform interface. So to wrap things up, let's remember a few things. AI code can be complex or simple, and sometimes — often — simpler is better. It might not overfit as much, it's more explainable, and it computes faster. So don't necessarily go for the complex thing; sometimes go for the simple thing. And the key takeaway, and I really want everyone to get this today, is: same thing, different words. You have to cut through all the inconsistent usage to be able to do your job. So this way, the next time you're at a party and someone asks, so what do you do, you're going to have plenty of answers. Thank you. Thanks, Jared. We have time for a few questions. Hi — thanks. So I have a few general questions and a few questions specifically about R; I'll speak into the mic. One general question is about this "same thing, different words." You gave the example that we are just fitting a line. But the difference is that if you're just using the cost function, you're looking only at that sample, whereas if you're using MLE, you're thinking about the entire population. So implementation details are important, even if the end objective is the same, and that matters. So, one thought on that: I don't think saying it's all the same really helps people who are entering the field. Just your thoughts on that. The other questions are specifically about R. For example, in the tidyverse, whenever you're writing code and then productionizing it, a few function names are repeated across packages — say mutate, summarize, and so on — and when productionizing I'm not sure which library I'm using, so I have to append the library name. Is there a solution to that?
And the second thing is that, in general, when we're prototyping we hard-code the variable names, but when I'm productionizing and the code is interacting with other systems, I have to pass the variables in. That is solved in the tidyverse, but the way it's solved is not very tidy, and it's not easy — so, any solution there? So, there are a few questions. The first one is how do you disambiguate functions, and the best, recommended way is to prepend the package names, like dplyr::mutate. Every time. It's ugly, but that's proper software engineering. There are other ways to get around it, by loading the packages in the right order, but it's much cleaner and more explicit to do package::function. It's safer. Your second question about the — oh my God, I was about to answer it. Could you remind me of the second question? Yeah, the second question was, when I don't want to hard-code the... Oh yes, yes, how to deal with variables. They've now put a lot of work into tidyeval, which now uses two curly braces around the name, and it can inject the variable, either as a string or as a column name. It's brand new — a few months old at the most, maybe a month old. Tidyeval now uses double curly braces: no more bang-bang, no more triple-bang, just double curly braces. It's fairly new, and it's awesome. Thanks. Any other questions? Yeah. Hello, sir. So I'm a student right now, and one thing that you said stuck in my head, that hyperparameter tuning is basically for the grad students. Is that really the case, or was it just a joke? So, yes and no. I had one of my students intern for me, and we wanted to fit a penalized multilevel model, and there was no existing tool for the hyperparameter tuning. So I didn't have him do the hyperparameter tuning by hand; I had him write a new R package that did the hyperparameter tuning for me, and he published it on CRAN. So a follow-up question: suppose I do not want to be stuck with the hyperparameter tuning — what are the specific skills you look for in an intern or grad student that make you feel, this person shouldn't be doing that task, they can do something better? You mean methods to use to do it? Yeah, the direct implementation of the model, not the pre-processing or the hyperparameter tuning stuff — what are the skills I need to work on? For hyperparameter tuning, or for fitting the model? No, for fitting the model, for doing the actual work. Understanding exploratory data analysis, understanding the relationships in your model, understanding the math and why the math of certain models works better than others, understanding when you'd want to do penalization versus when you want MCMC. So I guess: learn matrix algebra and really learn the math behind the models, so you know when to use a model and why you're using it — that's the best advice I have for that. It depends on the type of model you're fitting. You mentioned LIME. Yeah, so it's very difficult to interpret XGBoost due to the black-box nature of it. And it's not even that it's a black box; it's that when you start stacking things, humans can't understand it. There is LIME, as you mentioned. The XGBoost people put out something — I may not get the name right — called xgboostExplainer, or something along those lines. But my other answer would be: go Bayesian. If you go Bayesian, you get confidence — not just confidence intervals, you actually get full distributions.
It's going to take a lot more computational horsepower, but if explainability is that important, go Bayesian. And it's really cool: even if you're fitting TensorFlow models, there's a fairly new — not that new anymore — library, TensorFlow Probability, which allows you to do Bayesian inference on TensorFlow models. That doesn't give you inference on the weights, it gives you intervals on your predictions, but that's better than no intervals. People are putting a lot of good effort into explainability right now, but LIME, the XGBoost explainer, and Bayesian methods are going to be the best go-tos right now. Thanks. We have one last question. Yeah. So, different ways of doing the same thing. I know the answer to this has to be subjective, but you mentioned at the beginning that R is your preferred programming language. Just to give my own personal journey: five years back, when I started getting into data science — I shouldn't be saying this — I first learned SAS, and I've since forgotten SAS. Fair. And I got into R. And now Python seems to be the flavor of the season. So what would your advice be? Should one stick to one particular language? So, it depends on your personal style. People here in this room probably speak a lot more languages than I speak, right? So being multilingual is never bad. Now, spoken languages are a little bit different — it's easy to speak many languages; well, easy for most of the world, not for Americans, but for everyone else it's pretty easy. With programming languages: are you the type of person who can handle knowing multiple languages at the same time, or the type who really wants to focus and go deep? I personally focus and go deep, and that's because my real skills are my math, my statistics, and my machine learning. And yeah, I'm really good at R, but it's really because of my statistics. I have employees who are not as strong in statistics, but they're good at working in multiple languages. So it really depends on which skills you care more about, the math or the programming. If it's programming, by all means learn multiple languages. But like you said, it's the flavor of the season — so keep in mind, too, that there are fads, and you want to make sure you don't get tied too tightly to any one thing. Thank you, Jared. Let's give a big hand to Jared. Thank you.