Hello everyone. My name is Yiming, and I currently work for Syracuse University as director of research for enrollment management. In my current work, I run various data science projects that focus on predictive modeling of the student recruitment, enrollment, and retention pipeline. The focus of my sharing today is my former professional life. I used to work in the pharmaceutical industry and also in marketing for seven years. I worked in different data departments, including marketing analytics and, before that, bioinformatics and biostatistics. That's a quick self-introduction.

Norma, do you mind? Can you hear me?

Yes. Can you hear me?

Yes.

Wonderful. Good afternoon. My name is Norma Palominograb. I am a former doctoral student, an alumna of Syracuse University, from the iSchool. I graduated from the doctorate in professional studies, and I did my dissertation in natural language processing with Nancy McCracken. Currently, I am one of the analytics and artificial intelligence directors at a company here in the Midwest called Cummins. We manufacture engines, diesel engines, and power systems. We're a global company, a Fortune 120 company, with branches around the world. In my current role, I am part of an office called the Chief Digital Office. You may have heard of it: it's a new type of organizational unit that some companies have. They have split the digital products part of the work from the chief information office, which handles the information technology systems part of the work. So I am in the CDO, or Chief Digital Office. What we do there, in partnership with business owners and subject matter experts, is identify outstanding problems that traditional analytics techniques couldn't solve, and we use different artificial intelligence and advanced analytics techniques to solve those problems.
Typically, manufacturing is Six Sigma; the most traditional technique is Six Sigma. We go beyond Six Sigma, using algorithmic solutions to solve these problems. So that's my current role at Cummins. I have around five teams with different types of data scientists, data engineers, software developers, project managers, etc. It's my pleasure to join this panel today. Thank you for the opportunity.

Yeah, and then John, are you still there? If so, can you do the same, please? Just a brief overview of your role and how you're using data science currently.

Can you hear me now?

Yes.

Hey, so my name is John Fox. I work for a small retail company. It is in the Fortune 150; I think we're still in the Fortune two. I'm currently in our data science office down in Dallas, Texas. I did my PhD with Don Brown at the University of Virginia, focused on spatial-temporal event prediction. After 20-plus years in the military, I wanted to do something different. Sue Corriera would not let me teach at Syracuse University on campus, so I had to move to Northwest Arkansas and try to be a data scientist with Walmart, and I'm currently with Sam's Club.

All right, great. Thank you. So, a question that came in directly to me for all of you, and it's been asked of several of our speakers: what do you think are the most important skills that a new data scientist should have and, if they don't have them already, should look to acquire?

Well, I guess I can chip in here. In terms of the skill set, on the technical side data science has its foundation in statistics as well as computer science. Some of the speakers earlier today also mentioned the general area of computer science that focuses on machine learning, and the subset of machine learning called deep learning.
By extension, there are also some more fundamental topics, such as algebra and calculus. And obviously databases are also an important skill set. I think those topics have been well built into the curriculum, so that's a great thing. In addition to those skill sets, the ability to communicate really well and to work with domain experts is also a key factor in a data scientist's professional life.

So, a question that just came in on the chat: what are the technologies related to data science, coding languages, or software packages? Not asking which one is better, but are there specific ones that you recommend or that you're using more, in terms of coding languages or software packages?

Yeah, well, I don't know if anyone wants to jump in here, but I guess I can start the discussion on that. In terms of coding languages and software packages, I see John mentioned Python or R, so I think that's a good starting point. Essentially, a programming language is a tool; it's used to achieve a goal. So the choice of programming language has a lot to do with the ecosystem of the working environment: what language the team has chosen, and what languages have been used in the code repositories the team has already adopted. If we compare R and Python, they have a lot of similarities. Both are dynamic, interpreted languages. They are extremely high level, and the learning curve is relatively quick. R was developed by statisticians, and primarily for statisticians at the beginning.
But obviously it has evolved, and it now provides support for a lot of data mining and machine learning tasks. Python started as more of a general-purpose programming language. It later added support for data science, and it has been catching up really fast over the last 10 years in this field. You have packages and libraries such as NumPy, pandas, scikit-learn, and Keras that support the full spectrum of machine learning and deep learning. That's why you are going to see a lot of teams using Python as well, particularly if they want to push out full product development. They use Python because it integrates better with, for example, web frameworks. In terms of the technologies, I'm not sure exactly what kinds of technology we are talking about here. If it's the application of data science, technologies that use AI and data science, I would think of the technologies nowadays that focus on natural language processing and computer vision. I think autonomous AI is the next big wave, and it integrates a lot of computer vision technologies as well as natural language processing technologies.

Thanks, Ying. And Norma, do you want to go ahead? I see your hand's up.

Okay. Yes. Sorry, I was muted, and I cannot unmute myself. I was trying to make a comment on the other question and I couldn't, so it would be great if the speakers could stay unmuted so we can participate better. Thank you so much for unmuting me. I want to comment on the two questions. On the first one, I really agree with Ying: the communication piece is essential. A data scientist has to be able to communicate to the business in business terms.
We learn all the technical jargon that is essential for us to work in the profession and advance our careers, but when you interact with the business and talk in terms of accuracy or false positives and false negatives, you need to explain what those are and lower the level of jargon as much as possible. Otherwise our work doesn't add the value that it should. So, on communication skills: even as an instructor at Syracuse, I put a lot of emphasis on this with my students. When they write papers, they should discuss absolutely everything and not assume that the reader understands how the technology works. The same goes for business cases, because when you finish your algorithm and show your accuracy, that's the beginning of a conversation. The business wants to know why it is useful at all. 80% accuracy is not the goal. The goal is to solve a problem. So you need to learn how to solve a problem in business terms, and that is a learning curve. Talking about performance, downtime, uptime, and bottlenecks, in the case of manufacturing, was a learning curve for me. So that is the comment I wanted to make about communication skills. I guess it depends on where you want to develop your career. If you want to work in academia, you don't need to do this. But if you want to work in a corporation, you must talk in business terms.

Regarding platforms, we do see an advance in platforms that are called low-code or no-code, which require very minimal programming skills. That is changing the way companies hire data scientists, because they need, as you said, deep knowledge of statistics, algebra, and calculus. They need that more than superficial knowledge plus more programming, because the new platforms handle the programming.
So I would say that, for the future, the impact these platforms have on our profession is that we will need to either go very deep and become proficient in statistics, or know very well how to talk to the business, because nothing in between is actually going to work. Thank you for the time and for the question.

So there was a question posed in the chat, and it was actually the topic of one of the discussions earlier today: is ethics a part of your job? How do ethics and data science play into your job? Were you able to hear the whole question?

Hey, sir, I can go. For us, anything that touches our members and our customers where we're using an algorithm goes through basically two boards that deal with both the algorithm and the data. How to do what is basically A/B testing on the fairness of an algorithm is an open research question. It is something we focus on intensively for anything that touches our members or our associates. If it's forecasting sales, or trying to identify replacement products, where the member input or the associate input is not there, then it doesn't go through the same rigorous process. The second side, and I don't know that it's in the data science realm, but it's definitely highly correlated, is data governance. For the components of data that are related to member privacy, whether under CCPA, the California Consumer Privacy Act, or any similar acts that protect member and customer data, the data science teams have to work hand in hand with the data governance element to ensure that we're being fair and that we're not using the data in a way that would cause us to lose the trust of our customers. That's probably as far as I can go on a non-proprietary call, but I'm interested to hear from Dr. Grubber and Dr. Lin.

I agree with what you said, John.
I think we need to understand that the ethical concerns are not related to the nature of AI. It's the way it's used, as a tool, right? The outputs we get from machine learning models all depend on how we train them. If the data scientist has a bias, the algorithm will show that bias, because the bias is reflected in the way the data was selected and in the way the features were weighted. So I think it's very important to have an ethical discussion, but in terms of how these techniques are being used as a tool, not to blame AI for being unethical, so to speak. I'm not saying that we are doing that now. I'm saying you hear a lot of those types of associations in the common knowledge people have, that all this AI is coming to do harm to all of us. It's actually just a reflection of human activity. As such, a biased data scientist will have a skewed perception of what the outcome of the algorithm will be, and the technical decisions he or she makes in developing that algorithm will just reflect their own biases.

Yeah, I guess I can chip in. I think it's important to build this concern for ethics into every stage of any data science project in the real world. I can highlight two examples: one focuses on the data collection stage, the other on the model performance evaluation stage. In terms of data collection and data preparation, bias can come from imbalanced data sets, where we have an underrepresented group in the data. If we do a typical simple random sample to find the study population, it runs a high risk of excluding certain underrepresented groups.
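As a minimal sketch of the sampling point above, and not something shown by the panel: one common way to keep a small group from dropping out of a sample is stratified sampling, which draws from each group separately. The record layout, the `group` field, the helper name `stratified_sample`, and the 980-to-20 split are all hypothetical, made up purely for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0, min_per_group=1):
    """Draw roughly `fraction` of each group, but never fewer than
    `min_per_group` records from any group that exists in the data."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable

    # Bucket the records by the value of the grouping field.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    sample = []
    for name in sorted(groups):
        members = groups[name]
        k = max(min_per_group, round(len(members) * fraction))
        k = min(k, len(members))  # cannot draw more than the group holds
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 980 records in group "A" and only 20 in group "B".
population = [{"id": i, "group": "A"} for i in range(980)]
population += [{"id": 980 + i, "group": "B"} for i in range(20)]

sample = stratified_sample(population, key="group", fraction=0.05)

# Count how many records of each group made it into the sample.
counts = defaultdict(int)
for rec in sample:
    counts[rec["group"]] += 1
```

With a plain random draw of 50 records from this population, group "B" would be missed entirely a noticeable share of the time. Sampling per group guarantees that it appears, at the cost of needing the group label up front.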
Inherent data characteristics like that need to be accounted for by the data science techniques to ensure that the conclusions of the project are fairly representative. The other example is about performance evaluation. A general approach in machine learning, particularly in deep learning, is to come up with what we call an objective function. Sometimes it's also referred to as the loss function; essentially, it's the function that we try to optimize. The essence of a lot of machine learning algorithms, including deep learning, is an optimization problem. So it's crucial to set up the right, well-thought-out objective function instead of just using a simple single-dimension metric that focuses on one type of outcome. We need to build the consideration of multiple outcomes into this framework so that we can be sure the project we're doing is fair and has the ethical concern behind it.

All right, thank you all. Another question: what are the key challenges and pitfalls related to the data science and machine learning that we widely depend on today? And what are your thoughts on data science versus fundamental exploration and discovery?

Well, just to start, I would say the challenge that we find every day at work in data science projects is data: having enough data, having data that represents the type of phenomena we want to analyze or predict or explore, and having data of good quality. As a data scientist, you will learn very soon that you spend most of your time just working on data: gathering the data, cleaning the data, making it ready. It's the most critical aspect of data science. Any sophisticated algorithm, let's say a deep learning neural network, is just a whole bunch of statistics put together in a system and run in real time thanks to the power of computers right now.
So it's a whole bunch of statistics. And as we say in statistics: garbage in, garbage out. So the most critical aspect of a data science project is having good-quality data. I would say that's the conversation we have most often, and the most critical factor for us to be successful every day at work in our data science projects.

And Jeff, I missed part of the question. Would you mind repeating it?

Yeah, let me scroll back up and find it. Okay: what are the key challenges and pitfalls related to the data science and machine learning that we widely depend on today? And what are your thoughts on data science versus fundamental exploration and discovery?

Gotcha, thank you. So, what kinds of pitfalls? Yeah, John, if you want to go first.

Yeah, I think I agree with Dr. ... Can you hear me? Yes. I think I agree with Dr. Grubb that I don't know that there's a really great separation. I shared Leo Breiman's article, "Statistical Modeling: The Two Cultures," and I would encourage all the students or other faculty, if they haven't had a chance to see it, to read that article. I think corporate America likes to draw a big line between hardcore data science and a data analyst. There's definitely a transition point, there's definitely an evolution, but we all start with getting the data, scrubbing the data, exploring the data. Even if it's a huge data set, we're going to take a sample of it to look at. I also agree with Dr. Grubbin that the pitfall is, if you're a data scientist and you've got a bias about the data, you can find it, right? And you can build an algorithm that just reinforces that. The other pitfall, and this is purely my opinion, right? Purely my opinion. I don't know if it's a data science pitfall as much as it's a corporate pitfall.
And that's when, every once in a while, you get a leader who walks in and says, "Hey, could I get an order of neural networks with the large vanilla data science shake? And can you get me some of that artificial intelligence as a side?" And we want that to answer our problem. So you spend a lot of money to go build a quantum computer when really all you need is, heaven forbid, a pie chart. So I think there's a pitfall in that we use the buzzwords and we use the investment (Vince, that one phrase was for you), we use a bunch of investment where we might be able to solve the problem with a simpler approach. And I will stand by for Dr. Lin to rebut me and show me the error of my ways.

Yeah, well, I guess I can echo both Dr. Fox's and Dr. Cropp's points. I think for people in data science, when they think about machine learning, about AI, a lot of the thought and a lot of the priority is given to the modeling part. We can approach these data science projects using the Cross-Industry Standard Process for Data Mining: starting with a good business understanding, and then moving on to data understanding. A lot of things go into that step, which I feel is probably the most underappreciated one: things like data pre-processing, data preparation, and exploratory data analysis, and a huge part of EDA is data visualization. I think that step accounts for the majority of the time and effort in a real-world data project. It has not been given much of the spotlight, but to a great extent it determines the success of the data science project. So I feel that people who want to get into the field probably want to put more emphasis on that step, as well as on the step where we compare performance and fine-tune the model.
Of course, everyone knows the importance of the modeling part, and of understanding and being able to carry out different statistical learning, machine learning, and deep learning models, so I guess that part does not need further emphasis. Sometimes people tend to use an overcomplicated model on a problem set that is not a good fit for the algorithm. I'm specifically thinking of when we have data that is not particularly large and has some data quality issues, such as outliers. In that case, despite all the hype, deep learning may not be a good choice, given the complexity of the model, the number of parameters, and the difficulty of tuning the hyperparameters. Another type, what we can call shallow learning, may turn out to be a better fit. I'm thinking of examples such as ensemble learning, like gradient boosting machines. So eventually it really has to depend on the data and the problem we are trying to handle, instead of just picking the most advanced algorithm that is popular and in the news right now and using it. That's not to say that deep learning is not hugely important, because it is. It's particularly good for perception AI, perception-type tasks such as computer vision and text or natural language data. But in the business world, most of the time we run into more traditional, record-structured data sets. And if the size is not that big, there are other choices that can give you the best outputs.

That was great. Thank you all. Another one that came in: what are the advantages and disadvantages of being an adjunct professor? Specifically, is it difficult to balance teaching and your day job responsibilities?
I think in my case they tap into each other so nicely, because at work I'm always talking about solving problems in data science, and with the students it's the same situation, right? Grading an assignment is very close to giving feedback on a particular data science project. We monitor together how new models are being developed and how new tools are being released, test them out, and bring lessons learned from one place to the other. Things that are learned at work I can bring to the classroom and share with everybody, and the students always bring fresh new ideas that I can use back in my everyday practice. So I think they really tap into each other nicely. Now, of course, when I have to grade, it takes some of my weekend time, etc. But that's the only time I would say is crunch time. It's also about the number of classes, right? I always teach just one course. More than one course would undermine the quality of the teaching. So I think they go together very nicely.

Well, I agree with that statement and I share the same sentiments. I feel it's a great experience to do adjunct teaching for an applied data science program, because the topics we teach as adjuncts overlap a lot with our professional lives. So there is this opportunity of sharing what we have learned, both from the theoretical side and from the applied side, with our students, aspiring future data scientists, data engineers, or data analysts. I think that is a very satisfying experience for me.

Yeah, maybe I should let the two or three students of mine who are on the call answer that question.
Maybe I don't want to know the answer to that question. But I echo that the opportunity to share back and forth is fantastic. Sharing some of the research that our company is doing in partnership with people like Google, Microsoft, and Nvidia is a way to spur the students to focus on other things. I think it's helpful. I would be remiss, though, not to mention the pace of the organization, at least the organization I work with, and the things that happen in it every once in a while. It normally happens about twice a year, and I probably just shouldn't teach during those times. For those of you who remember March of 2020, the supply chain went a little out of whack in 2020, and it continues to today, and that makes it hard to find that balance, Dr. Grubb, between grading and teaching and doing your day job. I experienced it last night, and I'll probably experience it again on the Wednesday before Thanksgiving. So thank you.

This is for all of you: are any of your companies creating quantum computer labs and, if so, how do you see that impacting your role as data scientists?

Not for me. I guess I would leave these questions for Dr. Clarkson.

Not in my case either. We are creating an innovation center where the team plans to start exploring those, but not yet. This is a very good question. You got me thinking for after the call. Thank you for the question.

Thanks, Vince, for the question. We appreciate it.
Yeah, I think the investment in that space for a non-tech company comes in partnerships with those tech companies. And, not specifically with our company, but the national research labs, Sandia, Oak Ridge, Livermore, actually have partnerships with corporate America to do research in that space. So I don't think you're going to see a quantum lab in Northwest Arkansas over the next couple of weeks, but I think you'll continue to see investment from corporate America into academic institutions, and you'll see investment across corporate America and with the national labs, because the research has to be there and the investment has to be there, but not everybody needs to build their own machine to start with.

So another one that came in was from somebody coming from a nursing background, asking if they had a place in data science. I think we could make that question more general: for anybody coming from a non-technical background, what are your thoughts on the place they have in the data science world, and would their previous experience, or lack thereof, mesh with making that sort of career change into data science?
Well, I can take that. I think at this point, with the advancement of technology to gather, store, and handle big data, any profession will be touched by these technologies. Any profession will have some type of platform that performs intelligence based on big data. So for what we call the subject matter expert, which is who you are (you are a nurse, so you're an expert in a subject matter that is related to data science), it is essential to become proficient in the technology. Right now at Cummins, we are making a strategic move to train our engineers to build models, because we have noticed that as the tools for building models become easier to use and require less programming, the penetration of artificial intelligence is bigger if we teach the subject matter experts how to use the tools instead of bringing in more data scientists. We save our data scientists' brain power for the more complex problems, but there is a whole bunch of small problems that you don't need a data scientist for; you need a subject matter expert. In that sense, as these platforms advance, everybody at some point will need to be involved in some level of data science. So I think it's just a matter of having the technology close to you and starting to explore, because you will be affected by this, like everybody is. The relationship between your profession and data science is closer than you may think right now. Just look around: if you have a platform that you use in your job, that's where your data science capability could be.

Yeah, I agree with that. I think coming to data science with domain expertise is a huge asset. I think that will also be the future of data science and, more generally, of AI, because I think it will be an "AI plus" kind of model, where AI is used as an engine to power a lot of applications in different domains. So if you
have a background as a nurse, you have worked in the healthcare industry, and healthcare is definitely going to be a huge area for the application of data science and AI. It already has been in the past, but I think it's going to be the next place where breakthrough technology comes from, given the importance of the industry. So far, AI and data science have been used a lot in other industries, particularly in technology, but healthcare, given its impact on everybody's quality of life, is going to be a hugely important area, and there are so many problems that can be tackled with a data science and AI approach. As I mentioned, I used to work in pharmaceuticals, in departments that covered the different stages of a drug's life cycle. I started in drug discovery and moved on to drug development, so that's essentially pre-clinical versus clinical. Later on, when the drug I was working on was approved by the FDA, I moved on to a data management role for the drug, which is indicated for a type of metastatic breast cancer. The type of data I used to deal with differed greatly from stage to stage. In the drug discovery phase, the data we worked on was unstructured data, essentially the kind of data that you cannot put nicely into a spreadsheet program such as Excel, right? So there need to be machine learning algorithms that are good at feature engineering and also able to deal with high-dimensional data. The data characteristics change as we work in different stages of this funnel, but as you can see, value can be added to the business process at every stage, and eventually the goal is to come out with a new drug that can benefit society.

So this
question is a little longer, so if you need me to repeat it, please let me know. At smaller companies, a data scientist might have to be a jack-of-all-trades type of employee, but at a larger organization, data science teams might have specializations within the team. So how should somebody best find the balance between being a well-rounded candidate and a specialized expert without narrowing or limiting their potential job opportunities?

Wow, that is a very good question. That is so true, and I would say we all face that question in our careers, because just building a digital product requires so many skills. The algorithm is the core, the heart of the system, but then you need the rest of the body, right? Around the algorithm you need an interface, you need data pipelines, you need all the software development to present it to the user or to put it into some type of application. And it's an overwhelming decision to make: where do I go? Do I become a jack of all trades, which means I'm not deep in anything, or do I get so deep into something that there is only one type of job I can apply for? This will sound like a well-known piece of advice, but it's still real: just follow your dreams. What do you want to do? Because you don't want to be a jack of all trades who is unhappy, and you don't want to be a data scientist specialized only in a particular type of data, such as vision inspection or whatever, because the market is hot on that, and then be unhappy too. I discovered throughout my career that focusing 100% on what you want to do will bring the dream job you're looking for, rather than just trying to land the most wanted position. I understand the concern, because it's a very important decision. But in my experience, just try to find what you really want to do, and even if it looks like nobody else is doing it, you will find that opportunity. That would be my message. Sorry,
it's not a secret sauce or anything, but it is true. For example, at Cummins my teams are agile, and they have data scientists, data engineers, software developers, and testers, and you would want to specialize in that one little thing. But I have worked in other places where I myself was a jack of all trades, trying to deliver to the business on every single aspect. It's a tough decision and, again, just follow your dreams.

Well, I certainly agree with Dr. Rob's point about following the dream. I think it's very important to make the decision based on your interests and your background, and it also has to do with what kind of environment and what kind of company you want to join. If you join a smaller company, where the whole data science group may be only a few members, it's imperative that you know pretty much everything in data science. Not just how to get the data, analyze it, and come up with presentation slides, but also how to collect data, how to do data preprocessing and, importantly, how to do model deployment so that your end users can benefit from the product you develop. If you join a more established company with hundreds of people working in the data science function, such as Google or Facebook, there is greater division of labor within the department, and there are job descriptions for different kinds of roles, right? It might be data engineer, or machine learning engineer focusing on the algorithm optimization part, or data scientist, or product manager. They all require a certain set of experience and expertise, but that experience can certainly be gained through projects. I think the iSchool has this fantastic curriculum for applied data science that builds a
pretty solid foundation for the students graduating from the program, so they can go out and do a lot of things. And I think the only thing that will remain is change itself, right? We don't know what kinds of jobs will be there 20 or 30 years from now. So with this kind of solid foundation, also be flexible, also have this mentality of lifelong learning, so that you will pick up whatever skill set is required for the next generation of jobs, or the next wave of jobs, even though we can't always foresee what kinds of jobs there will be 20 or 30 years from now.
And lastly, we only have about three minutes left, and I want to be respectful of all of your time. So the last one would be: how do you all see the field of data science changing during the next five years?
Well, I can start very quickly. I will go back again to what Dr. Liu said. Just all the statistics, the kind of hardcore statistics, algebra, calculus, will increase in importance, because all the programming will become more popular and more streamlined for people in different fields. You know, for example, in a particular company like the one I work for, the subject matter experts, in my case the engineers, will have tools at their disposal that don't require them to write much Python at all. So just the act of turning a problem into a statistical model will be the core sought-after skill, along with the communication skill to explain all that to the business. I will say that's what the future is right now.
Well, for the next five years, it probably helps to put this discussion in more of a historical context. I think for a while data science has been trying to find its identity. Like, 10 years ago there was no field called data science, and then data science kind of came along. My background, my PhD, was in computer science, and my work focused on computational learning theory and machine learning algorithms. So when I look at the data
science, when it was still the early days of data science, I was thinking, "Oh, you call this data science? This is something that we have been doing for quite a while." And I think statisticians, coming from the statistical background, probably share the same sentiment about data science. So I think, as it came along, a few different communities kind of merged together: people from computer science, from machine learning, from an AI background, or from a statistical background, or from a mathematics background. They all came together, and now we have kind of identified this core set of knowledge and skill sets that we consider important and almost necessary for someone who aspires to be a data scientist. So moving forward, I think it will be about how we apply data science in real life, how we add value, because this field is more of an applied field. So the application of data science, of these techniques, will be hugely important. And that's why I'm thinking that, looking at the future, AI and data science are going to impact our society in a very fundamental way, and they are going to permeate different industries and have a significant impact. So maybe for the next five years a lot of the focus will be in that regard, about how data science, machine learning, and AI will impact different walks of life and different professions.
All right, well, thank you so much to our panel for joining us. Yiming, John, and Norma, thank you so much for taking time out of your very busy day, between working and teaching and everything, to join us. That was really helpful and valuable, so thank you all.