Well, thank you very much for the invitation, and good day. Good evening. I know we're all over time zone-wise here. What I've been working on for some time, and this is joint work with Shriram Krishnamurthi, who's also at Brown, is thinking about how we rethink introductory computer science to better support learning data science. And there are several reasons why I think we should do this, which we'll go over here. So for starters, it's worth thinking about the pressures that are on the computing curriculum within the university context. First of all, you have the rise of data science. There are many programs and students who want to be learning about data science, and we have to figure out how that fits in the computing curriculum. There are questions about the societal impacts of algorithmic decision-making and what we should be teaching about that. There are concerns about diversity in tech and the tech workforce. And with the strong tech economy, you're bringing many more students from many more backgrounds into our computing courses. So all of these pressures are building up on intro computer science classes. That raises a question about how we can redesign our introductory courses to support data science, while also thinking about some of these diversity and social impact issues, which at least in the United States are generating a lot of discussion with regards to curricula. And the proposal that I'm going to put forth in this talk is that we should put data at the center, in a way that leads us into data structures. And this is going to let us combine some data science work with some conventional computer science work. Where am I coming from with this? My technical job title is research professor of computer science. I'm also the associate director of the undergraduate program in computer science at my university. So I see this from all sides.
I see this as a researcher who studies computing education, as a person who lectures large introductory courses in computer science, and from an administrator's position trying to manage a large undergraduate program. I've been teaching first-year computing for 25 years, and for the last 10 years my primary research has been in computing education, looking at the intersection of languages and pedagogy and tooling and things like that. I am a functional programmer at heart and at core. I can play other kinds of programmers on TV when necessary, but really I tend to think from a functional programming perspective. So if we think about the different ways that different linguistic communities approach intro computer science, let's look at what they do and how well they might accommodate data. If you generally are teaching intro computing from an imperative programming perspective, you go heavy on conditionals and loops and assignment statements, with some arrays and lists, and that's how you get started. If you start with objects, well, then you're starting with classes and objects and then getting into conditionals and loops. And if you take an algorithmic focus, you go into some of the classic algorithms: sorting, maybe some of the basic graph algorithms. Another model, starting more functionally, is where you're emphasizing functions, lists, recursion, and higher-order functions, and these form the early foundation of your computing curriculum. In my case in particular, I mostly have taught with the How to Design Programs curriculum, which is based on Racket. So that's where some of my thinking on the role of functional programming in education is coming from. And if we step back and ask ourselves how these three approaches might accommodate data, or think about data: well, an imperative intro is really designed to emphasize control, not data. The object-oriented perspective tends to put data structures before data.
And the functional intro ends up emphasizing traversing data structures, but not necessarily thinking about data in the data science sense of it. So the question I want to ask is: what if a two-dimensional table were the first data structure that we showed students? Here's a sample small two-dimensional table, drawn from one of my own lectures. What you can think about in looking at this table, and why this might be an interesting place to start: it's rich, structured data, but it's in a non-threatening format. I think a lot about how to offer introductory computing education to students who do not have prior experience in programming, who maybe are not even majoring in computer science. They might be studying social sciences or something else, but they still need to learn about data and computing because of the way it underlies all of their different career options. You can come up with many authentic tasks when you work with a two-dimensional table. And if you want to be raising some of these issues about the impacts of technology, they can come up rather naturally. I'll give some examples as we go. If you ask questions like how many tickets got sold with a student discount, you're letting students explore problem decomposition, which is something we often teach early on in programming, but in a very concrete format. Students can imagine taking the tables apart, or scratching out on them, or drawing on them. I've seen students do this in ways that I have not seen them do when we try to work with general data structures like lists. You also can give students something of a process that they can understand for preparing data to work with: we normalize, we look for suspicious data, we use visualization, we analyze.
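A question like the student-discount one decomposes into a couple of concrete table operations. Here's a minimal sketch in Python (the table, its columns, and the names are all hypothetical, invented for illustration):

```python
# Hypothetical ticket-sales table, one row per sale.
tickets = [
    {"buyer": "Asha",  "discount": "student", "price": 5},
    {"buyer": "Ben",   "discount": "none",    "price": 10},
    {"buyer": "Carla", "discount": "student", "price": 5},
]

# Step 1: keep only the rows we care about (a filter).
student_rows = [row for row in tickets if row["discount"] == "student"]

# Step 2: summarize the remaining rows (a count).
student_sales = len(student_rows)

print(student_sales)  # prints 2
```

The two steps mirror how students decompose the question on paper: first cross out the rows that don't apply, then count what's left.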
It's another good format to expose students to the idea that software development and coding proceed by processes, not just something we do because we happened to be good at it, but steps that they can follow. In the context of data science, I'll point out that this is as much data engineering as it is data science. Notice I haven't really talked much about statistics and regressions and what kinds of analysis methods you use, as much as how you think about managing and maintaining your data as a precursor to doing that kind of statistical analysis. And I think this data engineering perspective is really important if we're going to try to bring computer science faculty into embracing data more in the way we start teaching our courses. Now, anytime I give a variant of this presentation and I say we should start with tables, people invariably say, well, what's the best language for working with tables? And we're back arguing about what programming language to start with when teaching intro students. R and Python get a lot of the airtime, Julia comes up. And as a computing education person, what I want to say is: just stop. This isn't how we design curricula. You don't design a curriculum by picking a language. You design a curriculum by laying out what you're trying to teach first. So I want to start there: if we're thinking about a narrative for introducing students to computing and programming that embraces data, what does that narrative look like, and what topics does that suggest? What I'm describing here is a vision that we've been calling a data-centric intro to computing. So very much computer science, but with data at its core. In my experience, it helps to start by helping students realize that both information and code have structure. It's not obvious to a student who's never programmed before that a program is not a sequence of characters but rather a structure of functions and expressions and computation.
We want students to understand that the role of computation is to transform or summarize data, or in the case of tables, data sets. Sometimes we aggregate information across data points; lists are a natural vehicle for bringing up that notion of aggregation. The fact that your data set's attributes might themselves have structure leads you to an introduction of data types. Understanding that data points might have relationships among themselves, and not necessarily be independent, is a natural segue to talking about trees. Sometimes programs actually end up needing to update their data sets, and this is where things like state and assignment can come into the story. And then, when your programs start getting more sophisticated, you have to start thinking more about the efficiency of working with associative data, and this is a good reason to bring in dictionaries or hash tables. What you see on the screen here is the topic sequence that I use in my data-centric intro course, starting very heavily functional. We're functional all the way up until the state portion of it, but tables come much earlier than in a typical functional programming curriculum. Woven throughout this, we try to always draw students back to the data. Every topic we do is motivated by something a student might want to do with a data set. And this motivation is important for working with students who do not see themselves as computing students, but instead as students who are trying to prepare to work with data. There are many points in this curriculum where I can bring up issues about the social impacts of computing. We bring up planning, composing plans, and decomposing problems from the very start with tables. Notional machines is the term that computing education researchers use for semantics. I spend a lot of time teaching through semantics and models of program execution in my classes, because I want students to start to understand how languages work.
And that again is something we can concretely visualize at the level of tables and repeatedly bring up as we go through this curriculum. So I just want to give you a couple of more concrete examples about what I mean by some of these orange boxes and what we're trying to teach. When I talk about students appreciating the structure of data and the structure of code, I first teach them how to construct images like flags. Because if we're constructing flags of the world, we're seeing small pieces of images, like the three stripes here. And we see how the image has a structure of stacking, and the code has a structure of nesting. We do a lot in the beginning trying to appreciate this relationship between the structure of data and the structure of code. And this will translate very nicely into tables and some of the other things we're trying to do. When we get to tables, we pull the tables into a programming language, either by writing small tables manually or importing them from a CSV file or a Google Sheet. And we learn how to write functions and make small programs that do things like compute new columns on tables, which is what I'm showing at the bottom of this slide here. So a simple lambda-like function gets used to extend this table with a new column. In this table context, I can talk about normalizing data, and that's a process of filtering or searching for values that we don't understand, mapping, transforming data in the process of normalization. We can talk about checking whether all the values are as expected, as part of just confirming that our data makes sense before we process it. We can capture filters of the table to focus on different subsets of our data. All of these operations that we like to teach as higher-order functions, say on lists, extend very naturally to tables, and they make sense to students because it's a context that they can envision working in, in whatever domain they're coming from. We also get into discussions about how to represent data.
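The column-building and filtering operations described here can be sketched in Python with pandas; the table contents and column names below are made up for illustration:

```python
import pandas as pd

# A small hypothetical sales table, as it might arrive from a CSV file.
sales = pd.DataFrame({
    "item":     ["shirt", "mug", "poster"],
    "price":    [20.0, 8.0, 12.0],
    "quantity": [2, 1, 3],
})

# Extend the table with a new column, computed row by row
# with a simple lambda over each row.
sales["total"] = sales.apply(lambda row: row["price"] * row["quantity"], axis=1)

# Filter down to a subset of rows -- the table version of the
# filter we teach as a higher-order function on lists.
big_orders = sales[sales["total"] > 20]
```

The same two ideas, mapping a function across rows and filtering rows by a predicate, carry over whether the underlying language is functional or not.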
One of my favorite lectures is where I ask students how we should capture timestamps. Do we use multiple columns for hours and minutes? Do we make them strings? Do we use big numbers? And this gets us to talking about issues like data schemas for time and for names, which get into issues of internationalization and having students think about who they are designing software for. This is an example of the social-impact kind of thinking that I try to bring across. There's a series of articles that I like a lot, "Falsehoods Programmers Believe About...": these long lists that have been compiled by professional programmers of mistakes people make in trying to represent things like names and dates. It's instructive and natural for students who are looking at the early stages of data. You can also bring in many practical questions that show data design trade-offs. We look at how to store lists in a CSV file, how to extract lists from a CSV file. A lot of programs of the kind of scale and heft that we might normally do in an intro programming class can come up very naturally within this focus on tables. So now I've given you some examples of what I want to be able to do. Now we can come back to the question: what language should I be doing this in? And more often than not, people will say we have to do this in Python. Why? Well, because it's the place that students need to end up if they're going to do anything practical. And yes, I agree that ending the students someplace practical makes sense. In fact, the course that I teach at the very end comes back to tables, and we show them pandas. So Python's a great place to end up for giving students something concrete that they can look up on Stack Overflow or get online help with. It works very nicely. But do we have to start there? And that's, again, where my pedagogy comes in. We are functionally inclined programmers, I assume, if you're at a Clojure event. And functional programming underlies tools for processing tabular data, for doing data science work.
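The timestamp question makes the trade-offs tangible. A tiny Python sketch of three hypothetical representations of 9:05 am, and one concrete way the "just use strings" option goes wrong:

```python
# Three hypothetical representations of the same time of day, 9:05 am.
as_string  = "9:05"                    # readable, but compares as text
as_fields  = {"hour": 9, "minute": 5}  # structured, but needs two comparisons
as_minutes = 9 * 60 + 5                # minutes since midnight: one number

# String comparison is lexicographic, not chronological:
# "9" sorts after "1", so 9:05 looks "later" than 10:30.
print("9:05" > "10:30")                # prints True -- the wrong answer

# Minutes-since-midnight compares the way students expect.
print(as_minutes < 10 * 60 + 30)       # prints True -- 9:05 is before 10:30
```

Each representation fixes one problem and introduces another, which is exactly the kind of schema discussion the lecture is after.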
Hadley Wickham, the chief scientist at RStudio, writes a lot about teaching and learning data science and the importance of functional programming. So in the curricula that I work on, we start functionally. And we're starting with a language called Pyret that we have developed in-house at Brown University. It has built-in support for images and tables, like I showed you on those couple of slides. It has a Python-esque syntax; the PY in Pyret is not accidental. We lifted some things from Python. We did drop whitespace having semantics; that's not very friendly to new programmers. But the other notational forms look rather similar. It's a functional language with proper data types, and it builds testing into function definitions. So we really enhance for students this idea of testing and writing illustrative examples as part of how they write functions, habits that we think we should be building from very early on. One of the fun things about Pyret is that it's backed by a lot of research going on in our Computing Education Research Group at Brown. We have looked at building techniques like having students check whether their test cases, their test suites, are thorough and accurate before they write programs. This is the kind of tool that we can easily build into a research environment. We have studied how to make students write good tests, how they learn to organize data, and how tool support can help in getting them to that point. We've been studying higher-order functions and how students learn to use them and learn to develop them. All of this work in Pyret is driven by education research for computer science. And that's something that can't really be said for the development of things like Python and R. They have good places that they came from; I'm not at all knocking those. But my perspective is that I'm trying to introduce people to thinking about programming and data, so I want education to really be driving how I do this.
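Pyret attaches a block of test examples directly to each function definition. A rough Python analogue of that habit, writing illustrative examples right next to the function they document, might look like this (the `median` function is a hypothetical example, not from the course):

```python
def median(nums):
    """Return the middle value of a non-empty list of numbers."""
    ordered = sorted(nums)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even length: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Illustrative examples written alongside the definition,
# in the spirit of tests built into the function itself.
assert median([3, 1, 2]) == 2
assert median([1, 2, 3, 4]) == 2.5
```

The point is less the mechanism than the habit: examples are part of writing the function, not an afterthought.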
So when we're doing this course, we do the first two months of it functionally, in Pyret. We're learning how to write abstract data types. We're learning how to process these data types with recursion and functional programming, from a strongly data-driven perspective. At the point that I'm ready to introduce state and updating data, we take that as our opportunity to switch to Python. And there we do state and hash tables and wrap it back around to pandas, showing them that all the things they did in Pyret at the beginning, we can also bring over to Python at the end. And then there are follow-on courses that get into things like objects and data structures and algorithms and other more advanced topics. This is the first in a multi-course sequence that we're teaching. So to pull this together, what's the problem I think we're trying to deal with here? At least in the United States, we're seeing a lot of schools developing data science programs. And the data science programs are starting with intro statistics, maybe a little bit of data scripting; they do more statistics, big data, and eventually they get to the point of wanting to do data management. A computer science major is organized differently: we start with intro programming, we get into data structures, and then we go into more advanced classes like databases, data science, and machine learning. And what we often see are students who see that little bit of scripting, say in their second or third course in data science, and say, oh, I like programming, I want to be a CS major. Or we see CS students who get up to the upper-level data science class and say, I'm really not into developing software, but I really like this focus on data. The student gets to data management on the data science side, and now they say, well, I need databases, because I think I want to do more data engineering. And what we have effectively done here is set up a system where it's extremely hard for students to switch.
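The state-and-hash-tables portion of that switch can be sketched with a classic dictionary-update loop; the word-count example below is hypothetical, chosen just to show the shape of the stateful code students meet after the functional start:

```python
# A hypothetical word-count program: the kind of stateful,
# dictionary-based code introduced in the Python portion.
counts = {}
for word in ["data", "code", "data", "tables"]:
    # Mutate the dictionary in place, one word at a time.
    counts[word] = counts.get(word, 0) + 1

print(counts)  # prints {'data': 2, 'code': 1, 'tables': 1}
```

Conceptually it is still the same aggregation-over-data-points idea from earlier in the course, now expressed with mutation and an associative data structure.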
There's so little content alignment between a conventional CS approach and a conventional data science approach that students effectively have to start over to switch. And that's unfortunate, especially when we see that there is a way to start these things in common, with tables and higher-order functions as our early programming examples, and then let students do a second course in whichever of data science, computer science, or computer engineering most fits what they're trying to do. Novice students don't understand these fields well enough to decide which one they want or what they all mean. We really have to think about pedagogy that's going to help them figure out those interests as they go. So to put this in perspective, I think we have three different subfields, or job titles in some sense, that we see competing for mind share in early computing education. Data engineering is frequently overlooked, even though it's a really important component of actually being able to write scripts that are maintainable in a data science context. Data science is exciting because students across campus are interested in it. Data engineering needs non-trivial computer science, so it needs a foundation to build on that it won't get if we do a statistics-based data science alone. Every computer science student these days needs some grounding in data and statistics, given the way data is part of everything we do. And we've got these increasing calls for social responsibility, at least being raised to students. The sweet spot is the intersection of these three pieces: start in the sweet spot and then branch out. And this is what we have been working on. If you're interested in seeing this laid out more as an argument, we have a paper that we wrote in Communications of the ACM on this model of data-centric computing. It's really the article version of this talk. We have a textbook that we have been writing, called A Data-Centric Introduction to Computing. It's available online.
The first version is out. The second version will come out sometime over the winter break when I make revisions after this semester. I've also put the URL there for the course that I'm teaching out of all of this. So thank you very much. I'm happy to take questions on the curricular design or things we know about learning programming through these different venues. But thank you very much.