In this video, I aim to set the context for our study of data science and the ethical challenges it poses. Previously in Geography 581, we studied a spectrum of perspectives on ethics in the geospatial technology field. Dan Sui surveyed the major ethical, legal, and policy issues, which are often intertwined in practice. Nadine Schuurman traced the emergence of critical GIS. Jeremy Crampton and Nancy Obermeyer represented two poles of approaches to GIS ethics: one prioritizes codifying standards of professional conduct, while the other prioritizes questioning rules and norms over defining them. In Lesson 3, we also studied some of the ethics case studies curated by the GIS Professional Ethics Project and used Michael Davis's guide to hone our moral reasoning skills. Here in Lesson 4, we'll read perspectives on the nature of data science and the ethical challenges it presents, and we'll analyze and discuss more case studies. As you'll see, the long-form cases presented in this lesson are not as amenable to the rather formulaic approach to ethical decision-making we practiced in Lesson 3. Instead, we'll need to elevate our moral reasoning skills to handle cases presented in greater detail and nuance. Back in 2003, when we were planning a new online master's degree in GIS, the U.S. Department of Labor identified geospatial technology as a high-growth industry. We put together an advisory board of GIS leaders in industry, government, and academia to help craft a curriculum that would prepare students for success in the field. By and large, our students have been successful. Twenty years later, over 500 students have earned the MGIS degree, and we continue to attract more high-quality applicants than we can serve. Meanwhile, geospatial technologies have matured and evolved, and their use has expanded far beyond the traditional geospatial professions of GIS, remote sensing, photogrammetry, and land surveying.
I wrote an online textbook years ago called The Nature of Geographic Information. At the beginning of its first chapter, I included this quote from a commentator named J.D. Wilson: "After more than 30 years, we're still confronted by the same major challenge that GIS professionals have always faced. You must have good data, and good data are expensive and difficult to create." One of the biggest changes since those days is the availability of data: not just geographic data, but data of all kinds. A new era of data abundance has given rise to the practice of data wrangling and analytics that has come to be known as data science. You're surely aware of the explosive growth of data-driven decision-making and the career opportunities that come with it. And since a large portion of data is geographically referenced, it makes sense that geospatial technologies and methods are relevant to data science. Wilson's observation remains true. Even in an era of abundant big data, good data are still expensive and difficult to create. As you may know, much of what data scientists and GIS people do is transform heterogeneous data of uncertain quality, provenance, and currency into good data, meaning data that are suitable for analysis and decision-making. Let's step back a moment and put these trends into historical perspective. Specifically, let's consider the emergence of data science as an expression of what the World Economic Forum calls the Fourth Industrial Revolution. First came an agricultural revolution about 10,000 years ago. Then, beginning in the 18th century, the invention of the steam engine and the construction of railroads brought the First Industrial Revolution. A Second Industrial Revolution began in the 19th century with the advent of mass production. Digital computers heralded a Third Industrial Revolution beginning in the 1960s. Early GIS emerged as part of that third revolution.
Today, the drivers of the Fourth Industrial Revolution include a ubiquitous and mobile internet; smaller, cheaper, and more powerful sensors; and artificial intelligence and machine learning. As you surely know, thought leaders concerned with the impacts of the Fourth Industrial Revolution worry that many of today's occupations may not be sustainable. In a widely cited research article, economist Carl Benedikt Frey and machine learning researcher Michael Osborne estimated that 47% of U.S. workers are at risk of technological unemployment. Of the 702 occupations Frey and Osborne analyzed, one of the most susceptible was surveying and mapping technicians. Frey and Osborne calculated a 96% probability that workers in this occupation will be displaced by automation in the coming decade or two. The Bureau of Labor Statistics projected a much smaller decline for the occupation, only 8%, but it too attributed that decline to advances in technology. Although Frey and Osborne's research has its critics, their prognosis is generally consistent with a body of research by economists, tech leaders, and forward-looking historians who anticipate fundamental disruption of traditional employment by increasingly capable machines. Innovation expert Alec Ross observes that throughout history, our most valued commodities have gone from salt and sugar to chemicals and fuels to data and services. Global finance, social media, the Internet of Things, and other human activities generate an unprecedented and ever-increasing volume, velocity, and variety of data. Ross and others foresee that human analysts and their employers will rely increasingly on machine learning and artificial intelligence to cope with this data deluge. Data science is a case in point. Richard and Daniel Susskind, authors of The Future of the Professions, foresee that in the long run, increasingly capable machines will transform the work of professionals, and that most professionals will be replaced by less expert people and high-performing systems.
Their hope is that practical expertise will become more openly available, freeing many users from obstacles currently imposed by gatekeepers like physicians, lawyers, accountants, and, well, surveying and mapping technicians. Predictions like the Susskinds' about a coming robopocalypse led to what Wired magazine called the Great Tech Panic. Author James Surowiecki argues that the evidence does not support the fear that automation will take away our jobs. Neither the increased productivity that should accompany automation nor growing unemployment is evident yet. Surowiecki rightly points out that the outsourcing of work to machines is not new. From the cotton gin to the washing machine to the car, jobs have been destroyed, but others have been created. Over and over, he reminds us, we've been terrible at envisioning the new jobs people end up doing. The Susskinds recognize this and don't try to predict which occupations may replace the traditional professions. They do, however, suggest 12 future roles that education should help people prepare for. One of these roles is data scientist. This lesson on data science ethics begins with several important readings. First is the handbook Data Science by John Kelleher and Brendan Tierney, authors and practitioners affiliated with the Dublin Institute of Technology. As examples of the kinds of insights data scientists seek, they cite patterns that help us identify groups of customers exhibiting similar behavior and tastes. Esri's Tapestry data set is a prominent example of such customer segmentation. Even if you're a practicing data scientist, you should read at least the first chapter of this work. The complete printed text is only $15, and I encourage you to add it to your library if you haven't already. Kelleher and Tierney lay out a set of desiderata: the knowledge and skills that a data scientist should seek to master. One of those, in the upper right quadrant of this diagram, is domain expertise. What do you consider to be your domain expertise?
Another knowledge area they call out is data ethics and regulation. Chapter 6 of their book, Privacy and Ethics, considers the problem of how best to balance the freedoms and privacy of individuals and minorities against the security and interests of society. Personally, I'm partial to this witty concept map called the Data Scientist Venn Diagram. To get the most out of Stephan Kolassa's diagram, you need to be familiar with the R programming language and with commentator Drew Conway's original data science Venn diagram. Where would you position yourself in this map? Where do you want to be? Do you agree with Kolassa that the perfect data scientist combines high-level competencies in statistical analysis, programming, business, and communication? And where do ethics come in? A second key reading in Lesson 4 is the 2018 O'Reilly Media publication Ethics and Data Science. Its remarkable trio of authors includes Mike Loukides, Vice President of Content Strategy for O'Reilly Media; Hilary Mason, General Manager for Machine Learning at Cloudera; and DJ Patil, former Chief Data Scientist of the United States under President Obama. This, too, is a concise work, and it's available free on GitLab, so I expect you to read all of it with full attention. The URL appears on our Data Science Readings page in Canvas. The authors of Ethics and Data Science point out that professional and scientific organizations like the Association for Computing Machinery, the IEEE, and the American Statistical Association each have codes of ethics that pertain to data. They are skeptical about the value of a data science code of ethics, or of an oath after the fashion of physicians' Hippocratic Oath. They argue that oaths do very little to connect theories and principles to practice. It's one thing to say, "Researchers must obtain informed consent." It's an entirely different thing to obtain informed consent at internet scale.
In his massive open online course on Data Science Ethics, which I took in 2020 through the edX platform, Professor H. V. Jagadish of the University of Michigan does propose an extremely concise code for data science. It's just six words: "Do not surprise. Own the outcomes." In other words, data science products shouldn't embarrass the people associated with the data, and data scientists should own up to unintended outcomes. A third required reading provides a critical perspective on machine ethics, specifically on a computational model for descriptive ethics called Delphi. If you're interested in this topic, you might use the article as a jumping-off point for an assessment of the state of the art in machine ethics and its applications in the geospatial realm. Lesson 4 spans two weeks. You should aim to complete the assigned readings during the first week. You should also choose a case study for analysis from the two collections of ethics case studies related to data and AI. In the second week, you'll methodically analyze your chosen case and present your analysis to the class, preferably in the live class meeting or in a recorded presentation that you share with the class in time for the web meeting. Okay, let's get started. I can't wait to see what you come up with.