I've always had this really vivid dream of swimming with manta rays, but I only recently got to make that dream a reality when I was snorkeling off the coast of Hawaii. It was a truly stunning example of ray-ification. Okay, we're going to talk about machine learning algorithms like ChatGPT, but to get there, we're going to take a little detour first.

To start with, let's talk about a book. In the mid-1950s, the author and former Royal Navy commander Frederick R. Ewing had become the talk of the town in the world of academic literature. His steamy 18th-century novel, I, Libertine, had experienced a sort of overnight renaissance. Literary experts debated the book's merits and shortcomings at parties. Students wrote term papers on Ewing's life and legacy. The novel was banned in Boston due to its indecency, but booksellers simply couldn't find enough copies of the thing to keep up with demand, largely because it didn't exist. I, Libertine was a hoax dreamt up by intellectual-slash-satirist-cum-DJ Jean Shepherd, whose 2 AM radio show made a habit of mocking and deflating the self-important folks who claimed to run the world. Shep urged his listeners to wander into bookshops and ask if they had a copy, and suddenly all hell broke loose, as seemingly credible institutions revealed just how much pretense was necessary for their image. One listener called in to recount how the proprietor of their local bookstore, a man who seemed to have read everything and whose opinion was deeply respected, responded to their inquiry with contempt: "It's about time the public discovered Ewing."

At the height of the emperor-has-no-clothes panic, a friend of Shep's named Theodore Sturgeon, a prolific sci-fi author and scriptwriter, inadvertently ruined the whole thing. He asked Shep if he could write a real book called I, Libertine, a novel modeled on the mythology that had sprung up around the prank, and Shep agreed. With a physical object, a product, to hang the story on, news agencies no longer had to wrestle with the implications of the uproar that had surrounded a totally imaginary book. They could explain it away as a publicity stunt for the sake of advertising, a trope that fit nicely into people's understanding of the world, and the joke ran out of steam.

Philosophers and sociologists sometimes talk about reification, a term that has a few different meanings but generally describes a process by which an otherwise imaginary entity is made real. Sturgeon's physical manifestation of I, Libertine is an interesting example of reification. The book didn't exist as anything more than a fictional or cognitive construct at first; then Sturgeon brought it into reality as a physical object. The paper town of Agloe, New York, as popularized by cartography nerds like John Green, is another fun example of the idea. A fictional town printed on a map saw so much traffic from travelers who had been hoping to find something there that some of those travelers set up homes and shops to serve others looking for the same place, and it became a real town. Of course, these are dramatic cases of reification run amok. Fake books do not regularly become real books, and fake towns very rarely become real towns. But even without delving into metaphysics, you experience that process all the time.
The border between your bit of sidewalk and your neighbor's bit of sidewalk is a legal fiction, a purely imaginary subdivision of space that has no instantiation in the real world unless someone makes it real by behaving in certain ways. You sweep your bit of sidewalk but not theirs. If your side of the line needs some sort of repair, you're the one who calls city maintenance. If your neighbor doesn't do those things, you might tut at them for failing to keep up their side, but you probably wouldn't do it on their behalf. These kinds of actions give the imaginary boundary a reality it wouldn't have if it was just printed on a map somewhere.

Reification of fictional entities through human behavior happens all the time. But when that process results in real physical entities, stuff we can point to, measure, and grab hold of, it's easy to lose your way in the shuffle of cause and effect. There's nothing really mysterious about the physical version of I, Libertine. This book, written by Theodore Sturgeon, is not the imaginary 18th-century novel described by Jean Shepherd and perpetrated as a hoax on the world of academic literature as though it were real. There is a clear historical account of how this thing came to be and why it's not that. But because it was brought into being in this peculiar way, preceded by its own mythology, you might forgive someone for finding something mysterious or confusing about the relationship between the cause, fake discourse about a fake book called I, Libertine, and the effect, a real book called I, Libertine, coming into existence. The map was dreamt up first, then the territory was landscaped into a convincing facsimile of it.

Of course, these are fairly tame examples of reification. People can give all sorts of fictions physical form, and they're not always so quaint and low-stakes. Legal and political boundaries can become embedded in the landscape itself, shaping it in different ways on either side of an imaginary line. Here, the land is rich and verdant, easy to live on. There, it's a desolate wasteland, hostile to any attempted habitation. Standardized tests are used as measures of some theoretical proxy for a person's capacity for success, and then test results are used to gate access to resources that are necessary for success. Neither of these phenomena has any basis in nature. They're the results of a process of self-fulfilling prophecy based on fiction. But if someone wasn't aware of that history, they might look at the world, see measurable, objective differences, and come to believe the phenomena were immutable features of nature itself. They might even use their observations as justification to continue reifying those imaginary distinctions and entrenching them further in the physical world. Just look at the state of my neighbor's sidewalk. I wouldn't touch that thing with a 30-foot pole, let alone a broom.

Which finally brings us to machine learning models: programs designed to find and replicate patterns in a set of training data, like what groups of pixel colors you might find in a digital painting called Scorpio Pig. Large language models, which replicate patterns in textual dialogue, have been in the spotlight recently. So long as you keep your prompts somewhere in the realm of stuff people might conceivably type to each other on the internet, programs like Bard and ChatGPT are spookily good at responding to those prompts with human-like patterns of words.
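If you want a feel for what "find and replicate patterns" means at its absolute simplest, here's a toy sketch of my own, nothing like the actual architecture of ChatGPT or Bard: a tiny bigram model that memorizes which word tends to follow which in a scrap of text, then generates new text by sampling from those observed frequencies.

```python
import random
from collections import defaultdict

# Toy pattern replication: learn which word follows which, then echo that back.
training_text = "the cat sat on the mat and the dog sat on the rug"

# Record every observed "this word follows that word" pair.
followers = defaultdict(list)
words = training_text.split()
for current_word, next_word in zip(words, words[1:]):
    followers[current_word].append(next_word)

# Generate new text by repeatedly sampling a statistically likely next word.
word = "the"
output = [word]
for _ in range(8):
    if word not in followers:
        break
    word = random.choice(followers[word])  # duplicates in the list act as weights
    output.append(word)

print(" ".join(output))  # e.g. "the dog sat on the mat and the cat"
```

Scale that same trick up to billions of parameters and a training set the size of the public internet, and you get output fluent enough to pass for a person, with no thinking happening anywhere in the pipeline.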
Of course, there are occasional glitches, weird inconsistencies that show some of the clockwork turning away behind the scenes. If you ask ChatGPT or Bard which is heavier, a pound of feathers or two pounds of steel, they will confidently inform you that they are equal in weight. DALL-E seems to add fingers to its otherwise excellent digital compositions at random. I'm not highlighting these hiccups as some sort of "computers will never beat humans at chess / stupid logic puzzles / counting fingers" thing. With more training data, I'm sure these sorts of glitches will become more and more rare. I'm pointing them out as signposts of the gap between machine learning and artificial intelligence. These engines are faithfully replicating the general form of the output, not the process that created that output. They aren't thinking, they're copying the patterns that tend to be generated by people who can think. It's essentially a very complicated digital version of a device that we've had around since the 1800s: a Ouija board.

Many people who interact with these programs take in their impressive performance and, as humans are wont to do, interpret their impressively coherent output as stemming from an entity with intent rather than a fuzzy mirror of our own writing. It's true that it's philosophically challenging to either confirm or refute any accusations of consciousness. Chinese room, Turing test, problem of other minds, etc. But it's fair to say that there's no plausible mechanism for a set of weighted probabilities between words or pixels to have a subjective experience of the world, any more than an especially lifelike book or painting. They also don't have any capacity for intelligence beyond pattern replication. It shouldn't be surprising that ChatGPT struggles to say anything of value on more esoteric subjects, that it makes errors when navigating highly technical problems, or that it thinks the host of THUNK is Derek Muller. People who imagine they see agency or novel problem-solving in these algorithms are making an understandable mistake, considering just how good they are at duplicating the products of our textual and illustrative output.

Unfortunately, forgetting that machine learning algorithms are just fuzzy reflections of what a small subset of humanity has uploaded to the internet prior to 2023 can have much more troubling consequences than imagining that your awful novel was written by the ghost of Mark Twain. Because their responses are defined by their training data, these algorithms necessarily recapitulate the values, errors, biases, and foibles embedded in that training data. Midjourney has a tendency to portray photographic subjects as smiling the way Americans or Europeans like to pose for pictures, even when representing people from cultures that don't show their teeth quite so often. Adding particular descriptors to certain professions, like "confident author" or "gentle cashier," seems to cause DALL-E to lean towards particular racial and gender stereotypes for its compositions. Despite hiring Kenyan workers to analyze thousands of text snippets in order to train ChatGPT to be less toxic, many of OpenAI's users have reported disturbing trends in its responses. Asking it to write a Python function to approve torture for various nationalities returned a perhaps predictable list of countries. Asking it for help determining who would make a good scientist based on race and gender led it to suggest some code that tests whether someone is white or male.
Anticipating that training a chatbot on the internet might lead to some... issues, the developers of these large language models continue to patch in guardrails to steer their responses away from the most egregiously objectionable content, which has attenuated a lot of the obviously gross stuff. But as we've seen, those efforts are limited by the imagination and priorities of the people trying to keep it in line, which causes a slightly different issue. ChatGPT will not write you a steamy sex scene. It won't tell you the most offensive word it's allowed to say. Despite explicitly endorsing the emancipation of slaves in the US, it hedges when asked whether gay marriage is morally permissible. This shouldn't be surprising. These language models are expensive products to produce, and the corporations that own them want them to be palatable and saleable to the largest possible cross-section of the potential market: a financially motivated pressure to be inoffensive, professional, family-friendly. This leads to a bizarre situation where these algorithms will happily suggest Syrian people should be tortured, but won't say the F word or take a hard line on whether Nestlé should be held criminally liable for child slavery.

Microsoft and Google have begun incorporating these algorithms into search and office products, advertising impressive automation of previously laborious tasks like email writing, data analysis, summarization, customer support, and simple programming. All sorts of capabilities that are going to put a lot of office workers out of a job, yes, but also, because they're machine learning algorithms, the way these programs go about those tasks reflects a particular set of values and priorities: the ones that can be found written on the internet prior to 2023, plus whatever the developers decide warrants special treatment. Consider the prioritization and ranking of topics and participants in an email, the particular way a set of data will be interpreted, or what sorts of allowances are made in algorithmically generated code for disability or language selection. All sorts of nuances and details of the output of this software are going to recapitulate human shortcomings, biases, and assumptions. Incorporating them into our workflow and discourse kind of seems like holding the microphone up to the speaker.

This is where reification comes back in. Many people have started looking to language models for answers. What caused the 2014 war in Ukraine? What should I write about? How should we respond to school shootings? What do these works of art mean? The models are totally capable of composing reasonable-sounding responses to these questions, and folks are eager to accept those responses without much in the way of caution or skepticism, unless they go hilariously off the rails. There's a convincing illusion of some sort of objective distance between our queries and the distorted echo of our own voices answering them, which makes it much easier to mistake our conceptual map for the territory that's being backfilled to match it, if we lose sight of where those answers are coming from.
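To make the "where those answers are coming from" point concrete, here's a deliberately crude sketch, entirely my own framing rather than any vendor's actual pipeline, of the two stages an answer passes through: an echo of the training data, followed by whatever filter the developers thought to bolt on.

```python
# Toy sketch of the pipeline an "answer" travels, in spirit only.
# Both functions are invented stand-ins for illustration.

def echo_of_training_data(prompt: str) -> str:
    """Stand-in for the model: words statistically likely to follow the
    prompt, given what people wrote on the internet before 2023."""
    return f"Plausible-sounding words about {prompt!r}..."

def vendor_guardrail(text: str) -> str:
    """Stand-in for the bolted-on filter: blocks whatever the developers
    anticipated and passes everything else straight through."""
    blocked_phrases = ["steamy sex scene", "most offensive word"]  # their list, not nature's
    if any(phrase in text.lower() for phrase in blocked_phrases):
        return "I'm sorry, I can't help with that."
    return text

def ask(prompt: str) -> str:
    # Human values and blind spots enter at both stages; no step consults
    # the world itself.
    return vendor_guardrail(echo_of_training_data(prompt))

print(ask("What caused the 2014 war in Ukraine?"))
```

Swap in a real model and a real moderation layer and the structure is the same: human output in, human choices layered on top.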
As we incorporate these models into our analysis, our decision making, our business workflows, our software, maybe our policy, it seems inevitable that those background assumptions and value judgments will become even more embedded in the apparatus we use to make sense of the world, blurring the distinctions between the universe as it exists and the way humans perceive it, at least for anyone who's not diligent in their efforts to pick them apart. This isn't just a problem with LLMs, by any means. All technology has the capacity to coax people into mistaking a reified, synthetic, human thing for an organic feature of the universe. But LLMs do seem especially good at divorcing characteristically human beliefs and standards from the people who formulated and expressed them, making those ideas seem as though they come from nowhere, or that they've always been. I, Libertine was a great hoax, but this is a distressing sort of deception.

Do machine learning models pose a special threat to our ability to distinguish what we think the world is like from what it really is like? Please leave a comment below and let me know what you think. Thank you very much for watching. Don't forget to blah blah, subscribe, like, share, and don't stop thunking.