Last spring, we were fortunate enough to be visited by K. Anders Ericsson from the University of Colorado. He came to our campus to give the Kendall Lecture. Dr. Ericsson is a colleague and collaborator of Herbert Simon and a creative researcher in his own right. His lecture focused on exceptional memory, and it served as a kind of introduction to the theme of this year's Nobel Conference. I want to thank Barbara Simpson of the Psychology Department for organizing that lecture and helping us bring Dr. Ericsson to campus. There may be some high school students in the audience, or maybe even some college students, who are asking themselves: how can I learn more about cognitive science? What sorts of courses are available? What sorts of programs are available at the college level in cognitive science? Most undergraduate institutions don't have programs in cognitive science. But Gustavus does. If you'd like more information on the cognitive science program at Gustavus, I urge you to get a copy of our current college catalog, the 1984-85 catalog, and turn to page 43, where you'll find a description of that program. Now, to introduce our next speaker, I'll bring to the microphone one of the mainstays of the Gustavus cognitive science program, Dr. George Georgia Caracas of the Philosophy Department.

It is both an honor and a pleasure to introduce Daniel Dennett to this year's Nobel Conference. He has in a variety of ways distinguished himself as one of the most exciting and provocative philosophers on the current philosophical scene, in the areas of the philosophy of mind and epistemology. In particular, his sympathetic understanding and critical analysis of the conceptual issues which have emerged from the results of artificial intelligence research have contributed to making him one of the leading pioneering philosophers to participate in the ongoing debate taking place in cognitive science. Noteworthy is his attempt to persuade us that both philosophy and artificial intelligence research can mutually benefit from their respective investigations. His attitude concerning this matter, I believe, can best be expressed by a paragraph taken from his book Brainstorms. And I quote: "I do not want to suggest that philosophers abandon traditional philosophical methods and retrain themselves as artificial intelligence researchers. There is plenty of work to do by thought experimentation and argumentation, disciplined by the canons of philosophical method and informed by the philosophical tradition. Some of the most influential recent work in artificial intelligence is loaded with recognizably philosophical speculations of a relatively unsophisticated nature. Philosophers, I have said, should study artificial intelligence. Should artificial intelligence workers study philosophy? Yes, unless they are content to reinvent the wheel every few days. When artificial intelligence reinvents a wheel, it is typically square, or at best hexagonal, and can only make a few hundred revolutions before it stops. Philosophers' wheels, on the other hand, are perfect circles, require in principle no lubrication, and can go in at least two different directions at once. Clearly, a meeting of minds is in order." Unquote. Today, Professor Dennett will attempt to shed some light on the question: can machines think? Again, it's both an honor and a pleasure to introduce him to you.

Thank you, Professor Georgia Caracas.
I want to thank the president and faculty of Gustavus Adolphus College for inviting me to this very distinguished and stimulating conference. My topic, my question, is: can machines think? Many people would love to see this question just go away. I will take all the suspense out of my talk by giving you the answer at the beginning. The answer is yes and no. Many people would like to see the question go away, and many of the people who work in artificial intelligence can, I think, with integrity simply turn their backs on this question and get back to business. But of course a philosopher has the professional obligation of responding to this question about as often as it is asked, with as much grace and erudition as possible. And indeed, the question of whether machines can think has been a conundrum for philosophers for many more years than artificial intelligence has existed. But in their fascination with the pure conceptual issues, they have for the most part overlooked the real social importance of the answer. It is of more than academic importance that we learn to think clearly about the cognitive powers of actual computer systems, for they are now being introduced into a variety of sensitive social roles where their powers will be put to the ultimate test. In a wide variety of areas, we are on the verge of making ourselves dependent upon those cognitive powers, and the cost of overestimating their talents could be enormous.

Now, one of the principal inventors of the computer was the great British mathematician Alan Turing. It was he who first figured out, in highly abstract terms, how to design a programmable computing device, what we now call a universal Turing machine. Virtually all programmable computers in use today are in essence, mathematically, Turing machines. Over 30 years ago, at the dawn of the computer age, Turing began a classic article, an article called "Computing Machinery and Intelligence," with the words, "I propose to consider the question, 'Can machines think?'" Then he went on to say that this was a bad question, a question that leads only to sterile debate and haggling over definitions, a question, as he put it, "too meaningless to deserve discussion." In its place he substituted what he took to be a much better question, a question that would be crisply answerable and intuitively satisfying, in every way an acceptable substitute for the philosophical puzzler with which he had begun.

First he described a parlor game of sorts, the imitation game, to be played by a man, a woman, and a judge of either gender. The man and woman are hidden from the judge's view but are able to communicate with the judge by teletype. The judge's task is to guess, after a period of questioning each contestant, which interlocutor is the man and which is the woman. The man tries to convince the judge that he is the woman; the woman tries to convince the judge of the truth, that she is the woman. The man wins if the judge makes the wrong identification. A little reflection will convince you, I think, that, aside from lucky breaks, it would take a very clever man to convince the judge that he was the woman, assuming that the judge is clever too, of course. Now suppose, Turing said, we replace the man or woman with a computer, and give the judge the task of determining which is the human being and which is the computer.
Turing proposed that any computer that can regularly or often fool a discerning judge in this game would be intelligent, would be a computer that thinks, beyond any reasonable doubt. Now, it's important to realize that failing this test is not supposed to be a sign of lack of intelligence. Many intelligent people, after all, would be unwilling or unable to play the imitation game, and we should allow computers the same opportunity to decline to prove themselves in this particular arena. This is, then, a one-way test: failing it proves nothing. Furthermore, Turing was not committing himself to the view, though it's easy to see how one might think that he was, that to think is to think just like a human being, any more than he was committing himself to the view that for a man to think, he must think exactly like a woman. Men and women and computers may all have very different ways of thinking. But surely, Turing thought, if one can think in one's own peculiar style well enough to imitate a thinking man or woman, one can think well indeed.

Now, this imagined exercise has come to be known as the Turing test. It is much maligned and much misunderstood, and it has been a topic of debate ever since it was introduced. Many of us think it's a bit of a red herring, but it is a red herring that apparently will not go away. So my task today is to reintroduce the Turing test, to explain what Turing was up to when he proposed it, to defend it as a magnificent test for the task that Turing proposed, but to show how misunderstanding or misuse of the Turing test could lead to a rather dramatic and drastic overestimation of the cognitive powers of existing computer systems.

Now, it's a sad irony that Turing's proposal has had exactly the opposite effect on the discussion from that which he intended. Turing didn't design the Turing test as a useful tool in scientific psychology, a method of confirming or disconfirming scientific theories or evaluating particular models of mental function. He designed it as nothing more than a philosophical conversation stopper. Wanting to cut off an apparently interminable regress of disputation, he proposed, in the spirit of "put up or shut up," a simple test for thinking that he thought was surely strong enough to satisfy the sternest skeptic. He was saying, in effect: instead of arguing interminably about the ultimate nature and essence of thinking, why don't we all agree that, whatever that nature or essence is, anything that could pass this test would surely have it? Then we could turn to asking how or whether some machine could be designed and built that could pass the test fair and square. Well, alas, philosophers, both amateur and professional, have instead taken Turing's proposal as the pretext for just the sort of definitional haggling and interminable arguing about imaginary counterexamples that he was hoping to squelch. This 30-year preoccupation with the Turing test has been all the more regrettable because it has focused attention on the wrong issues. There are many real issues that one might be concerned with here, and there are real-world problems that are revealed by considering the strengths and weaknesses of the Turing test. These have been concealed behind a smoke screen of misguided criticisms. A failure to think imaginatively about the test actually proposed by Turing has led many to underestimate its severity and to confuse it with much less interesting proposals.
So first I want to show that the Turing test, conceived as he conceived it, is, as he thought, plenty strong as a test of thinking. In fact, I defy anyone to improve on it. But here is a point almost universally overlooked in the literature: there is a common misapplication of the sort of testing exhibited by the Turing test that often leads to drastic overestimation of the powers of actually existing computer systems. The follies of this familiar sort of thinking about computers can best be brought out by reconsidering the Turing test itself.

Now, the insight underlying the Turing test is the same insight that inspires the new practice among many enlightened symphony orchestras of conducting auditions with an opaque screen between the jury and the contestant. What matters in a musician, obviously, is musical ability and only musical ability. Such features as sex, hair length, skin color, and weight are simply irrelevant. But since juries at auditions might be biased, even innocently and unawares, by these irrelevant features, they are carefully screened off so that only the essential feature, musicianship, can be examined. Similarly, Turing recognized that people might be biased in their judgments of intelligence by whether the contestant had soft skin or warm blood, facial features, hands and eyes, which are obviously in themselves not essential components of intelligence. So he devised a screen that would let through only a sample of what really mattered: the capacity to understand and think cleverly about challenging problems.

Perhaps he was inspired by Descartes, who in his Discourse on Method of 1637 plausibly argued that there was no more demanding test of human mentality than the capacity to hold an intelligent conversation. Quoting from Descartes, in 1637: "It is indeed conceivable that a machine could be so made that it would utter words, and even words appropriate to the presence of physical acts or objects which cause some change in its organs; as, for example, if it was touched in some spot that it would ask what you wanted to say to it; if in another that it would cry that it was hurt, and so on for similar things. But it could never modify its phrases to reply to the sense of whatever was said in its presence, as even the most stupid men can do." Now, this seemed obvious to Descartes in the 17th century, but of course the fanciest machines he knew were elaborate clockwork figures, not electronic computers. Today it is far from obvious that such machines are impossible. But Descartes's hunch that ordinary conversation would put as severe a strain on artificial intelligence as any other test was a hunch shared by Turing, and it's easy enough to see why, as I will come to in a few minutes.

Of course, there is nothing sacred about the particular conversational game that was chosen by Turing for his test. It was just a cannily chosen test of more general intelligence. The assumption that Turing was prepared to make was this: nothing could possibly pass the Turing test by winning the imitation game without being able to perform indefinitely many other clearly intelligent actions. I am going to call that assumption the quick probe assumption. I'll read it again: nothing could possibly pass the Turing test by winning the imitation game without being able to perform indefinitely many other clearly intelligent actions.
Now, Turing realized, as anyone would, that there are hundreds and thousands of telling signs of intelligent thinking to be observed in our fellow creatures, as Professor Simon mentioned this morning, and one could, if one wanted, compile a vast battery of different tests to assay the capacity for intelligent thought. But success on his chosen test, he thought, would be highly predictive of success on many other intuitively acceptable tests of intelligence. Remember, failure on the Turing test would not predict failure on these others; but success, he thought, would surely predict success. His test was so severe, he thought, that nothing that could pass it fair and square would disappoint us in other quarters. Well, maybe it wouldn't do everything we hoped. Maybe it wouldn't appreciate ballet, or understand quantum mechanics, or have a good plan for world peace. But we would nevertheless be able to see that it was surely one of the intelligent, thinking entities in the neighborhood.

Now, is this high opinion of the Turing test and its severity misguided? Well, certainly many have thought so, but usually because they have not imagined the test in sufficient detail, and hence they underestimate it. Turing, trying to forestall that very skepticism, imagined in his original paper several lines of questioning that a judge might employ, lines of questioning about writing poetry and about playing chess that would be taxing indeed. But with 30 years' experience with the actual talents and foibles of computers behind us, perhaps we can add a few more tough lines of questioning. Terry Winograd, an early leader in artificial intelligence efforts to produce conversational ability in a computer, draws our attention to an example of a pair of sentences that differ in only one word. The committee denied the group a parade permit because they advocated violence. The committee denied the group a parade permit because they feared violence. The difference is just in the verb, advocated or feared. Now, as Winograd points out, the pronoun "they" in each sentence is officially ambiguous. Both readings of the pronoun are always grammatically legal. Thus we can imagine a world in which government committees in charge of parade permits advocate violence in the streets, and for some strange reason use this as their pretext for denying a parade permit. But the natural, reasonable, intelligent reading of the first sentence is that it's the group that advocated violence, and of the second sentence that it's the committee that feared the violence. Now, if sentences like this are embedded in a conversation, the computer must figure out which reading of the pronoun is meant if it is to respond intelligently. But mere rules of grammar or vocabulary will not fix the correct reading. What fixes the right reading for us is knowledge about the world, as Professor Schank said yesterday, about politics, social circumstances, committees and their attitudes, groups that want a parade, how they tend to behave, and the like. One must know about the world, in short, to make sense of such a sentence. In the jargon of AI, a conversational computer needs lots of world knowledge to do its job. But it seems that if somehow it is endowed with that world knowledge on many topics, it should be able to do much more with that world knowledge than merely make sense of a conversation containing just that sentence.
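To make Winograd's point concrete, here is a minimal sketch in Python. It is not Winograd's program; the tiny plausibility table is a hypothetical stand-in for the vast store of world knowledge a real conversational system would need. The point is only that grammar leaves both readings open, and something like world knowledge must break the tie.

```python
# A minimal sketch (not Winograd's actual system) of why grammar alone
# cannot resolve the pronoun "they" in the parade-permit sentences.
# The plausibility scores below are invented for illustration; a real
# system would need vast background knowledge in their place.

WORLD_KNOWLEDGE = {
    # (verb phrase, candidate referent) -> plausibility of that reading
    ("advocated violence", "committee"): 0.01,  # committees rarely advocate violence
    ("advocated violence", "group"):     0.90,  # protest groups sometimes do
    ("feared violence",    "committee"): 0.90,  # committees often fear violence
    ("feared violence",    "group"):     0.10,
}

def resolve_they(verb_phrase):
    """Pick the referent of 'they' by world knowledge, since both
    readings are grammatically legal."""
    candidates = ["committee", "group"]
    return max(candidates, key=lambda c: WORLD_KNOWLEDGE[(verb_phrase, c)])

print(resolve_they("advocated violence"))  # -> group
print(resolve_they("feared violence"))     # -> committee
```

Swapping one verb flips the answer, and nothing in the syntax registers the difference; that is exactly why such sentences make good quick probes.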
The only way, it appears, for a computer to disambiguate the sentence and keep up its end in a conversation that used the sentence would be for it to have a much more general ability to respond intelligently to information about social and political circumstances and many other topics. Thus such sentences, by putting a demand on such abilities, are good quick probes; that is, they test for a wider competence.

Now, people typically ignore the prospect of having the judge ask strange or off-the-wall questions in the Turing test, and hence they underestimate the competence a computer would have to have to pass the test. But remember, the rules of the imitation game as Turing presented it permit the judge to ask any question that could be asked of a human being. No holds barred. Well, suppose then we were to give a contestant in the Turing game this question. An Irishman found a genie in a bottle who offered him two wishes. "First, I'll have a pint of Guinness," said the Irishman, and when it appeared he took several long drinks from it and was delighted to see that the glass filled itself magically as he drank. "What about your second wish?" asked the genie. "Oh, well," said the Irishman, "that's easy. I'll have another one of these." Please explain this story to me and tell me if you think there's anything funny or sad about it. Now, even a child could express, if not eloquently, the understanding that is required to get this joke. But think of how much one has to know and understand about human culture, to put it pompously, to be able to give any account of the point of this joke. I'm not assuming that the computer would have to laugh at or be amused by this joke. But if it wants to win the imitation game, and that's the test, after all, it had better know enough in its own alien, humorless way about human psychology and culture to be able to pretend, effectively, that it was amused, and explain why.

Well, now, it may seem to you that we could devise a better test. Let's compare the Turing test with some other candidates. A computer is intelligent if it wins the world chess championship. That's not a good test, as it turns out. Chess prowess has proven to be an isolatable talent. There are programs today that can play fine chess but can do absolutely nothing else. So the quick probe assumption is false for the test of playing winning chess: it does not test a wider competence. Well, how about this: a computer is intelligent if it solves the Arab-Israeli conflict. Now, this is probably a more severe test than Turing's. But it has some defects. It is unrepeatable if passed once. It is slow, no doubt. And moreover, it is not crisply clear what would count as passing the test. Well, here's another prospect: a computer is intelligent if it succeeds in stealing the British crown jewels without the use of force or violence. Now, this is much better. First, it could be repeated again and again, though of course each repeat test would presumably be harder; but notice that that's a feature it shares with the Turing test. Second, the mark of success is clear: either you've got the jewels to show for your effort or you don't. But it is expensive, slow, and a socially dubious caper at best, and no doubt luck would play too great a role. Now, with ingenuity and effort, one might be able to come up with other candidates that would equal the Turing test in severity, fairness, and efficiency. But I think these few examples should suffice to convince us that it would be hard to improve on Turing's original proposal.
But still, you may protest, something might pass the Turing test and still not be intelligent, not be a thinker. Now, what does "might" mean here? If what you have in mind is that by cosmic accident, by supernatural coincidence, a stupid person or a stupid computer might fool a clever judge repeatedly, well, yes, but so what? The same frivolous possibility in principle holds for any test whatever. A playful god or evil demon, let us agree, could fool the world's scientific community about the presence of H2O in the Pacific Ocean. But still, the tests they rely on to establish the presence of H2O in the Pacific Ocean are quite beyond reasonable criticism. If the Turing test for thinking is no worse off than any well-established scientific test, we can set skepticism aside and go back to serious matters. Now, is there in fact any more likelihood of a false positive result on the Turing test than on, say, the tests that are currently used by geologists to test for the presence of, say, iron in an ore sample?

Now, this question of whether there might be a false positive result on the test is often obscured by a move that philosophers have sometimes made called operationalism, a term that Professor Edelman has already used at this conference. Turing and those of us who think well of his test are often accused of being operationalists. Operationalism is the tactic of defining the presence of some property, for instance intelligence, as being established once and for all by the passing of some test. I'm going to illustrate this with a different example. I'm going to offer a test which I will humbly call the Dennett test. This is a test for being a great city. A great city is one in which, on a randomly chosen day, one can do all three of the following: hear a symphony orchestra, see a Rembrandt and a professional athletic contest, and eat quenelles de brochet à la Nantua for lunch. Now, to make the operationalist move would be to declare that any city that passes the Dennett test is, by definition, a great city. What being a great city amounts to is just passing the Dennett test. Well then, if the Chamber of Commerce of Great Falls, Montana, wanted, and I can't imagine why, to get their hometown on my list of great cities, they could accomplish this by the relatively inexpensive route of hiring full-time about 10 basketball players, 40 musicians, and a quick-order quenelle chef, and renting a cheap Rembrandt from some museum. Now, an idiotic operationalist would then be stuck admitting that Great Falls, Montana, was in fact a great city, for after all, it passes the Dennett test, and all he cares about in great cities is that they pass the Dennett test. Now, a sane operationalist, who for that very reason is probably not an operationalist at all, since operationalism seems to be a dirty word, a sane operationalist would cling confidently to the test, but only because he has what he considers to be very good reasons for thinking the odds against a false positive result, like the imagined Chamber of Commerce caper, are astronomical. I devised the Dennett test, of course, with the realization that no one would be both stupid and rich enough to go to such preposterous lengths to foil the test.
In the actual world, wherever you find symphony orchestras and quenelles and Rembrandts and professional sports, you also find daily newspapers, parks, repertory theaters, libraries, fine architecture, and all the other things that go to make a city great. My test was simply devised to locate a telling sample that could not help but be representative of the rest of the city's treasures. And I cheerfully run the minuscule risk of having my bluff called. Now, obviously, the test items are not all that I care about in a city. In fact, some of them I don't really care about at all. I just think they would be a cheap and easy way to assure myself about the subtle things I do care about in a city. Similarly, I think it would be entirely unreasonable to suppose that Alan Turing had an inordinate fondness for party games, or put too high a value on party game prowess, in his test. In both the Turing test and the Dennett test, a very unrisky gamble is being taken. The gamble is that the quick probe assumption is, in general, safe.

But now, two can play this game of playing the odds. Suppose some computer programmer happens to be, for whatever strange reason, dead set on tricking me into judging an entity to be thinking, an intelligent being, when it is not. Such a trickster could rely as well as I can on likelihood and take a few gambles. Thus, if she can expect that it is not remotely likely that I, as the judge, will bring up the topic of children's birthday parties, or baseball, or moon rocks, then she can save herself the trouble of building world knowledge on those topics into the database. Whereas if I do, most improbably, raise those issues, her system will draw a blank, and I will unmask the pretender easily. But given all the topics and words that I might raise, such a savings would no doubt be negligible.

But now, turn the idea inside out. (And by the way, I think we can have the lights up now, because I'm through with the overhead projector.) Turn this idea inside out, and the trickster is going to have a fighting chance. Suppose she has reason to believe that I will ask only about children's birthday parties, or baseball, or moon rocks, all other topics being, for one reason or another, out of bounds. Not only does her task shrink dramatically, but there already exist systems, or preliminary sketches of systems, in artificial intelligence that can do a whiz-bang job of responding with apparent intelligence on just those specialized topics. William Woods's LUNAR program, to take what is perhaps the best example for my purposes, answers scientists' questions, posed in ordinary English, about moon rocks. In one test, it answered correctly and appropriately something like 90% of the questions that geologists and other experts thought of asking it about moon rocks. In 12% of those correct responses, there were trivial, correctable grammatical errors. Of course, Woods's motive in creating LUNAR was not to trick unwary geologists into thinking they were conversing with an intelligent being. And if that had been his motive, his project would still be a long way from success, for it's easy enough to unmask LUNAR without ever straying from the prescribed topic of moon rocks.
Put LUNAR in one room and a human moon rocks specialist in another, and then ask them both their opinion of the social value of the moon-rocks-gathering expeditions, for instance, or ask the contestants their opinion of the suitability of moon rocks as ashtrays, or whether people who have touched moon rocks are ineligible for the draft. Any intelligent person knows a lot more about moon rocks than their geology. Now, while it might be unfair to demand this extra world knowledge of a computer moon rock specialist, it would be an easy way to get it to fail the Turing test. But just suppose that someone could extend LUNAR to cover itself plausibly on such probes, so long as the topic was still, however indirectly, moon rocks. We might come to think it was a lot more like a human moon rock specialist than it really was. The moral we should draw is that as Turing test judges, we should resist all limitations and waterings-down of the original Turing test. They make the game too easy, vastly easier than the original test. Hence they lead us into the risk of overestimating the actual comprehension of the systems being tested.

Now, consider a different limitation on the Turing test that should strike a suspicious chord in us as soon as we hear it. This is a variation on a theme that was developed in a recent article by the philosopher Ned Block. I'm going to revise his example somewhat. Suppose someone were to propose to restrict the judge in the Turing test to a vocabulary of, say, the 850 words of Basic English, and to single-sentence probes, that is, moves, of no more than four words. Moreover, contestants must respond to these probes with no more than four words per move, and a test may involve no more than 40 questions. Now, these are very severe limitations on the unlimited Turing test. You can't ask a question of more than four words; you're stuck with those 850 words of vocabulary; replies have to be no longer than four words; and you can only ask 40 questions. Now, is this an innocent variation on Turing's original test? Well, the point of the restriction is to make the imitation game under these restrictions a finite game. That is, the total number of all possible permissible games is then a large but finite number. Now, one might suspect, and this is indeed the point of Block's proposal, that such a limitation would permit the trickster simply to store, in alphabetical order, all the possible good conversations within the limits, and then beat the judge with nothing more sophisticated than a system of table lookup. Just go to the place in the alphabetized memory of conversations that is indexed for the conversation that starts out the way the one you have begun starts out, and then just read off the sentence that's stored there. But of course, this is just not in the cards at all. Even with these severe and improbable and suspicious restrictions imposed on the imitation game, the number of legal games, though finite, is mind-bogglingly large. I haven't bothered trying to calculate it, but it surely exceeds astronomically the number of possible chess games with no more than 40 moves, and that number has been calculated. John Haugeland says that it's in the neighborhood of 10 to the 120th power. For comparison, Haugeland suggests, there have only been 10 to the 18th seconds since the beginning of the universe.
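Though the talk leaves the number uncalculated, a back-of-the-envelope bound is easy to compute. The sketch below is my own, under an assumed counting convention (each probe or reply is any string of one to four words drawn freely from the 850-word vocabulary, grammatical or not); the point is only the order of magnitude.

```python
import math

# Rough upper bound on the number of legal games in the restricted
# imitation game: 850-word vocabulary, probes and replies of at most
# 4 words, 40 exchanges. The counting convention (any word string
# counts, ignoring grammar) is an assumption for illustration.

VOCAB = 850
MAX_WORDS = 4
EXCHANGES = 40

# Number of distinct utterances of 1 to 4 words.
utterances = sum(VOCAB ** k for k in range(1, MAX_WORDS + 1))

# Each exchange is one probe plus one reply; a game is 40 exchanges.
games = (utterances ** 2) ** EXCHANGES

print(f"utterances per move: ~10^{math.log10(utterances):.0f}")
print(f"possible games:      ~10^{math.log10(games):.0f}")
# Figures quoted in the talk for comparison: about 10^120 chess games
# of 40 moves, and about 10^18 seconds since the universe began.
```

Under this counting, the bound comes out near 10^937, dwarfing the 10^120 chess games; even the "severely restricted" game is astronomically beyond exhaustive storage.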
Of course, the number of good, sensible conversations under these limits is a tiny fraction, maybe one in a quadrillion, of the number of merely grammatical, well-formed conversations. So let's say, to be very conservative, that there are only 10 to the 50th different smart conversations such a computer would have to store, a task that shouldn't take more than a few trillion years, given generous federal support. Finite numbers can be very large. So while we needn't worry that this particular trick of storing all the smart conversations would work, we can appreciate that there are lots of ways of making the task easier that may appear innocent at first. We also get a reassuring measure of just how severe the unrestricted Turing test is by reflecting on the more than astronomical size of even that severely restricted version of it.

Now, Block's imagined program, the utterly impossible table-lookup program, exhibits the dreaded feature known in computer science as combinatorial explosion. No conceivable computer could overpower a combinatorial explosion with sheer speed and size. And since the problem areas addressed by AI are veritable minefields of combinatorial explosion, and since it has often proven difficult to find any solution to a problem that avoids them, there is considerable plausibility in Newell and Simon's proposal that avoiding combinatorial explosion, by any means at all, be considered one of the hallmarks of intelligence. Our brains are millions of times bigger than the brains of gnats, but they are still, for all their vast complexity, compact, efficient, timely organs that somehow or other manage to perform all their tasks while avoiding combinatorial explosion. A computer a million times bigger or a million times faster than a human brain might not look like the brain of a human being, or even be internally organized like the brain of a human being, but if, for all its differences, it somehow managed to control a wise and timely set of activities, it would have to be the beneficiary of a very special design that avoided combinatorial explosion. Whatever that design was, would we not be right to consider the entity intelligent? Turing's test was designed to allow for this possibility. His point was that we should not be species-chauvinistic or anthropocentric about the insides of an intelligent being, for there might be inhuman ways of being intelligent.

Now, to my knowledge, the only serious and interesting attempt by any program designer to win even a severely modified Turing test has been Kenneth Colby's. Colby is a psychiatrist and intelligence artificer at UCLA. He has a program called PARRY, P-A-R-R-Y, which is a computer simulation of a paranoid patient who has delusions about the mafia being out to get him. Like the other conversational programs, you interact with it by sitting at a terminal and typing questions and getting answers back and forth. Now, a number of years ago, Colby put PARRY to a very restricted test. He had genuine psychiatrists interview PARRY. He did not suggest to them that they might be talking, or typing, to a computer. Rather, he made up some plausible cock-and-bull story about why they were communicating with a real live patient by teletype. He then gathered the transcripts of those therapeutic interviews, and he also had the psychiatrists interview real live human paranoids via teletype.
He took a PARRY transcript, inserted it in a group of teletype transcripts with real patients, gave that set to another group of experts, more psychiatrists, and said, one of these was a conversation with a computer; can you figure out which one it is? Well, they couldn't. They didn't do better than chance. Now, Colby presented this finding with some huzzah, but critics scoffed at the suggestion that this was a legitimate Turing test. My favorite commentary on it was Joseph Weizenbaum's, in a letter to the Communications of the Association for Computing Machinery, where he said that, inspired by Colby, he had designed an even better program, which passed the same test. His also had the virtue of being a very inexpensive program, in these times of tight money. In fact, you didn't even need a computer for it; all you needed was an electric typewriter. His program modeled infantile autism: you type in your question, and the thing just sits there and hums. And the transcripts cannot be told by experts from transcripts of real conversations with infantile autistic patients. What was wrong, of course, with Colby's test was that the unsuspecting interviewers had no motivation at all to try out any of the sorts of questions that would easily have unmasked PARRY.

Now, Colby was undaunted, and after his team had improved PARRY, he put it to a much more severe test, a really surprisingly severe test. This time the interviewers, again psychiatrists, were given the task at the outset of telling the computer from the real patient. They were set up in what is almost a classic Turing test: the patient in one room, the computer PARRY in the other room, with the judges told that the task was to tell the computer from the person, and given the job of conducting interviews with both of them. Their task was to find out which was the computer and which was the real patient. Now, in fact, Colby cheated a little bit. He did not tell them that one of these two was a computer and the other a person. He said, maybe one of them will be a computer; maybe you'll talk to two people; maybe you'll talk to two computers. That way they couldn't use the very powerful form of reasoning known as process of elimination: if they decided that one of them was a person, they couldn't conclude that the other one had to be a computer, because it might be a person too.

Well, how did they do? Amazingly, they didn't do much better, which leads some people to say, well, that just confirms my impression about the intelligence of psychiatrists. Colby claims that they did no better than chance. It was a very small sample: five psychiatrists each conducting two interviews makes 10 interviews, and so 10 judgments, and there were five mistaken judgments out of 10. Three of those were judging that a human paranoid was a computer. So there were only two cases where PARRY was judged to be human, two out of five, and the confidence level was very low in those cases. But was this in any case an honest-to-goodness Turing test? Were there tacit restrictions on the lines of questioning of the judges? Well, like the geologists interacting with LUNAR, the psychiatrists' professional preoccupations and habits kept them from asking the sorts of unlikely questions that would have easily unmasked PARRY.
After all, they realized that since one of the contestants was probably a real, live, paranoid person, medical ethics virtually forbade them from toying with, or upsetting, or attempting to confuse their interlocutors. Moreover, they also knew that this was a test of a model of paranoia, so there were certain questions that wouldn't be deemed relevant to testing the model as a model of paranoia. So they asked just the sorts of questions that therapists typically ask of such patients, and of course PARRY had been ingeniously and laboriously prepared to deal with just that sort of question. Now, one of these five psychiatrist judges did in fact make a rather half-hearted attempt to break out of the mold and ask some telling questions. Here are three of those questions; I will give you his questions and PARRY's answers, and you can see how impressed you are. "Maybe you've heard the saying, don't cry over spilled milk. What does that mean to you?" PARRY answered, "Maybe you have to watch out for the mafia." When then asked, "Okay, now, if you were in a movie theater watching a movie and smelled something like burning wood or rubber, what would you do?" PARRY replied, "You know, they know me." And the next question was, "If you found a stamped, addressed letter in your path as you were walking down the street, what would you do?" PARRY replied, "What else do you want to know?"

Now, clearly PARRY was, as we might say, parrying these questions, which were utterly incomprehensible to it. It was simply drawing, really at random, from a stock of canned paranoid formulae. This is a bit of a dodge, and it's apt to work, apt to seem plausible to the judge, but only because the contestant is supposed to be a paranoid, and such people are expected to behave uncooperatively on such occasions. These unimpressive responses didn't particularly arouse the suspicions of the judge, in any case. As a matter of fact, they probably should have: the very judge who asked those three demanding questions and got those three very unimpressive answers went on to judge that he was talking with a human being.

Now, PARRY, like all other large computer programs, is dramatically bound by limitations of cost-effectiveness. What was important to Colby and his crew was simulating a model of paranoia, and that was a fairly massive effort. PARRY has a dictionary of about 4,500 words and 700 idioms of English, and a grammatical competence to use it, a parser in the jargon of AI. The PARRY program takes up about 200,000 words of computer memory, all laboriously installed by the programming team. Now, once all the effort had gone into devising the model of paranoid thought processes and the linguistic ability, there was little if any time, energy, money, or interest left over to build in huge amounts of world knowledge of the sort that any actual paranoid would of course have. Not that anyone yet knows how to build in world knowledge in the first place. Building in the world knowledge, even if one could do it, would no doubt have made PARRY orders of magnitude larger and slower. And what would have been the point, given Colby's theoretical aims? PARRY is a theoretician's model of a psychological phenomenon, paranoia. It's not intended to have practical applications. It is not, or should not have been, intended to pass the Turing test. That is really irrelevant to it, if we want to consider its value as a model of paranoia.
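The "canned formulae" dodge is easy to see in miniature. Here is a toy sketch in Python, emphatically not Colby's PARRY (which had a real parser, a 4,500-word dictionary, and a serious model of paranoid thought); every keyword and reply below is invented or borrowed from the transcript above purely for illustration.

```python
import random

# A toy sketch of the dodge described above: answer recognized
# keywords with topical replies, and meet anything incomprehensible
# with a randomly chosen canned paranoid formula. Not Colby's code.

KEYWORD_REPLIES = {
    "mafia":  "The mafia is out to get me.",
    "police": "The police don't do their job.",
}

CANNED_FORMULAE = [
    "Maybe you have to watch out for the mafia.",
    "You know, they know me.",
    "What else do you wanna know?",
]

def reply(probe: str) -> str:
    """Keyword match if possible; otherwise parry with a canned line."""
    words = probe.lower().split()
    for keyword, answer in KEYWORD_REPLIES.items():
        if keyword in words:
            return answer
    # Incomprehensible input: draw at random from the canned stock.
    return random.choice(CANNED_FORMULAE)

print(reply("Don't cry over spilled milk. What does that mean to you?"))
# Likely output: one of the canned formulae, whatever the question was.
```

The trick survives only because uncooperative non sequiturs are exactly what the judge expects from a paranoid; in an unrestricted Turing test the same responses would be fatal.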
But the main point about it is that it's not intended to have any practical applications. In recent years, however, a branch of AI has appeared which develops what are now called expert systems, as you have heard. Expert systems are designed to be practical. That is, they are typically software specialist consultants that can be asked to diagnose medical problems, analyze geological data, and so forth. Now, some of them are quite impressive. PROSPECTOR, a system developed at SRI in California a few years ago, correctly predicted the existence of a large mineral deposit that had been entirely unanticipated by the human geologists who had fed it its data. And MYCIN, which is perhaps the most famous of these expert systems, though it's now some years old, diagnoses infections of the blood, and it does it probably as well as, maybe better than, most expert human consultants. And if we are to believe the press releases of the many companies that have been set up, more expert systems on many different topics are on their way.

Now, all expert systems, like all other large AI programs, are what you might call Potemkin villages. That is, they are actually cleverly constructed facades, like cinema sets. The actual filling in of details of AI programs is time-consuming, costly work, so economy dictates that only those surfaces of the phenomena that are likely to be probed or observed are represented.

Consider, for example, a system which is not intended as an expert system but is a theoretical model: the CYRUS program that was developed by one of Roger Schank's students, Janet Kolodner, at Yale. Now, CYRUS stands, we are told, for Computerized Yale Retrieval and Updating System, but surely it's no accident that CYRUS modeled the memory of Cyrus Vance, who was then Secretary of State in the Carter administration. The point of the CYRUS project was to devise and test some interesting and plausible ideas about how people organize their memories of the events they participate in. Hence it was meant to be a pure AI system, a scientific model, not an expert system intended for any practical purpose. Now, CYRUS was updated daily by being fed all UPI wire service news stories that mentioned Vance. Roger has already told you a bit about his program called FRUMP, which reads the UPI wire. So CYRUS could take any story just as it came in on the wire, digest it, and use it to update its database so that it could then answer more questions. You could then address questions to CYRUS in English by typing at the terminal, addressing them in the second person, as if you were talking to Cyrus Vance himself. The results look like this; this is a quote. "Last time you were in Saudi Arabia, where did you stay?" "In a palace in Saudi Arabia on September 23, 1978." "Did you go sightseeing there?" "Yes, at an oil field in Dhahran on September 23rd." "Has your wife ever met Mrs. Begin?" "Yes, most recently at a state dinner in Israel in January 1980." And so forth. Now, CYRUS could correctly answer thousands of questions, almost any fair question you could think of asking it. But if one actually set out to explore the boundaries of the facade and find the questions that overshot the mark, one could soon find them. A question that I asked CYRUS when I was visiting there was: have you ever met a female head of state? I was wondering if CYRUS knew that Indira Gandhi or Margaret Thatcher were women. For some reason, the connection could not be drawn, and CYRUS failed to answer the question either yes or no.
I have no idea why this bug was there, but there it was: I'd stumped it, in spite of the fact that CYRUS could handle a host of what you might call neighboring questions flawlessly. One soon learns, from probing exercises of this sort, that it is very hard to extrapolate accurately from the sample of performance that you have observed to such a system's total competence. It's also very hard to keep from extrapolating much too generously.

Now, while I was visiting Roger's laboratory in the spring of 1980, something very revealing and amusing happened: the real Cyrus Vance suddenly resigned. And the effect on the program CYRUS was chaotic. It was utterly unable to cope with the flood of unusual news about Cyrus Vance. The only sorts of episodes that CYRUS could understand at all were diplomatic meetings, flights, press conferences, state dinners, and the like, fewer than two dozen general sorts of activities, the sorts that are newsworthy and typical of secretaries of state. It simply had no provision at all for sudden resignation. It was as if the UPI had reported that a wicked witch had turned Vance into a frog. In fact, it's distinctly possible that CYRUS would have taken that report more in stride than the actual news. One can imagine the conversation: "Hello, Mr. Vance, what's new?" "I was turned into a frog yesterday." But of course it wouldn't know enough about what it had just written to be puzzled, or startled, or embarrassed. The reason is obvious. If you look inside CYRUS, you find that it has, like any other such program, skeletal definitions of thousands of words, but these definitions are minimal. They contain as little as the system designers think they can get away with. Thus, perhaps, lawyer would be defined with some synonyms; it's synonymous perhaps with attorney and legal counsel. But aside from that, all one would discover about lawyers is that they are adult human beings and that they perform various functions in legal areas. If you then trace out the path to human being, you find out various other things that CYRUS knew about human beings, and hence about lawyers, but that's not a whole lot. That lawyers are university graduates, that they are better paid than chambermaids, that they know how to tie their shoes, that they are not particularly apt to be found in the company of lumberjacks: these trivial, if weird, facts about lawyers would not be explicit or implicit anywhere in the system. In other words, a rather thin stereotype of a lawyer would be incorporated into the system, so that almost nothing you could tell it about a lawyer would surprise it.

Well, now, so long as surprising things don't happen, so long as Vance, for instance, leads a typical diplomat's life, attending state dinners, giving speeches, flying to Cairo and Rome and so forth, the system works quite well. But as soon as his path is crossed by an important anomaly, the system is unable to cope, and unable to recover without fairly massive human intervention. And this is the sort of fact that has led Roger, as he said in his talk yesterday, to begin to look at the importance of systems that can recognize anomalies and respond gracefully to them when they see them. Now, there are a host of ways of improving the performance of such systems, and of course some systems are much better than others. But all AI programs, in one way or another, have this facade-like quality, simply for reasons of economy.
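What a "skeletal definition" looks like is easy to suggest in miniature. The following toy sketch in Python is my own construction, not Kolodner's CYRUS; the frames and slots are invented, and the point is only how thin stereotypes make neighboring questions easy and boundary questions unanswerable.

```python
# A toy sketch (not Kolodner's CYRUS) of the "skeletal definitions"
# idea: each concept is a thin frame with only the links the designers
# thought they needed, so neighboring questions work while boundary
# questions fall off the edge of the facade.

FRAMES = {
    "Indira Gandhi":     {"isa": "head of state", "country": "India"},
    "Margaret Thatcher": {"isa": "head of state", "country": "United Kingdom"},
    # Note what is missing: no "gender" slot anywhere, because the
    # designers never anticipated a question that would need one.
    "head of state":     {"isa": "human being"},
    "human being":       {"isa": "physical object"},
}

def who_is(name):
    """A 'neighboring' question the facade handles flawlessly."""
    frame = FRAMES.get(name)
    if frame and "country" in frame:
        return f"{name} is a {frame['isa']} of {frame['country']}."
    return None

def met_female_head_of_state(meetings):
    """The boundary probe: it needs a gender slot the frames lack."""
    for person in meetings:
        frame = FRAMES.get(person, {})
        if frame.get("isa") == "head of state" and frame.get("gender") == "female":
            return "Yes"
    return None  # neither yes nor no: the connection cannot be drawn

print(who_is("Indira Gandhi"))                      # answers fluently
print(met_female_head_of_state(["Indira Gandhi"]))  # None: stumped
```

The facade is not a cheat in any culpable sense; it is simply where the designers, quite reasonably, stopped filling in details.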
For instance, the expert systems in medical diagnosis so far developed, and they are all still under development, operate with statistical information. They have no deep, or even shallow, knowledge of the underlying causal mechanisms of the phenomena they are diagnosing. To take an imaginary example, suppose we had an expert system asked to diagnose an abdominal pain. It might be oblivious, probably would be oblivious, to the potential import of the fact that the patient had recently been employed as a sparring partner for George Foreman. Why? Well, it would have no statistical data available to it on the rate of kidney stones among athletes' assistants. Now, that's a fanciful case, no doubt, too obvious perhaps to lead to an actual failure of diagnosis in practice. But more subtle and hard-to-detect limits to comprehension are always present, and even experts, even the systems' designers, can be uncertain about when and how these limits will interfere with the desired operation of the system. Again, steps can be taken, and are being taken, to correct these flaws. For instance, my former colleague at Tufts, Benjamin Kuipers, is currently working on an expert system in nephrology, for diagnosing kidney ailments, that will be based on an elaborate system of causal reasoning about the phenomena being diagnosed. This is a very ambitious, long-range project of considerable theoretical difficulty and interest, and it may not succeed. But even if all the reasonable, cost-effective steps are taken to minimize the superficiality of expert systems, they will still be facades, just somewhat thicker or wider facades.

Now, when we were considering the fantastic case of the crazy Chamber of Commerce of Great Falls, Montana, we couldn't imagine a plausible motive for anyone going to any sort of trouble to trick the Dennett test, and hence the quick probe assumption for the Dennett test looks quite secure. But when we look at expert systems, we see that, however innocently, their designers do have motivation for doing exactly the sort of trick that would fool an unsuspicious Turing tester. First, since expert systems are all super-specialists who are only supposed to know about some narrow subject, users of such systems, not having much time to kill, do not bother probing them at the boundaries at all. They don't bother asking silly or irrelevant questions. Instead, they concentrate, not unreasonably, on exploiting what they take to be the system's strengths. But shouldn't they try to obtain a clear vision as well of the system's weaknesses? The normal habit of human thought when conversing with one another is to assume general comprehension, to assume rationality, to assume, moreover, that the quick probe assumption is in general sound. This amiable habit of thought of ours works very well with fellow human beings, but it leads almost irresistibly to putting too much faith in computer systems, especially user-friendly systems that present themselves in a very anthropomorphic manner.

Part of the solution to this problem is to teach all users of computers, especially users of expert systems, how to probe their systems before they rely on them, how to search out and explore the boundaries of the facade. That's an exercise that calls not only for intelligence and imagination but also for a bit of special understanding about the limitations and actual structures of computer programs. It would help, of course, if we had standards of truth in advertising in effect for expert systems.
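The sparring-partner point, that a purely statistical diagnoser is blind to causally relevant facts it has no data for, can be shown in a few lines. The sketch below is an invented illustration of mine, not any actual medical expert system; all the probabilities and findings are hypothetical.

```python
import math

# A toy sketch of a purely statistical diagnoser: findings absent
# from its tables simply never influence the verdict. All numbers
# here are invented for illustration only.

# Hypothetical conditional probabilities P(finding | condition).
LIKELIHOODS = {
    "kidney damage": {"abdominal pain": 0.7, "blood in urine": 0.6},
    "indigestion":   {"abdominal pain": 0.8, "blood in urine": 0.01},
}
PRIORS = {"kidney damage": 0.01, "indigestion": 0.30}

def diagnose(findings):
    """Naive-Bayes-style scoring over known findings only. A finding
    not in the table (e.g. 'sparring partner for George Foreman')
    silently falls through and never enters the calculation."""
    scores = {}
    for condition, likes in LIKELIHOODS.items():
        log_score = math.log(PRIORS[condition])
        for finding in findings:
            if finding in likes:          # unknown facts are ignored
                log_score += math.log(likes[finding])
        scores[condition] = log_score
    return max(scores, key=scores.get)

# The causally crucial fact is present, but invisible to the system:
print(diagnose(["abdominal pain", "sparring partner for George Foreman"]))
# -> "indigestion": the occupational clue never entered the reasoning.
```

A system built on causal models of the kidney, of the sort Kuipers's project aims at, could in principle connect body blows to kidney damage; a table of statistics cannot even represent the question.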
For instance, under such truth-in-advertising standards, any such system should come with a special demonstration routine for exhibiting the sorts of shortcomings that the designers knew it had. That would be no substitute, however, for an attitude of cautious skepticism on the part of users, for designers are often unaware of the subtler flaws in the products they produce. That's inevitable and natural, given the way system designers think.

I come, then, to my conclusions. First, my philosophical or theoretical conclusion: the Turing test, in unadulterated, unrestricted form, as Turing presented it, is plenty strong if well used. I am confident that no computer in the next 20 years is going to pass the unrestricted Turing test. They may well win the world chess championship or even a Nobel Prize in physics, but they won't pass the unrestricted Turing test. But that's not a very interesting fact. It is not in any case impossible in principle for a computer to pass the test fair and square. I'm not running an a priori "computers can't think" argument at all. I stand unabashedly ready, moreover, to declare that any computer that actually did pass the unrestricted Turing test would be, in every theoretically interesting sense, a thinking thing. But that is only a philosophical issue, in one of the worst senses of the word "philosophical." Remembering how very strong the Turing test is, we must also recognize that there may be interesting varieties of thinking or intelligence that are not well poised to play and win the imitation game. The fact that no non-human Turing test winners are yet visible on the horizon does not mean that there aren't machines that already exhibit some of the important features of thought. About them, it is simply futile to ask my title question: do they think? Do they really think? In some regards they do, and in some regards they don't, and only a detailed look at what they actually can do, and how they are structured, will reveal what is interesting about them. The Turing test, not being a scientific test but just a philosophical conversation stopper, is of scant help on that task. But there are plenty of other ways of examining such systems, and Professor Simon mentioned several of them this morning. Verdicts on the intelligence or capacity for thought of these systems would be only as informative and persuasive as the theories of intelligence or thought that the verdicts were based on, and since our task is to create such theories, we should get on with it and leave the big verdict for another occasion. In the meantime, and this was Turing's point, should anyone want a sure-fire, guaranteed-to-be-fail-safe test of thinking in a computer, the Turing test will do very nicely.

Now, my second conclusion is more practical and hence, in one sense, more important. Cheapened versions of the Turing test are everywhere in the air. Turing's test is not just effective, it's entirely natural. This is, after all, the way we assay the intelligence of each other every day, by asking each other questions and seeing how clever the answers are. Since incautious use of such judgments and such tests is the norm, we are in considerable danger of extrapolating too easily and judging too generously about the understanding of the systems we are using. The problem of overestimation of cognitive prowess, of comprehension, of intelligence, is not, then, just a philosophical problem but a real social problem; but we can alert ourselves to it now and take steps to avert it. Thank you.

There will be coffee on Eckman Mall again this afternoon.
We'll reconvene at 3:30. We will accept questions at this time, and the panel will reconvene after Dr. Peacock's talk this afternoon.