Karthik is a senior technical staff member and manager of a text analytics group at IBM Research India. His primary interests are in statistical modeling applied to text and speech, and he leads a team working on several aspects of text analytics, including information extraction, retrieval, and machine translation. Prior to joining IBM Research India in 2008, he spent eight years at IBM's T.J. Watson Research Center in the Human Language Technologies group, working on improving speech recognition technologies. He obtained his PhD from Princeton University in 1999 and his BTech from the Indian Institute of Technology Madras. Karthik, that's yours.

Okay, thanks. What I'll be talking about today is the building of Watson and the architecture that enabled it. A few years ago, IBM Research undertook an effort to build a machine that plays the game Jeopardy better than the best human champions. And about a year and a half ago, on television, the machine beat two players — until then, I think one of them was unbeaten and the other had the longest winning streak, or something like that. Watson is the name of the system that played Jeopardy, and DeepQA is the architecture that enabled building Watson. This was a large system, and in a 40-minute talk I can only give you a flavor of the challenges and some of the ideas behind it. There was basically no single thing that worked really well; the architecture let you plug in a lot of components, and that is what finally enabled the system to win. I should mention at the start that the work was done mainly at the Watson Research Center in Yorktown Heights, New York, led by Dave Ferrucci, and it happened over several years.

To start off, I want to talk a little about Jeopardy, to give you a feel for what the system was trying to do. Jeopardy is essentially a quiz show. There is a board with different categories, and a category may hint at the answer type — "The Classics", "Great Outdoors", things like that — but it's often vague, and you frequently can't tell what type of answer is being looked for. There are different dollar levels of difficulty. There are three players, and one of them gets to select a clue from the board. The clue is read out, and you give your answer in the form of a question, but that part isn't really important. All players compete to answer: one person chooses the clue, everyone competes, and whoever buzzes in first gets to answer. If you answer correctly, you win the dollar amount of the clue that was picked; if you answer wrong, you lose that amount. The goal, of course, is to have the most money at the end. There are some other details we can skip.
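Just to make those game mechanics concrete, here is a toy sketch of the scoring rule described above — whoever buzzes first answers, gains the clue value if right, and loses it if wrong. The clue values and outcomes are invented for illustration, and the real game's rebound (letting others buzz after a wrong answer) is omitted.

```python
# Toy sketch of Jeopardy's scoring mechanics as described above.
# Clue values and outcomes are invented; rebounds after a wrong answer are omitted.

scores = {"Watson": 0, "Ken": 0, "Brad": 0}

def play_clue(value, buzz_order, answered_correctly):
    """First player to buzz answers; a right answer adds the clue value, a wrong one subtracts it."""
    player = buzz_order[0]
    scores[player] += value if answered_correctly else -value

play_clue(400, ["Watson", "Ken", "Brad"], answered_correctly=True)    # Watson +400
play_clue(800, ["Ken", "Watson", "Brad"], answered_correctly=False)   # Ken -800
print(max(scores, key=scores.get), scores)                            # current leader and scores
```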
Prior to this, IBM had undertaken a previous challenge of this kind — Deep Blue, I think in 1997 — the challenge of playing chess better than the best chess players of the time. The reason this new challenge was undertaken was to get into a space that is very different from chess. In chess the rules are well-defined, and the search space, although large, is finite and well-defined. Human language, on the other hand, is ambiguous: words don't have meaning by themselves, everything depends on context, and humans tolerate a lot of ambiguity while conversing in natural language. You could argue the search space is larger, but essentially it's quite a different sort of challenge from playing chess. That's the basic point.

Before getting into what is hard about this, consider the things a computer finds really easy. A large calculation: as humans we can't do it, but for a computer it's trivial. Or answering a question like "What price did David Jones pay for a laptop?" If you have a database with people, the serial numbers of the items they bought, serial numbers linked to invoice numbers, and invoice numbers linked to payments, this is simple for a computer. If instead of "David Jones" the question says "Dave Jones", it's a little trickier, but it's still just a small fuzzy match. These are the sorts of things that are easy for a computer.

Natural language, as I mentioned, is implicit, highly contextual, and ambiguous. If you're trying to answer "Where was X born?" and you have a database of people and their places of birth, it's easy to look the person up. What Watson is trying to do is get the same information from text. From a sentence like the one shown, you can tell that Einstein was born in Ulm, but it's hard for a system to figure that out reliably. Similarly, if you have a database of people and the organizations they ran, the information is much easier to get than if you have to extract it from text of this kind. That's basically the challenge the system is trying to address.
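As a quick aside on the "easy for a computer" side of that contrast: the structured-lookup case, even with a slightly fuzzy name, is only a few lines. The table, names, and prices below are made up purely for illustration; they stand in for the invoice and payment database described above.

```python
import difflib

# Made-up "purchases" table standing in for the invoice/payment database described above.
purchases = {
    "David Jones": {"item": "laptop", "price": 1250.00},
    "Maria Garcia": {"item": "monitor", "price": 310.00},
}

def price_paid(person, item):
    """Exact lookup, falling back to a fuzzy match on the buyer's name."""
    record = purchases.get(person)
    if record is None:
        # "Dave Jones" vs. "David Jones": a simple string-similarity match suffices here.
        close = difflib.get_close_matches(person, list(purchases), n=1, cutoff=0.7)
        record = purchases.get(close[0]) if close else None
    return record["price"] if record and record["item"] == item else None

print(price_paid("Dave Jones", "laptop"))   # 1250.0
```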
So, to summarize why IBM picked this particular challenge. First, it is broad and open domain: you can't just build a database of all people and where they were born, people and the organizations they ran, and so on — a few databases like that wouldn't cover the range of questions you get in a quiz show like this. Second, the language is very complex. If you look at the example questions shown, there are often jokes and puns built into them, so the questions are genuinely hard to understand; they're not of the simple "Where was X born?" sort, where the question is straightforward enough that you could potentially write rules to interpret the text. Third, high precision is required: if you don't get the right answer, you lose money, so you have to be very precise. The next point is accurate confidence: you have to know when you know the answer, so that you don't buzz in when you aren't sure — the system needs to know what it knows. And finally, high speed: it's a quiz show where everybody is buzzing in, and if you don't buzz in before the other players you don't get to answer, so all the computation needed to come up with the right answer has to happen really fast. Those were some of the considerations in choosing this as a challenge to go after.

Just to illustrate the "broad domain" point: this graph shows the answer type on the x-axis and, on the y-axis, the fraction of questions with that answer type, collating statistics over previous Jeopardy games. The most frequent answer type is "he" — so you know it's a male person — and that accounts for only about 2.5% of the questions. As you go further out there are lots of different answer types, each with a very small share. It's a long-tail distribution, so you're not going to make much progress by saying, "for each answer type, I'll build a table that stores all the information about that type." The domain is broad, and you won't get far by looking only at the head of the distribution.

The focus of the system was really a challenge in natural language processing technology: reading natural language text, extracting answers from it, and being able to answer questions from fairly large amounts of text data. Structured sources are used, but mainly to help interpret the text; a lot of the structure is derived from natural language text, and text is the main source of information used to answer these questions.

Moving on: we're all used to Google, and for many of these questions you can type in a few keywords and the right answer will be in the top few documents. The challenge, of course, is that we don't want to return a document, we want to return the answer, and it has to be the top one, otherwise you don't win. And here is an example of how keyword evidence can lead you astray. The question, on the top left: "In May 1898, Portugal celebrated the 400th anniversary of this explorer's arrival in India." I think most of us know the answer, but somewhere out there is this other sentence: "In May, Gary arrived in India after he celebrated his anniversary in Portugal."
Now, a lot of the words match up, but obviously Gary is not the answer we're looking for. That is the challenge: going beyond keywords to a deeper understanding of the text, so you can score the likelihood that Gary — or Vasco da Gama — is the answer to the question. The difficulty is that this stronger evidence is much harder to find, and no single piece of it is decisive on its own. So one of the ideas behind the architecture is that it lets you throw in many types of evidence, which are then combined in something like an optimal way.

In this particular example, you need some temporal reasoning: the question talks about the 400th anniversary in 1898, so the event happened in 1498. You need to know that "landed in" means the same as "arrived in" or "arrival in", because the sentence giving evidence that the right answer is Vasco da Gama uses "landed in". And that sentence talks about Kappad beach, so you need to know that Kappad beach is in India, because the question asks about the arrival in India. The first point, then, is that keywords are a good source of evidence but not nearly enough; you also need to reason about time, do some amount of geospatial reasoning, and so on — lots of specialized things — to play Jeopardy well.

Here are some other examples of the complexity of Jeopardy questions. Take "When '60 Minutes' premiered, this man was U.S. president." There are two parts here, and to find the answer you need to separate them: first find out when the TV show 60 Minutes premiered, and then ask who was president in that year. So one thing the DeepQA architecture does is attempt different decompositions of the question, handle each part separately, and then put the results back together. In this example the recombination is sequential: one sub-answer is a year, and then you ask who was president in that year.

Another example is the clue "A long tiresome speech delivered by a frothy pie topping." Again you need to split it into two parts, but in this case the final answer is just the combination of the answers to the two parts — "meringue harangue". So sometimes you need to split the question up and then figure out how to combine the pieces back into the right answer. The category here was "Edible Rhyme Time", which hints at what sort of thing the answer should be, but it's pretty tricky to work out what that means.
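A toy sketch of those two recombination styles follows: sequential, where one sub-answer is substituted into the other sub-question, and parallel, where the two sub-answers are synthesized into one. The `solve` function is a stub with canned answers standing in for the full search-and-scoring pipeline.

```python
# Toy illustration of the two decomposition styles described above.
# solve() is a stub with canned answers standing in for the real pipeline.

def solve(sub_question):
    canned = {
        "When did 60 Minutes premiere?": "1968",
        "Who was U.S. president in 1968?": "Lyndon B. Johnson",
        "a frothy pie topping": "meringue",
        "a long tiresome speech": "harangue",
    }
    return canned[sub_question]

# Sequential: the first sub-answer is substituted into the second sub-question.
year = solve("When did 60 Minutes premiere?")
print(solve(f"Who was U.S. president in {year}?"))                       # Lyndon B. Johnson

# Parallel: both parts are solved independently and the answers are combined.
print(solve("a frothy pie topping"), solve("a long tiresome speech"))    # meringue harangue
```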
Another aspect of the complexity is that often no single piece of text contains all of the pieces you need. (The top figure on that slide is another example of the same flavor: shirts have buttons, and so do TV remote controls and telephones.) Take the question "On hearing of the discovery of George Mallory's body, he told reporters he still thinks he was first." The right answer is Edmund Hillary, but you may not find any text that puts all of these pieces together; you need to piece together a graph that links George Mallory to Mount Everest and also links Edmund Hillary to Mount Everest.

So I hope I've convinced you that this game is fairly hard for a computer to play. This chart shows the performance to aim for. The x-axis is the fraction of questions a player chose to answer; the y-axis is precision — of the questions they answered, what fraction they got right. The dots in the cloud at the top are human players' performances, and all of them are winning performances. The red dots are the grand champions the system was targeting: they answer a lot of questions, roughly 70% of them, and when they do answer they are about 90% correct.

When IBM started on this, it was already participating in the question-answering competitions running at the time, and in 2007 the brown curve was the performance of that system. It was quite a way below where we needed to get, and this was essentially the state of the art — IBM's systems were among the top few competitors in those evaluations. So there was a long way to go. Two initial approaches were also tried out: gathering answers from a structured knowledge base (the red curve) and plain text search with keyword matching (the blue curve). This was on a smallish domain where the structured knowledge base had reasonable coverage, and both approaches were still far below the target. The point is that neither keyword search nor trying to build a database containing all the answers is a viable approach on its own.

This is the high-level architecture of the system. The question comes in on the left. There is an analysis of the question, which includes deciding whether to split it into parts and determining what answer type is being asked for — "he landed in India" means the answer type is "he". That's the kind of thing the question and topic analysis module deals with.
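For a sense of what that analysis step hands downstream, here is a hypothetical container for its output — focus, answer type, keywords, and relations. The field names are mine, not DeepQA's; they simply mirror the pieces described in the talk.

```python
from dataclasses import dataclass, field

# Hypothetical structure for question-analysis output; field names are mine,
# not DeepQA's, but they mirror the pieces described in the talk.
@dataclass
class AnalyzedQuestion:
    text: str
    answer_type: str          # the lexical answer type, e.g. "he", "explorer", "city"
    focus: str                # the phrase in the clue that the answer replaces
    keywords: list = field(default_factory=list)
    relations: list = field(default_factory=list)      # (subject, relation, object) triples
    sub_questions: list = field(default_factory=list)  # filled in if the clue is decomposed

q = AnalyzedQuestion(
    text="In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India",
    answer_type="explorer",
    focus="this explorer",
    keywords=["May 1898", "Portugal", "400th anniversary", "arrival", "India"],
    relations=[("this explorer", "arrived_in", "India")],
)
print(q.answer_type, q.relations)
```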
Once that analysis is done, the question may have been decomposed into several sub-questions, and each of these is handled in parallel. That is true at every stage: out of question analysis you may get several sub-questions to answer in parallel.

Next is the candidate answer generation box. It takes the keywords and whatever else was extracted from the question and looks up what might be candidate answers: you do a primary search, which is essentially a keyword-based search, get some top-ranking documents, and from those documents try to pick out candidate answers. The general strategy is to do relatively lightweight processing first to get candidates, and then do more detailed processing once you've restricted yourself to a smaller candidate set. So this first step is just a keyword search that brings up documents which may contain the right answer, and picks out where the potential answers are in those documents.

The next box, hypothesis and evidence scoring, evaluates different kinds of evidence for each candidate. What's done at this stage is to take a candidate answer, put it back together with the keywords of the question, do another search, bring back passages that contain all of that, and then do a more detailed evaluation of those passages. For example, the keyword overlap between a passage and the original question is one source of evidence — and, as I'll discuss, many more sources of evidence have to go in to make the system good.

The synthesis module then combines the pieces: if the question was split into parts, this is where the sub-answers are put back together into one answer, which might just mean combining them directly, or might involve other operations.

Finally, you have all your candidate answers and the evidence scores you've computed, and you need to merge and rank them. You may have duplicate answers — "Albert Einstein" and "A. Einstein" — with some evidence attached to one form of the name and some to the other, and you need to recognize that these are the same and merge them before comparing against the other candidates. So the final step merges candidate answers and does a final ranking based on all the evidence collected in the previous steps. That, at a high level, is the kind of processing that's done.
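A skeleton of that flow is sketched below. Every stage is a stub — the real components are large subsystems — but the overall shape (search broadly, generate candidates, score evidence per candidate, then merge and rank) follows the description above.

```python
# Skeleton of the DeepQA-style flow described above; every stage is a stub.

def primary_search(question):
    """Keyword-style retrieval returning top-ranking documents."""
    return []   # stand-in for a call to a search index

def generate_candidates(documents):
    """Pick candidate answers (names, titles, dates, ...) out of the retrieved documents."""
    return []

def score_evidence(candidate, question):
    """Retrieve supporting passages for this candidate and score each evidence dimension."""
    return {"keyword_overlap": 0.0, "answer_type_match": 0.0, "temporal": 0.0}

def merge_and_rank(scored):
    """Merge duplicate answers (e.g. 'A. Einstein' vs 'Albert Einstein'), then rank by evidence."""
    return sorted(scored, key=lambda item: sum(item[1].values()), reverse=True)

def answer(question):
    documents = primary_search(question)
    candidates = generate_candidates(documents)
    scored = [(c, score_evidence(c, question)) for c in candidates]
    return merge_and_rank(scored)
```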
The main thing to notice is that everything is done in parallel. Once you split the question into parts, each part is handled separately; once you retrieve candidate passages and pull candidate answers out of each of them, the evidence gathering for each candidate can again run in parallel. It's only at the end that everything has to be combined.

As for the numbers quoted: I believe it was about a million lines of code, and thousands of natural language processing algorithms. What was found is that there is no single silver bullet. The work that went into building the system is described in a series of papers that came out last month in the IBM Journal of Research and Development, so if you're interested, the details have now been published.

This is what I was describing: there is an explosion of possibilities as each module generates alternatives, and you keep computing over all of them; the pruning is deferred as much as possible, toward the end. And just to illustrate the difficulty of the task once more, this slide lists some early answers: the items in green are correct, and the items in red are wrong answers an early version of the system gave. There is a lot of text out there from which you can arrive at wrong answers, and getting to 80-90% precision while answering about 70% of the questions was genuinely hard.

To illustrate the processing in the pipeline I showed, here is a question. The first step is to analyze it: you extract keywords, the fact that the answer type in this case is a comet discoverer, a date mentioned in the question, 1698, and some relations — there's a ship, the Paramore Pink, mentioned by name. You then do a primary search — essentially a keyword search, plus a little more structure based on what was extracted from the question — and come up with candidate answers. (The answer type, comet discoverer, may not be used as a hard filter at this early stage.) These were the candidate answers the system came up with from the keywords. You then plug those candidate answers back into the question and do further searches to find passages that provide evidence about each possible answer: plug Isaac Newton back in with the other words of the question, search for passages, and come back with passages that indicate whether it really is a plausible answer. Then come the evidence profiles I mentioned; I'll say a little more about what they are, but one of them, for example, is keyword evidence, and there are several others.
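As a flavor of the simplest of those scorers, here is a toy keyword-overlap score between the clue and a supporting passage retrieved for a candidate. The passage text is invented for illustration; the real DeepQA passage scorers are far more elaborate.

```python
import re

# Toy version of the keyword-evidence scorer mentioned above: how much of the
# clue's vocabulary shows up in a passage retrieved for a candidate answer.
# The passage text below is invented; real passage scorers are far more elaborate.

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_overlap(clue, passage):
    clue_terms = tokens(clue)
    return len(clue_terms & tokens(passage)) / len(clue_terms) if clue_terms else 0.0

clue = ("In 1698 this comet discoverer took a ship called the Paramore Pink "
        "on the first purely scientific sea voyage")
passage = ("Halley sailed the Paramore Pink in 1698 on a purely scientific "
           "voyage to chart magnetic variation")
print(round(keyword_overlap(clue, passage), 2))
```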
You then merge and combine all of this, and in the end Edmond Halley ranks at the top, with an 84% confidence in that answer.

One of the key parts of the system is the component that automatically learns by reading documents. Its input is large amounts of text, and its output is a kind of semantic knowledge: for example, that a fluid is a liquid, that inventors patent inventions, that officials submit resignations, and so on. This is one of the knowledge bases automatically extracted from large amounts of text, and it is used for several things in the system. One use, which I'll show on the next slide, is to provide evidence about the answer type. The question asks for a certain type — say it mentions "this scientist" — and one of the candidate answers is Albert Einstein; you want to know whether there is evidence somewhere that Einstein is a scientist. That's the sort of information this knowledge base provides.

The way it's built is that you read plain text and apply what are by now standard natural language processing components — parsers and entity extractors, which know, for example, that Einstein is a person and a scientist and that 1921 is a year. The parse of a sentence gives you the relations between the words, and you aggregate all of this information across the corpus into facts of the form "scientists win prizes."

This kind of knowledge is used, for example, on the cell-division question, where you're looking for a liquid and one of the candidate answers is cytoplasm, which some text describes as a fluid surrounding the nucleus. If you look it up in WordNet, a fluid is not a liquid — a liquid is a fluid, but strictly speaking not the other way around — so a WordNet lookup of "is a fluid a liquid?" will probably not come back with yes. But these two words are often used interchangeably, and by reading a large amount of text you can figure out that in natural language things referred to as fluids are also referred to as liquids. That's the sort of thing you can extract just by looking at large amounts of text.

A little about the evidence profiles. I've been saying that keyword match is one part of the profile; there are other dimensions, such as source reliability — something from a random blog page is less reliable than something on Wikipedia — and so on.
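Each candidate therefore ends up with a profile of scores, one per evidence dimension, and the final confidence is a learned, weighted combination of them. A minimal sketch follows; the dimensions, weights, and feature values are invented, and the logistic combination is just one reasonable choice for illustration, not necessarily what DeepQA used.

```python
import math

# Each candidate gets an evidence profile: one score per dimension. The final
# confidence is a weighted combination; weights and values here are invented,
# and the logistic squashing is just one reasonable choice for illustration.

weights = {"keyword_overlap": 1.2, "answer_type_match": 2.5,
           "source_reliability": 0.8, "temporal_agreement": 1.5}
bias = -3.0

def confidence(profile):
    z = bias + sum(weights[dim] * value for dim, value in profile.items())
    return 1.0 / (1.0 + math.exp(-z))    # squash into [0, 1]

candidates = {
    "Edmond Halley": {"keyword_overlap": 0.7, "answer_type_match": 1.0,
                      "source_reliability": 0.9, "temporal_agreement": 1.0},
    "Isaac Newton":  {"keyword_overlap": 0.6, "answer_type_match": 0.4,
                      "source_reliability": 0.9, "temporal_agreement": 0.2},
}
for name in sorted(candidates, key=lambda n: confidence(candidates[n]), reverse=True):
    print(f"{name}: {confidence(candidates[name]):.2f}")
```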
I won't go through the individual scorers, but here is an example of the way the system was developed. Take the question "You will find Bethel College and a seminary in this 'holy' Minnesota city." There is actually a Bethel College and a seminary in both candidate cities, St. Paul and South Bend, and the evidence in the system at that time couldn't distinguish between the two answers. What was actually done in this case was to add, effectively, a pun-relation score: the clue says "holy Minnesota city", which points to St. Paul through the pun on "Saint". So, as the system was built, you look at how it is doing on a batch of questions, ask how it could figure out that this is the right answer rather than the one it is currently producing, and you may end up adding a new kind of evidence that indicates the right answer.

Another thing I mentioned before is that categories are often not simple, so the system tries to figure out, while playing, what a category means. In one example the category was "Celebrations of the Month", and initially the answer type isn't clear. Watson's answers to the first three questions in the round are shown in red, and the correct answers were June, November, and May. After those first three wrong attempts, Watson figured out that the answer type is actually a month. There is some online learning going on that lets the system improve during the game: on the next two questions in that category it got the right answer, because it had figured out the answer type.

This slide lists some of the highlights of the system, which I've mostly covered, so I'll skip over it. And this is a progress chart as a function of time. We started at the bottom brown curve when the project began, and the other curves show the progress over time — the time labels are on each of them. By November 2010 the system was up in the same cloud, at least partially overlapping with where the red dots are; at least some of the red dots are below the blue line. What was happening over all that time was that various things were being tried to see whether they actually improved performance, along with error analysis: what is the system getting wrong, and what reasonably general mechanisms can you add to capture the next fraction of answers?
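Curves like the ones on that chart are typically traced out by sweeping a confidence threshold: the system only attempts — buzzes in on — questions whose top answer clears the threshold. A small sketch with invented (confidence, correct) pairs shows how raising the threshold trades coverage for precision.

```python
# Sweeping a confidence threshold traces out a precision-vs-percent-answered
# curve like the ones on the progress chart. The (confidence, correct) pairs
# below are invented purely to show the trade-off.

results = [(0.95, True), (0.90, True), (0.85, True), (0.80, False), (0.75, True),
           (0.60, True), (0.55, False), (0.40, False), (0.30, True), (0.20, False)]

def curve_point(threshold):
    attempted = [correct for conf, correct in results if conf >= threshold]
    answered = len(attempted) / len(results)
    precision = sum(attempted) / len(attempted) if attempted else 0.0
    return answered, precision

for threshold in (0.0, 0.5, 0.8):
    answered, precision = curve_point(threshold)
    print(f"threshold {threshold:.1f}: answered {answered:.0%}, precision {precision:.0%}")
```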
A little bit about what the system is built on: the system is built on UIMA, a framework that is now out in open source. It lets you build text analytics applications in which many modules communicate with one another, and it makes that easy to do. Given that there were so many different modules computing so many different things, this was essential. And UIMA-AS allows for scale-out, so you can parallelize the many computations that can actually happen in parallel. That is what let us answer a question in the time we needed to: you essentially have to buzz in about three seconds after the question is flashed on the screen, which is when Watson receives it. The actual amount of computation Watson does to answer a single question is about two hours' worth; spread across a cluster of a few thousand cores, it ends up answering in roughly two to six seconds.

There are other parts of the system I haven't talked much about. Natural language processing is the main aspect, but there is also the game strategy: how much to bet, which clues to choose, when to answer and when not to, and so on. I'll skip over those.

I have only a little time left, so let me say something about the difference between Watson and a search engine, and about some possible applications we are working on right now. We all use search engines: as a decision maker with some need in mind, you distill it to a few keywords, do a search, look at the documents that come back, maybe modify the query, find more documents, read through all of that, and try to find what you need. What Watson enables, in principle, is that you just ask a natural language question and the system comes back with a set of potential answers — a lot of the work we normally do ourselves is done by the system.

One application we looked at is in the healthcare domain. A lot of the errors that happen in healthcare are diagnostic errors, and one reason is that doctors make up their minds too soon: based on the evidence they see, they come up with a diagnosis that is reasonable, but there may be one or two other possibilities they have overlooked. You can imagine a system where you pose a question with the symptoms and the information you have, and it comes back with a list of answers. The top one is probably what the doctor suspects anyway, but they get to look at the other possibilities and either rule them out or order additional tests to eliminate them.

Another possible application is technical support: lots of people are on phones answering questions about some product a customer bought. Can Watson automate that task? That's one question. And there are similar knowledge-management challenges inside enterprises.
Google search works really well for information out on the public web; search inside enterprises typically works far more poorly. There is usually much less duplication of information, so if you put in the "wrong" keyword, Google will still find some page that happens to use it, whereas inside an enterprise you may find nothing. So search inside the enterprise is itself potentially an application.

Since I'm out of time, I'll end here and just go to my last chart, which has a picture of all the people involved, to show you that this was a large team working over several years. These are some of the people who worked on the system. I'll take questions now.

Question: How important is the curation and selection of the sources that were used to create the knowledge base? And one more question: if speech-to-text were also part of the problem to be solved, how much more difficulty would it add?

Answer: Speech-to-text we didn't tackle here, and I believe it would make the problem much more difficult; that's the reason it was left out. IBM has been working on speech-to-text for a long time, but at the current state of the art I don't have numbers to say where that curve would go if you had to do it. My suspicion, having worked on speech myself, is that it would fall quite a bit, because one wrong word can throw you off badly. And there was the other question, about sources: yes, the sources are quite important. For example, when you want to build this for healthcare, the sources used for Jeopardy are clearly not adequate; you need to think about what sources would be more appropriate. So the selection of sources is important, not so much—

Question: So beyond that, the processing is pretty much fully automated, right?

Answer: Not entirely. One of the papers I mentioned actually looks at identifying knowledge gaps. If you have a set of questions that represents the sort of things you need to handle, you can check whether your documents are adequate, find the questions you're not able to answer, and then think about how to fill the gaps — for example, doing web searches with some of those keywords and picking up common sites that contain a lot of that information. So yes, deciding what to put in is a problem, and it's done fairly manually, although there is a process that tries to find what's missing and then selects more material based on that.

Question: I have two questions, actually. First, was there any category or topic in which Watson wasn't doing particularly well, and if so, was there a particular reason for that? Second, you mentioned that after the question was parsed and its components figured out, the first step was an answer search against existing structure. Could you say more about how that structure was stored? It's presumably not indexed the way a traditional search engine would do it, because it's more like a graph where you need to build relationships between the different objects, right? (Yes.) So is there any existing literature that talks about that?
If not, could you briefly describe it?

Answer: For the initial search it's pretty much the way standard search engines work, because you just need to go over a lot of documents and get candidates. Once you have the candidate answers and a smaller set of passages, the more detailed evaluation of potential answers uses custom data structures. So the initial search is still keyword-based over a fairly usual index; two or three different indexes were used, with two or three standard scoring functions, and those are described in the papers that have come out since.

As to whether there was a particular weak category: I don't think there was any one category left at the end, because that is exactly what the development over several years consisted of — looking for patterns of things the system gets wrong and then solving them. And the categories have a very long tail; there aren't categories that are huge. So I wouldn't say there were particular weak categories at the end, although there were certainly things we knew the system should be doing that it wasn't. For example, it once gave the same wrong answer as another contestant: a person answered first, gave a wrong answer, and then Watson gave the same wrong answer. That was a known pattern we wanted to fix, but it requires speech-to-text — you need to know what the other person said — and it just wasn't done in time to be part of the system. There were other things too: it once answered with a Canadian city when the category said U.S. city. That happens because it weights all these features in a machine learning approach, and the category type carries some weight but not an absolute one. So it still does things humans wouldn't do, but I don't think there were patterns that were easily solvable at that point.

Last question, up here: as a follow-up to the previous question, can you give us a flavor of what kind of documents or corpora were used, just as a sample?

Answer: The main corpus for the Jeopardy task is Wikipedia — that is the single most useful corpus. Beyond that, things were done like crawling and indexing a collection of books and other sources. I don't know the exact numbers, but you would have gotten quite a way toward the final system if you had just used Wikipedia.

Great. Thanks, Karthik. We'll be back at 11:30.