In the following, we will discuss a concept that superficially doesn't seem to be problematic at all. But we will see that this concept, which concerns the nature of words, involves a number of problems and is not as easy as we often might want to think. We will base our arguments primarily on present-day English, but I will be supported by two MA students from our MA program, Linguistics and Web Technology, who will use their languages, namely Arabic and Chinese, to support my arguments.

The first student is Siam Hamloi from Tunisia. Siam, can you introduce yourself? "Hello, my name is Siam, and I speak Arabic." Okay, that must have been Arabic. I don't speak a word of Arabic, but it sounded great. The next one is Mao Mao, also a student in our MA program; she is from China. "Hello, my name is Mao, and I speak Chinese." Great, thank you very much. So we will use their mother tongues later on.

Okay, we will organize our introduction as follows. First of all, we will talk about the question of how many words we have in a certain string. Then we will look at words across languages. And finally, we will discuss the segmentation of words.

So let's start with a very interesting question, one which is actually quite simple, we might think: what is a word? Here we have a string where we might define a word as a unit surrounded by blanks in English. So this would be a word, "is", and this would be a word. Now we could also say a word is delimited by a blank plus a punctuation mark. So here at the end we have a punctuation mark, and at the beginning we just have the beginning of the sentence. So these are criteria for defining words. But is it that simple?

Now look at the pronunciation. If we pronounce each word in isolation, then we would have something like "What. Is. A. Word?" But do we speak like that? Do we really have four words in terms of phonology, where each word is pronounced in isolation? Well, possibly not.
Rather, we would say something like this: "What is a word?" And now we have just two words if we consider phonological aspects. So in the first version we have four tone units, each word in isolation. In the second, we have one tone unit with one nucleus and two phonological units. So it's very difficult to say how many words we have if we incorporate phonology.

Let's look at some examples. Here we have some English words. We would all say that "room" is really one word. But what about "classroom"? Is it two words? One word? Well, normally we would say it's one word, though some people even spell it with a hyphen. Maybe "classroom" exhibits two words. "Board" is certainly one word. And a board that is white? Well, that's two words, isn't it? But what about the device on which I'm writing? A whiteboard? Is that one word? I would say yes. So here stress influences our counting. A white board is a board that is white, and a whiteboard is an electronic device. "Interactive" is certainly one word. But what about a "non-interactive whiteboard"? We've said that "whiteboard" is one word. "Non-interactive" is also one word. But couldn't we say that "non-interactive" is two words? Well, it's very difficult to decide, as you can see here.

Let's look at some further phonological considerations. Normally we would say that a sentence like "I will like it" consists of four words. But if we actually transcribe this string phonetically, then we will see that we possibly have something like "I'll like it". And if we just listen to this string, we would easily say this is one word.

Another problem, the next one: "Here is our answer." Traditionally, you would all have said that these are four words. Again, if we apply phonology here, you see that "here is" comes out as "here's". Already we have only one word. And then we have a weak form of "our": "Here's our answer."
This is called linking r in phonology, one of the aspects of connected speech. "Here's our answer." Now again we would say that, phonologically, this is one word. So you see, we're in deep trouble in defining words.

Now let's look at some other languages. So Siam, it's your turn now to show us something about Arabic. Please can you write down the word for "book", for "two", and for "two books"? Okay. As you can see, Arabic is written from right to left. The script is called an abjad, and it looks great, doesn't it? So thank you very much. Here are some additional points which we might want to erase. So this is then "book", "two", and "two books". Can you pronounce these words for us? Book: kitab. Two: ithnan. Two books: kitaban. Okay, let's write this down. So for "book" you said kitab, and for "two books" kitaban. Okay, thank you very much.

So this is very interesting. Kitab is certainly one word. Ithnan is one word. But "two books" is then also one word, because I cannot really say kitab and then -an. This doesn't work. Quite interestingly, we have -an contained as a part in kitaban and also in ithnan. So obviously there's some sort of relation which can be analyzed morphologically, but not at this stage. Okay, so in Arabic a construction which consists of two words in English comes out as one word. Problem.

Let's look at Chinese now. So Mao Mao, it's your turn. Can you write down, first of all, the Chinese symbol for "book"? So here we have "book". You see, Chinese uses a logographic writing system. Now we have "two". Oh, I can read that. Two strokes means two, doesn't it? Okay, and then we have "two books". We have two books but three symbols. Isn't that amazing?

Now let's first of all hear how these words are pronounced. Okay, so Mao Mao, the first one, "book", is: shu. Okay, let's write that down. Chinese is a tone language, so we have to mark tones on top of the vowels. Yeah, first tone. That was level. Two: er. She even moved her head down, so it must be a falling tone. Er. Two books: liang ben shu. Liang. Liang ben shu.
So this is a fall-rise, isn't it? Yeah, fall-rise. Okay, so something like this. Thank you very much.

Now you see, in English "two books" was two words. In Arabic "two books" was just one word, and in Chinese we have liang ben shu: three words. It's interesting that Chinese doesn't use er ben shu. I cannot explain this at this point. So in Chinese, this means "two". This means "book". And here we have something inserted which means something like "a piece of". So, sort of, "piece". The question is whether this is a word in the same way as shu or er or liang. Well, anyway, you see we have a problem. We have a problem in the analysis of words across languages.

So what are we going to do with all these findings? First of all, we would clearly say that it is difficult to define words across languages. We have to decide what a word is for each language individually. So we cannot generalize the whole concept. Furthermore, we cannot incorporate writing, because, as you could see, we have three languages, English, Arabic, and Chinese, with three different writing systems. Some are phonologically relevant, some aren't. And then there are many languages around the world that do not have writing systems at all. So we must base our decisions on phonology.

So what could be an alternative? Well, the alternative that linguistics suggests is not to use words as the basic units of analysis, but morphs. So whatever we think we have is analyzed into morphs. And the resulting subbranch of linguistics is called morphology. And how is morphological analysis performed? That's a different story.
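
As an illustration of the orthographic criterion from the beginning of the talk (a word is a unit delimited by blanks, with punctuation marks stripped off), here is a minimal Python sketch. The function name `orthographic_words` and the example strings are assumptions made for this illustration. The Arabic and Chinese spellings of "two books" are standard written forms supplied here, not taken from what the students wrote on the board.

```python
def orthographic_words(text: str) -> list[str]:
    """Count 'words' purely by writing: split on blanks, then strip
    punctuation marks from the edges of each chunk (hypothetical helper)."""
    punctuation = ".,;:!?\"'()"
    words = []
    for chunk in text.split():
        stripped = chunk.strip(punctuation)
        if stripped:
            words.append(stripped)
    return words


examples = [
    "What is a word?",      # English, the string from the talk
    "Here's our answer.",   # contraction written without a blank
    "كتابان",               # Arabic 'two books' (kitaban), assumed spelling
    "两本书",                # Chinese 'two books' (liang ben shu), assumed spelling
]

for text in examples:
    words = orthographic_words(text)
    print(len(words), words)
```

Under this purely written criterion, "What is a word?" yields four units and "Here's our answer." yields three, while both the Arabic form and the three-symbol Chinese form come out as a single unit. That the counts differ from the phonological and cross-linguistic observations above is the point of the talk: the blank-based definition only works for one writing system at a time.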