This e-lecture is a direct continuation of Pre-lexical Processing 1, where we looked at speech perception. The main questions are the same, but this time they relate to the written signal. We will, for example, try to identify perceptual cues. We will ask what the perceptual units could be, and we will eventually model the entire process of perceiving the written input signal. But first of all, we will look at the computer-based version of perceiving the written input signal, namely optical character recognition.

Now, an optical character recognition system, or OCR system for short, consists of a scanner and some sort of OCR software. The software is installed on the machine where the result of the scan is processed. After a primary visual analysis, which uses some sort of visual recognition device, visual patterns, eventually some sort of graphological forms, are developed on the basis of pre-stored geometrical visual patterns.

But what are these patterns? Well, here you find some variants of the capital character A. Now, what do they have in common? Well, research into computer vision has provided evidence that, as in speech perception, the written signal first of all undergoes a process of clearance. That is, we must get rid of turbulences, defects in print, and so on. Then we have a visual perception mechanism that looks for specific low-level geometrical cues. At least two such cues can be identified when we analyze the characters of the Roman alphabet.

The first of these cues is called the loop. Loops are defined as areas of white that are surrounded by black. Each character can thus be defined as having zero loops, one loop, here it is, or even two loops. The size of the loop area, though, is of subordinate importance.

The second visual pattern is referred to as concavity. Concavities are defined as concave regions facing in a certain direction. The directions are labeled according to the principles of the compass.
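The loop cue just defined can be made concrete in code. The following is a minimal sketch, assuming a character is given as a binary pixel grid (1 = black ink, 0 = white); the function name `count_loops` and the grid encoding are illustrative inventions, not part of any actual OCR system. A loop is detected as a white connected component that never touches the border of the grid, which matches the definition "white surrounded by black":

```python
from collections import deque

def count_loops(grid):
    """Count white regions fully enclosed by black pixels.

    grid: list of lists, 1 = black (ink), 0 = white (background).
    A 'loop' is a white connected component that never reaches the
    border of the grid. The size of the region is ignored, just as
    the size of the loop area is of subordinate importance above.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    loops = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 and not seen[r][c]:
                # Flood-fill this white component and note whether
                # it ever touches the border (i.e. the background).
                touches_border = False
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    if y in (0, rows - 1) or x in (0, cols - 1):
                        touches_border = True
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols \
                                and grid[ny][nx] == 0 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if not touches_border:
                    loops += 1
    return loops
```

On a crude 5-by-5 rendering of O the function finds one loop; on an open C it finds none, since the inner white region connects to the surrounding background.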
So we have something like the northern direction, the eastern direction, the western direction, or the southern direction. If we look at our characters here, you will find that the capital letter O has no concavity. The letter C has one concavity facing to the east. The capital H has two concavities, one facing to the north, one to the south. Looking at the capital A, we have one concavity facing to the south, and E has two concavities, both facing to the east.

Now, using these visual patterns, the central question emerges: how do we perceive the written signal? In other words, we are looking for perceptual units. Let us concentrate on languages with a phonographic, alphabetic writing system, where words are composed of small sets of basic visual elements, namely graphemes. Hence, it is legitimate to ask whether the perceptual analysis of written language is based in some sense upon character recognition. Such a graphemic or letter-mediated approach would be enormously economical, since the recognition device would have to rely on only a limited set of perceptual patterns. Alternatively, one could postulate an approach which is holistic in character, where elementary visual features are employed to address the lexicon directly. A third view, the transgraphemic view, combines both options, the graphemic and the holistic view.

Let us look at these options more closely. The possibility of storing words as graphological structures can only apply to those languages that employ a phonographic writing system, and to those speakers of these languages who can write and read, hence to a minority of all cases. Furthermore, if words are stored as graphological structures, then the graphological structure must somehow be related to the phonological structure, since phonographic writing systems by nature establish a symbol-sound relationship.
However, the graphological specification cannot be accessible during pre-lexical processes, since it is part of the lexical specification of an entry itself. A possible solution to this dilemma is the assumption of a processing system in which several levels operate in parallel and allow top-down interaction.

Now, the holistic view, or holistic strategy, of perceiving the written signal does not rely on letters as mediators between the visual information and recognition. Rather, it says that the information uptake bypasses the level of letter identification and that the extracted features are directly mapped onto the stored pattern descriptions for individual words. Thus, words are processed in terms of their spatial orientation. Here you find five graphemic variants of the word graphemic itself. Of course, we all know that this is the only option that is in accordance with the orthography of present-day English. But don't you have the feeling that sometimes, in order to find out whether a word is spelled correctly or not, you have to see it first? Whenever we are unsure, we often proceed according to the principle: let's write it down first, and then we'll decide. Thus, the stored word forms correspond to their perceptual counterparts, which are characterizable on the basis of their sub-segmental properties. The process of written word recognition is then identical with general visual perception. The only difference is that the written signal is two-dimensional.

Finally, we have transgraphemes, or the transgraphemic view. Here we are confronted with holistic patterns that expand over several letters. Typical examples of such patterns can be found in print defects, where, due to the use of specific fonts or different degrees of saturation, two or more adjacent letters touch each other, or in those words where we are confronted with morphographemic changes. Now, here is of course a change.
Grapheme is spelled with a final e, and as soon as we add -ic to grapheme, we lose this final e. So the idea of a transgraphemic representation would be that not only these are the central elements in the mental lexicon, but also something like this, as an alternative representation of im.

How can all this be modeled? Any theory of character recognition is passive in nature. Visual perception, whether by man or by machine, is exclusively sensory, so that there cannot be any relationship between the incoming signal and the visual production mechanism. In speech perception, two variants of a passive model are discussed: one that makes use of templates and one that is a feature detection model. Yet whole-object template theories are usually dismissed as models of visual perception. Templates are incapable of accounting for the vast number of shape and size variations that objects exhibit. For these reasons, we are left with models that rely on feature detectors.

Now, the heart of a feature detector model is a multi-layered network of recognition devices in which evidence for the symbols in the input stream is collected. At the lowest level, here, a system of feature detector units analyzes the fundamental characteristics of the input signal. At the higher levels, now these are all higher levels, this information is interpreted in terms of abstract linguistic units. These abstract units can be phonemes, graphemes, words, and so on.

Let us illustrate this using an example. Here we have a feature detector system which is going to identify the character P on an 8-dot pixel matrix. Let's start the feature detector. Well, first of all, the sans-serif letter P is scanned. At the lowest level of analysis in a feature detector model, geometrical features such as horizontal lines, edge location, pixel position, color saturation, and so on are analyzed. Let's look at the result. So here we find a collection of arbitrary feature detectors.
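Such a collection of low-level detectors can be sketched as a set of simple functions over a pixel matrix. The matrix below is an illustrative 8-by-8 rendering of a sans-serif P, and the four detector names are invented for this example; a real system would use a much larger inventory:

```python
# A toy 8x8 pixel matrix for a sans-serif capital P (1 = black ink).
P = [
    [1,1,1,1,1,1,0,0],
    [1,0,0,0,0,1,0,0],
    [1,0,0,0,0,1,0,0],
    [1,1,1,1,1,1,0,0],
    [1,0,0,0,0,0,0,0],
    [1,0,0,0,0,0,0,0],
    [1,0,0,0,0,0,0,0],
    [1,0,0,0,0,0,0,0],
]

# Lowest-level detectors: each inspects one geometrical property of
# the matrix and 'fires' (returns True) if the property is present.
detectors = {
    "vertical_line_left":  lambda m: all(row[0] == 1 for row in m),
    "horizontal_line_top": lambda m: sum(m[0]) >= 4,
    "ink_upper_right":     lambda m: any(m[r][5] == 1 for r in range(4)),
    "ink_lower_right":     lambda m: any(m[r][5] == 1 for r in range(4, 8)),
}

def scan(matrix):
    """First analysis level: report which detectors fire."""
    return {name: bool(det(matrix)) for name, det in detectors.items()}
```

Running `scan` on the P matrix, the left-vertical, top-horizontal, and upper-right detectors fire while the lower-right detector stays inactive: exactly the kind of evidence pattern that the next level combines into loops and concavities.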
Some of them fire, that is, they are positive in terms of their value, for example B3, B4, B5, and B6, and G3, but the others are not activated. The next level will identify sub-graphemic orthographical features such as loops and concavities, the ones we discussed earlier on. So, for example, some of these arbitrary feature detectors have identified that there is one loop. There is also a concavity facing to the southeast and, also activated, perhaps not with such a high value, a concavity facing to the east. Well, this may be an alternative. Now, the evidence collected at this point is then dropped into a feature buffer, where it remains for a short interval until further information comes in to combine the features into characters. So the result will possibly be that we have one loop and one concavity facing to the southeast, and the result will be P.

The process of feature combination is heavily influenced by higher levels of analysis. The discrimination of P from R, for example, in a word such as play, is phonotactically and graphologically influenced by the fact that initial clusters such as rl do not exist, but clusters such as pl do. So this is information from higher levels.

Let's summarize. It seems that the analysis of both speech and written input is sensory and does not involve any active mechanisms, and that a system of sub-segmental feature detectors, well, here are our feature detectors, is central to passive theories of perception. In the case of speech, these feature detectors are concerned with specific acoustic cues in the signal. In the case of written language, we have cues such as edges, lines, and loops in the written input. Whether there is a cross-modal link between speech and written language, here is the cross-modal link, that is, a link between graphological and phonological information, is not entirely clear.
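The top-down influence described above, for instance the decision between P and R before l in play, can be sketched as a simple weighting of bottom-up sensory evidence by higher-level graphotactic knowledge. All numbers are invented for illustration; this is a sketch of the principle, not of any attested model:

```python
def combine(bottom_up, top_down):
    """Weight ambiguous sensory evidence by higher-level constraints
    and return the winning character.

    bottom_up: character -> activation from the feature buffer.
    top_down:  character -> acceptability in the current context
               (e.g. as the start of an initial cluster before 'l').
    """
    scores = {ch: bottom_up[ch] * top_down.get(ch, 0.0)
              for ch in bottom_up}
    return max(scores, key=scores.get)

# The feature buffer cannot tell P from R (similar loops/concavities),
# but English graphotactics allows initial 'pl-' and not 'rl-'.
evidence   = {"P": 0.5, "R": 0.5}
before_l   = {"P": 1.0, "R": 0.0}
winner = combine(evidence, before_l)  # higher levels decide: P
```

Even if the raw evidence slightly favored R, the zero acceptability of initial rl would still push the decision to P, which is the sense in which feature combination is "heavily influenced by higher levels of analysis."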