
3. HUMAN MEMORY AND INFORMATION PROCESSING

3.4 Long-term memory and learning

3.4.1 Models of semantic memory

In the following section, I shall address the role of LTM in learning, beginning with models of how concepts are represented and organized in semantic memory.

3.4.1.1 Network models

One of the earliest and most influential proposals is the hierarchical network model (Collins and Quillian, 1969), in which concepts are represented as nodes connected by ‘is-a’ links in a hierarchy (see Figure 7). The model predicts that verification time depends on the number of links that must be traversed: people verify “A robin is a bird” faster than “A robin is an animal,” conceivably because the pathway between ‘robin’ and ‘bird’ is shorter than that between ‘robin’ and ‘animal’ in the network. However, evidence to the contrary can also be found. Firstly, people verify “A robin is a bird” faster than “A penguin is a bird.” The hierarchical model cannot explain why more typical instances of a category are verified more quickly than less typical ones. Secondly, “A chicken is an animal” is verified more quickly than “A chicken is a bird.” The hierarchical model cannot explain why some levels in the network seem to be more accessible than others. Thirdly, it takes longer to falsify “A bat is a bird” than to falsify “A bat is a plant,” even though the pathway between ‘bat’ and ‘bird’ is shorter than that between ‘bat’ and ‘plant’. These findings suggest that the concepts in LTM are not organized in a strictly hierarchical fashion.
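To make the model’s prediction concrete, the following minimal Python sketch (my own illustration; the node names follow Figure 7, while the equation of verification time with path length is an assumption of the model) represents the hierarchy as ‘is-a’ links:

```python
# Minimal sketch of the hierarchical network in Figure 7.
# Assumption: verification time grows with the number of 'is-a'
# links between a concept and the category to be verified.

IS_A = {
    "robin": "bird", "canary": "bird",
    "dog": "mammal", "cat": "mammal", "lion": "mammal",
    "bird": "animal", "mammal": "animal",
    "animal": "living being", "plant": "living being",
}

def path_length(concept, category):
    """Number of 'is-a' links from concept up to category (None if absent)."""
    steps, node = 0, concept
    while node != category:
        if node not in IS_A:
            return None          # category not reachable: sentence is false
        node, steps = IS_A[node], steps + 1
    return steps

# The model predicts "A robin is a bird" (1 link) is verified
# faster than "A robin is an animal" (2 links).
print(path_length("robin", "bird"))    # 1
print(path_length("robin", "animal"))  # 2
```

Note that such a network would assign ‘robin’ and ‘penguin’ identical path lengths to ‘bird’, which is precisely why it cannot capture the typicality effect described above.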

[Figure 7 shows a tree: ‘living being’ at the top, linked by ‘is-a’ to ‘animal’ and ‘plant’; ‘animal’ is linked to ‘mammal’ and ‘bird’; ‘mammal’ to ‘dog’, ‘cat’, and ‘lion’; and ‘bird’ to ‘robin’ and ‘canary’.]

Figure 7: Hierarchical network model (after Medin & Ross, 1992: 221)

An alternative model that attempts to solve the problems just mentioned is the feature comparison model of Smith, Shoben, and Rips (1974). According to this model, a concept consists of a set of semantic features. The essential features of a concept are called defining features, while others that are less defining but generally true are called characteristic features. It is assumed that the more similar two instances of a category are, the more characteristic features they share. Although the feature comparison model can account for the phenomena that are incompatible with the hierarchical model, a central problem of the model lies in how the features are to be defined. Moreover, the model postulates two stages of decision processes when judging whether an item (e.g., a robin) belongs to a category (e.g., birds). The first stage roughly compares all the features, without considering how defining they are; the judgment depends on the degree of feature overlap. Only if the feature overlap is intermediate is a second stage required, in which the defining features are compared. In my opinion, it is difficult both to specify all the semantic features of a concept and to establish which of them are defining. I also find it implausible that the defining features are not specifically compared in the first stage: since they are essential to a concept, they should be included in the first stage of comparison.
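To illustrate the two-stage decision process, here is a small Python sketch of my own; the feature sets and thresholds are invented for illustration, since the model itself does not specify them:

```python
# Toy sketch of the two-stage feature comparison process.
# Feature sets and thresholds are invented for illustration only.

ROBIN = {"has_feathers", "lays_eggs", "flies", "sings", "red_breast"}
BIRD_DEFINING = {"has_feathers", "lays_eggs"}
BIRD_CHARACTERISTIC = {"flies", "sings", "small"}

def categorize(item_features, defining, characteristic, low=0.3, high=0.7):
    """Stage 1: overall feature overlap; Stage 2: defining features only."""
    all_cat_features = defining | characteristic
    overlap = len(item_features & all_cat_features) / len(all_cat_features)
    if overlap >= high:
        return "fast yes"        # stage 1 suffices
    if overlap <= low:
        return "fast no"         # stage 1 suffices
    # Intermediate overlap: stage 2 compares only the defining features.
    return "slow yes" if defining <= item_features else "slow no"

print(categorize(ROBIN, BIRD_DEFINING, BIRD_CHARACTERISTIC))  # "fast yes"
```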

Collins and Loftus (1975) proposed a refined network model, which is considerably more flexible than the hierarchical model and allows for different types of relations between concepts (see Figure 8). In this revised model, there is no strict hierarchy in the network; instead, memory performance is determined by the interconnections between the concepts. Apart from the ‘is-a’ links, there are other types of links such as ‘is-not-a’, ‘can’, and ‘can-not’. Furthermore, the links differ in importance: the most important links are assumed to be traversed more quickly, which can explain why the response to the prototype of a category is faster than the response to atypical instances.

[Figure 8 depicts a network without a strict hierarchy: ‘animal’, ‘mammal’, ‘bird’, ‘cat’, ‘lion’, ‘bat’, ‘robin’, and ‘penguin’ are connected by ‘is-a’ links (plus an ‘is-not-a’ link between ‘bat’ and ‘bird’), together with property links such as ‘has’ (wings, a red breast), ‘can’ (fly), and ‘can-not’.]

Figure 8: An example of a spreading-activation network model (based on Medin & Ross, 1992: 229)

Through these alterations, the effect of semantic relatedness, i.e. the prolonged reaction time observed when the relationship between two concepts sharing many common features has to be determined, is explained by a spreading-activation search mechanism. The assumption is that when two concepts are activated, activation spreads from each of them throughout the network until the two initial concepts are linked. The spreading activation meets at several intersections in the network, and the evidence from all these intersections is summed until a threshold is reached and a response is given. It takes more time to falsify the sentence “A bat is a bird” because ‘bat’ and ‘bird’ have some features in common, which leads to more intersections (with positive or negative evidence) that need to be evaluated and hence requires more time. In addition, spreading activation can account for the priming effect in lexical decision tasks. When subjects are asked to decide whether letter strings are words, the reaction time for a letter string (e.g., doctor) is shorter if a semantically related word (e.g., nurse) has been shown in the previous trial. This is because the first word, through spreading activation, has partially activated the concepts related to the second word, which in turn facilitates the activation of the second word. As we can see, the spreading-activation network model is able to explain a wide range of data, and it also allows the use of prior knowledge and the computation of information that was not stored (Medin & Ross, 1992). Nevertheless, the model has been criticized for being very complex and for neglecting the interaction between the concepts and the real world (Baddeley, 1997).
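A rough Python sketch of such a search, under my own simplifying assumptions (uniform link strengths and a fixed spreading depth instead of decaying activation), might look as follows:

```python
from collections import deque

# Toy spreading-activation search: activation spreads breadth-first from
# two concepts; nodes reached from both sides are the intersections whose
# evidence would have to be evaluated. Links and depth are illustrative.

LINKS = {
    "bat":    ["mammal", "wings", "flies"],
    "bird":   ["animal", "wings", "flies", "feathers"],
    "plant":  ["living thing", "leaves"],
    "mammal": ["animal"],
}

def reachable(start, links, depth=2):
    """All nodes activated within `depth` steps of `start`."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue
        for nxt in links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen

def intersections(a, b, links):
    return reachable(a, links) & reachable(b, links)

# More shared intersections -> more evidence to evaluate -> slower response.
print(len(intersections("bat", "bird", LINKS)))   # 3 shared nodes
print(len(intersections("bat", "plant", LINKS)))  # 0 shared nodes
```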

In the last decade, powerful network models based on principles different from those of the more traditional semantic networks have been developed. These models are termed PDP (parallel distributed processing) or connectionist models (McClelland and Rumelhart, 1986; McClelland et al., 1986). In a connectionist network, the nodes or units are connected by weighted links, and the strength of the flow of activation from one unit to another is a function of the weight of the link between them. A simple example of a PDP network is depicted in Figure 9.

Figure 9: An example of a PDP network with a hidden unit (Rumelhart and McClelland, 1986; taken from Baddeley, 1997: 260)

In contrast to semantic network models, a concept in a connectionist network model does not correspond to a particular node but is represented as a pattern of activation over the whole network. That is, concepts are distributed throughout the network, encoded as a set of connection weights. The advantages of connectionist networks are as follows: 1) The network allows complete patterns to be retrieved from partial input. 2) The network is capable of learning: when an error occurs, the network can adjust its connection weights so as to produce the correct output (backward error propagation). 3) The same set of weights can be used for remembering specific information and for learning abstractions (Medin & Ross, 1992). All in all, the connectionist approach is more powerful than the semantic network approach in modeling human memory and learning because of its parallel distributed processing mechanism. However, the development of connectionism does not by itself ensure progress in our understanding of how the human brain processes information. As a matter of fact, there is a basic problem concerning cognitive adequacy: from an engineering point of view, as long as the network produces the correct output, it does not matter whether the way in which the network operates is consistent with the way in which the human brain operates; from the viewpoint of cognitive science, however, it most certainly does (Baddeley, 1997). Baddeley (1997: 272) suggested that in the future “it will probably be necessary to blend connectionist approaches with more rule-based models, using the empirical methods of experimental psychology to evaluate and shape such developments.”
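For a concrete flavor of such a network, the following toy Python example (my own; it is not the network shown in Figure 9) trains a tiny network with two hidden units by backward error propagation on the classic XOR problem, which cannot be solved without hidden units:

```python
import math, random

# Toy connectionist network: 2 input units, 2 hidden units, 1 output unit.
# The weights (and biases) are the network's only long-term "memory".
random.seed(0)

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
b_h = [0.0, 0.0]
w_o = [random.uniform(-1, 1) for _ in range(2)]
b_o = 0.0

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR
lr = 0.5

for _ in range(20000):
    for x, target in data:
        # Forward pass: activation flows along the weighted links.
        h = [sig(w_h[j][0] * x[0] + w_h[j][1] * x[1] + b_h[j]) for j in range(2)]
        o = sig(w_o[0] * h[0] + w_o[1] * h[1] + b_o)
        # Backward pass: the output error adjusts the connection weights.
        d_o = (o - target) * o * (1 - o)
        for j in range(2):
            d_h = d_o * w_o[j] * h[j] * (1 - h[j])
            w_o[j] -= lr * d_o * h[j]
            w_h[j][0] -= lr * d_h * x[0]
            w_h[j][1] -= lr * d_h * x[1]
            b_h[j] -= lr * d_h
        b_o -= lr * d_o

for x, t in data:
    h = [sig(w_h[j][0] * x[0] + w_h[j][1] * x[1] + b_h[j]) for j in range(2)]
    o = sig(w_o[0] * h[0] + w_o[1] * h[1] + b_o)
    print(x, round(o, 2), "target:", t)
```

With these settings the outputs usually approach the XOR targets (values near 0 or 1); since the initial weights are random, a different seed may require more training.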

3.4.1.2 Schema theories

There is another family of theoretical approaches which starts from the assumption that semantic memory comprises structures that are much more comprehensive than the simple concepts proposed by network models. These approaches suggest that people remember information in terms of existing knowledge structures, termed schemata.

The notion of schema goes back to Bartlett (1932). A schema is “a structure that organizes large amounts of information into a meaningful system… A schema is a stereotype specifying a standard pattern or sequence of steps associated with a particular concept, skill, or event. Schemata are types of plans we learn and use during our environmental interactions” (Schunk, 1996: 168). As I have mentioned in Chapter 2, the study by Bartlett (1932) showed that when people tried to recall a story, they often distorted or omitted those parts of the story that were not compatible with their past experiences. This indicates that people actively use the schemata stored in their memory to reorganize or reconstruct events (‘effort after meaning’).

Another essential notion of schema was proposed by Piaget (1952). In Piaget’s view, a schema is “a completely coordinated set of physical actions and cognitive functions, a set that worked together in its totality to respond to every perceived experience that could be related to the schema” (Piaget, 1952: 237). Schemata are assumed to develop only for situations, events, or patterns that occur repeatedly.

Two functions are ascribed to schemata: 1) Assimilation: the new experience is changed to fit the schema, and its altered features are then incorporated into the schema. 2) Accommodation: when assimilation fails, the schema has to adapt itself to accommodate the situation it is trying to assimilate. In the course of learning, new information is thus checked against a schema, which may be specified, extended, or modified to accommodate the new information.
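Purely as a toy caricature (the matching rule, threshold, and feature names are all my own invention, not part of Piaget’s theory), this checking process could be sketched as follows:

```python
# Toy caricature of assimilation vs. accommodation. The schema is a set
# of expected features; the threshold and all names are invented.

def update_schema(schema, experience, threshold=0.5):
    """Assimilate the experience if it roughly fits; otherwise accommodate."""
    overlap = len(schema & experience) / len(schema | experience)
    if overlap >= threshold:
        # Assimilation: the experience is interpreted in terms of the
        # schema, and its novel features are incorporated into it.
        return schema | experience, "assimilated"
    # Accommodation: the experience does not fit, so the schema is
    # restructured around it (here, crudely, rebuilt from the experience).
    return set(experience), "accommodated"

bird_schema = {"has_feathers", "flies", "sings"}
print(update_schema(bird_schema, {"has_feathers", "flies", "pecks"}))
print(update_schema(bird_schema, {"has_feathers", "swims", "waddles"}))
```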

Both Bartlett’s and Piaget’s notions of schema lack specificity as to what exactly schemata contain and how exactly they are developed and structured.

Minsky (1975) suggested that a useful schema theory can only be established once the following issues have been addressed: 1) how people select an appropriate schema from memory to deal with a given situation; 2) how schemata are interrelated and retrieved as needed; 3) how schemata are created and modified; and 4) how the memory store changes as a result of learning (cf. Marshall, 1995). None of these issues has been fully resolved to date. Nevertheless, schema theories play an important role with regard to learning and memory.

For example, schema theories explain why experts can acquire new knowledge in the domain in which they specialize more efficiently than novices can. The reason for this difference lies in the amount of prior domain knowledge (or schemata) that experts and novices possess.

“Perhaps the largest source of individual differences in memory performance is difference in knowledge in a particular domain. It is much easier to remember something if we have a framework in which to embed that new knowledge… There is clear evidence that the ability to acquire new facts about some domain depends a great deal on what one already knows.

For example, Spilich, Vesonder, Chiesi and Voss (1979) found that people who knew more about baseball were much better able to remember descriptions of baseball games…. For the facts that were not relevant to the game, the groups showed no difference in recall. Thus, having prior knowledge allows one to understand (and remember) the relevant information better…, this previous knowledge allows one to interpret new information more easily to make it meaningful, to incorporate it into what one already knows, and to retrieve it easily using prior retrieval schemes.” (Medin & Ross, 1992: 208-209).

Ausubel (1963, 1968, 1977) also pointed out that learning is more effective when new information can be related to knowledge already stored in memory. Hence, the amount and accessibility of prior knowledge in the form of established schemata should influence both learning results and learning efficiency.