Latent Semantic Analysis
The Predication Model
Performance of the Model
Latent Semantic Analysis (LSA)
• LSA maps text onto high-dimensional vectors
• The claim: similarity of vectors indicates similarity of sense
• The miracle: it works
LSA - What Does It Do?
• LSA takes a large amount of text and builds a co-occurrence matrix:

  [Figure: a (#Word Types x #Documents) matrix of counts; rows are word types (e.g. "bla", "blub", ...), columns are documents 1, 2, 3, 4, ...]

• From this, LSA creates a smaller matrix:

  [Figure: a (#Word Types x n) matrix of real values; rows are word types, columns are n abstract dimensions]
LSA - What Does It Do? (II)
• So, an n-dimensional vector is assigned to each word:
– Direction of a vector: the “meaning”
– Length of a vector: weight (how much is known, or, how important is it)
– Cosine of the angle between two vectors: similarity of meaning
• Around n=300, LSA similarity judgements resemble human judgements
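The vector interpretation above can be sketched in a few lines. The words and values below are made up: toy 3-dimensional vectors stand in for the roughly 300-dimensional vectors LSA actually produces.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two word vectors: LSA's similarity measure."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical toy vectors standing in for ~300-d LSA vectors.
horse = np.array([0.8, 0.3, 0.1])
pony  = np.array([0.7, 0.4, 0.2])
bank  = np.array([0.1, 0.2, 0.9])

print(cosine(horse, pony))  # high: similar meaning
print(cosine(horse, bank))  # low: dissimilar meaning
```

The vector length, `np.linalg.norm(horse)`, would play the role of the weight described above.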
Sample Applications
• LSA has been shown to resemble human similarity judgements, e.g., in
– Choosing the most appropriate synonym for a given word
– Word/word or passage/word relations found in priming experiments
– Essay grading (!)
Interpretation of LSA
• Invented for information retrieval
• Suitable as framework for semantic theory?
– Wittgenstein: word meanings are not to be
defined, but can only be characterized by their
“family resemblance”
– That is, their cosine neighborhood?
• Relevance as cognitive architecture?
– Striking similarities to human performance
– Abstract neurological plausibility
LSA - What Does It Really Do?
– For n >= min{#Word Types, #Documents}, the result is a perfect reconstruction of the original matrix
– For smaller n, some generalizations have to be made.
– So the generalizations are the effect of a compression process.
  [Figure: the (#Word Types x #Documents) matrix is factored into a product of three matrices: a (#Word Types x n) matrix, an (n x n) diagonal matrix, and an (n x #Documents) matrix]

– Singular Value Decomposition (SVD)
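The compression argument can be illustrated with a tiny made-up count matrix and NumPy's SVD; the counts below are arbitrary.

```python
import numpy as np

# Hypothetical (5 word types x 4 documents) co-occurrence counts.
X = np.array([[1., 2., 0., 1.],
              [2., 0., 3., 0.],
              [0., 1., 1., 2.],
              [1., 0., 0., 3.],
              [0., 2., 1., 0.]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# n = min{#word types, #documents} = 4: perfect reconstruction.
X_full = U @ np.diag(s) @ Vt
print(np.allclose(X_full, X))  # True

# Truncated to n = 2 dimensions: a lossy, generalizing compression.
n = 2
X_hat = U[:, :n] @ np.diag(s[:n]) @ Vt[:n, :]

# Row i of U[:, :n] * s[:n] is the n-dimensional vector for word type i.
word_vectors = U[:, :n] * s[:n]
print(word_vectors.shape)  # (5, 2)
```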
Composition of LSA Vectors
• Needed: a compositional rule for LSA such that, for two vectors P and A, a sensible vector for P(A) is produced
• One idea: centroid addition
• Too simple: when a predicate is applied to an argument, the predicate meaning is typically influenced by the argument
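A minimal sketch of centroid addition, using hypothetical 2-d toy vectors; it shows why the rule is too blunt: the argument can only shift the predicate vector, it cannot select among the predicate's senses.

```python
import numpy as np

def centroid(*vectors):
    """Naive composition: the average of the constituent vectors."""
    return np.mean(vectors, axis=0)

# Hypothetical toy vectors (axis 0 ~ 'motion' sense, axis 1 ~ 'operation' sense).
ran   = np.array([0.5, 0.5])   # predicate "ran": mixes both senses
horse = np.array([0.9, 0.1])   # argument "horse": strongly about motion

print(centroid(ran, horse))  # [0.7, 0.3]: a blend, not a sense selection
```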
Latent Semantic Analysis (LSA)
The Predication Model
Performance of the Model
What is predication?
• The “predicate” carries the statement
• The predication algorithm is a solution to problems LSA cannot cope with
• Core problem: predicate meaning
  – we predicate only a subset of the properties of P (the contextually appropriate ones), not all of P
What is predication? (II)
• Alternative Compositional rule to the centroid vector approach
• Essential characteristic:
Strengthens features of the predicate that are appropriate for the argument of the predication
• LSA + C&I = Predication
How does it work?
• Input: word meanings as vectors, as in LSA
• Predication performs a Construction & Integration (C&I) process
• Output: as in LSA, a compositional vector representing the meaning of the compound
Construction
• A network is constructed
– Nodes: P, A, and all other items I
– 2 sets of links:
  • Between the argument and all other items: positive weights, according to relatedness → facilitation
  • All items I interconnected with each other: negative weights → inhibition
Integration
• The network is self-inhibiting
– Competition for activation
– Nodes/items most strongly related to A and P acquire positive activation values
• Two parameters:
– k: the number of most-activated nodes kept
– m: a computational approximation limiting the neighborhood size
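The C&I process above can be approximated in a few lines. This is a deliberately simplified stand-in for the inhibition network: instead of running spreading activation to convergence, it scores the predicate's m nearest neighbors by their relatedness to the argument and keeps the top k. All words and vectors are toy values invented for illustration.

```python
import numpy as np

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def predication(P, A, lexicon, m=20, k=5):
    """Simplified predication: among the m nearest neighbors of predicate P,
    keep the k most related to argument A (a crude stand-in for the C&I
    inhibition network), then return the centroid of P, A, and those k items."""
    # Construction: the predicate's m-item semantic neighborhood.
    neighbors = sorted(lexicon, key=lambda w: cos(lexicon[w], P), reverse=True)[:m]
    # Integration (approximated): the k neighbors most activated by the argument.
    winners = sorted(neighbors, key=lambda w: cos(lexicon[w], A), reverse=True)[:k]
    return np.mean([P, A] + [lexicon[w] for w in winners], axis=0)

# Toy 2-d lexicon: axis 0 ~ 'physical motion', axis 1 ~ 'abstract operation'.
lexicon = {
    "gallop":  np.array([0.9, 0.1]),
    "race":    np.array([0.8, 0.2]),
    "execute": np.array([0.1, 0.9]),
    "compute": np.array([0.2, 0.8]),
}
ran   = np.array([0.5, 0.5])   # predicate with two senses
horse = np.array([1.0, 0.0])   # argument

v = predication(ran, horse, lexicon, m=4, k=1)
# The compound RAN[HORSE] lands in the 'motion' sense region.
print(cos(v, lexicon["gallop"]) > cos(v, lexicon["execute"]))  # True
```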
Inhibition network
• Example: The horse ran
  – P(A) = RAN[HORSE]
  – What is the meaning of this proposition?
  – We need the right word sense of RUN
• Meaning = sum of vectors (simplification in the figure: dim = 2)
• P: predicate
• A: argument
• m: computational approximation (pruning irrelevant items in space)
• k: the most relevant items for the argument
• Output: the centroid of P, A, and the k most activated items is computed
A simple example for predication
• The meaning of “collapsed” is compared to landmark meanings
• The centroid fails where predication yields intuitively right results
• Test sentences:
  – The bridge collapsed
  – The plans collapsed
  – The runner collapsed
• Closest landmarks:
  – Breakdown
  – Failure
  – Race
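The landmark comparison can be sketched as follows. The 2-d vectors are hypothetical stand-ins for predication outputs and landmark meanings, chosen only to make the mechanism visible.

```python
import numpy as np

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical landmark vectors: axis 0 ~ 'physical/structural', axis 1 ~ 'plans/effort'.
landmarks = {
    "breakdown": np.array([0.9, 0.1]),
    "failure":   np.array([0.2, 0.9]),
    "race":      np.array([0.6, 0.7]),
}

def closest_landmark(meaning):
    """Pick the landmark whose vector is most similar to the compound meaning."""
    return max(landmarks, key=lambda w: cos(landmarks[w], meaning))

bridge_collapsed = np.array([1.0, 0.0])   # toy predication output
plans_collapsed  = np.array([0.1, 1.0])   # toy predication output

print(closest_landmark(bridge_collapsed))  # breakdown
print(closest_landmark(plans_collapsed))   # failure
```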
Summary Predication
• LSA + C&I = Predication
• By using C&I, contextual modification is introduced
  = a characteristic of human comprehension
• → Better than the centroid approach
• ? Sufficient for even more complex language-processing tasks ?
Latent Semantic Analysis (LSA)
The Predication Model
Performance of the Model

Performance of the Model in various complex NLP tasks
• Metaphor Comprehension
• Causal Inference
• Similarity Judgements
• Homonym Treatment
Metaphors
m = 500
Metaphor Comprehension
• The centroid makes no sense here
  – the centroid of lawyer & shark lands in no man’s land, more related to shark (.83) and fish (.58)
• Overall, the algorithm produced satisfactory results:
  – the cosine between a metaphor and relevant landmarks was much higher than with irrelevant landmarks
• But still short of human comprehension
  – e.g., Her marriage is an icebox.
    • What is a “cold marriage”?
Priming in Metaphor Comp.
• The time to comprehend a metaphor is increased when the literal meaning is primed
  – e.g., Sharks can swim. My lawyer is a shark.
  – and vice versa:
    • e.g., My lawyer is a shark. Sharks can swim.
• Some of the major psycholinguistic
phenomena about metaphor comprehension are readily accounted for.
Causal Inferences
m = 20, k = 5
Causal Inference Comp.
• The algorithm produced satisfactory results:
  – sentence vectors computed by predication are closer to causally related inferences than to causally unrelated but superficially similar sentences
• But it still failed in the last example, i.e.
  – the hunter shot the elk -> the elk was dead
  – possible reasons:
    • smaller k = 3 -> hunter dead 0.69 vs. elk dead 0.68
    • replace elk (LSA knows little about it) with deer (LSA knows a lot) -> deer dead 0.75 vs. hunter dead 0.69
Causal vs. Temporal Inferences
-> Causally related sentences had a higher cosine than temporally related sentences (causal average 0.58 vs. temporal average 0.42), which demonstrates the ability of the predication model to capture causal inference via semantic relatedness
Similarity Judgement (SJ)
• SJ do not directly reflect basic semantic relationships but
– are subject to task- and context-dependent influences (see example below)
• The literature on SJ is huge and complex
– it is not clear which phenomena predication can account for
– one systematic comparison with a small dataset is described (see example below)
SJ
Predicating Anatomy vs. Behaviour:

  Pair                  Anatomy   Behaviour
  hawk vs. chicken        0.61      0.35
  hawk vs. tiger          0.14      0.45
  bee vs. ant             0.81      0.48
  bee vs. hummingbird     0.40      0.81

Rated similarity for pairs of animal names as a function of two instructional conditions (anatomy and behaviour). Cosines are computed after predicating either “anatomy” or “behaviour” about each animal name.
m = 50, k = 1
Homonyms
m = 50, k = 30
Homonyms
• The LSA vectors for homonymous nouns contain all possible meanings (with biases toward the more frequent ones)
  – cos(lead, metal) = .34 vs. cos(lead, follow) = .36
• Appropriate predicates select the fitting meaning from this complex
• Result: predication handles multiple-meaning words the same way as multiple-sense words
Discussion
• LSA is not yet a complete semantic theory, but a promising alternative to traditional lexical semantics
• Although predication exceeds the centroid for simple sentences taken out of a larger context, in practice (e.g., essay grading) the results are almost the same
• Predication presupposes a syntactic analysis to identify the predicate and the argument
• Is this a sufficient model of human comprehension?