
Khurshid AHMAD

Department of Computing, University of Surrey, UK

1. Preamble

The two terms in the title of this paper, Dead Cat Bounce and Falling Knife, are popularly used in financial reports, especially those concerning the trading of specific financial instruments (instrument is a superordinate term whose instances include currencies (e.g. $, £, ¥), shares, and government and private bonds). The value of these instruments is established by the market conditions in which they are traded and is therefore determined, in significant measure, by market sentiment. Dead Cat Bounce, a metaphorical term, refers to a temporary recovery by a market after a prolonged decline or bear market. In most cases the recovery is short-lived and the market continues to fall. The term Falling Knife refers to a stock whose price is in the middle of a big fall from a previous value.

A whole range of metaphors is utilised in writing about or discussing financial matters, for example, the animal metaphors bear and bull markets; or health metaphors such as anaemic currencies / economies; or spatial metaphors – instruments go up, down, instruments crash, find their own level. One can argue that market sentiments are determined, in some measure, by events: a set of happenings that occur within a well-defined spatial confine – a small earthquake in Chile to World War II may be two examples of how inclusive the space can be – and that these happenings cover a certain extent in time – again events may last a few seconds or many years or even millennia.

When we describe an event, we describe the abstract, including ideas, simplifications, aspirations and beliefs, and we describe the concrete, especially objects, people, places. The abstract and the concrete are described in the context of a significant occurrence, happening or phenomenon.

This paper deals with some keywords related to the widely used word event and how one can devise a method that will help in understanding these occurrences when analysing them via news reports and other documents. Because computers are fairly unintelligent devices and are used to analyse the texts, my research method has to be simple. The news reports and documents that interest me most are written in specialist language, the constrained nature of which benefits the method of analysis.

This paper is based on presentations the author has made at two recent workshops. The first was the ‘Event Modelling for Multilingual Document Linking’, LREC 2002 Workshop, Las Palmas, Canary Islands (June 2002). The second was the workshop on ‘Financial News Analysis’, TKE 2002, Nancy, France (August 2002).

1 Attributed to a former British Prime Minister, the late Harold Macmillan, who, when asked what can scupper the best-laid plans of a politician, was reported to have said ‘Events, dear boy’.

The constraints of special language, including the repeated use of certain preferred terms and the use of fewer syntactic structures, and the relatively simple assumptions about the organisation of knowledge within a specialist domain, are exploitable in analysing events and their causes. The assertion that special language relies on ‘simple assumptions’ may appear provocative and/or naïve, but when one looks closely at any 20th century description of most specialist enterprises, ranging from relativity theory through to media studies, and from engineering sciences to sociology, politics, and even philosophy, one sees intelligent men and women drawing taxonomies and hierarchies of one type or the other to describe the microscopic world, the entire Universe, kinship and exchange relationships, world trade or family disputes.

The ontological commitments of any specialist community are there for all to see: the specialist endeavour to wrap the world at large or beings in the real world or virtual worlds in taxonomies, equations, diagrams, constants and principles (including the rather pompous universal constants and principles). Another manifestation of ontological commitments within a specialism is its terms. The concrete and abstract that demarcate and distinguish one event from another, or indeed connect one event to another, are articulated in the description of the event through the use of certain terms and through the use of key verbs specific to that specialism.

The other key attribute of the abstract and the concrete is that their referent may be a unique idea, person, place or thing. This reference is articulated as a proper noun.

The arbitrary nature of proper nouns – for example, many of them are given names – makes it difficult to understand their contribution to the description of events.

2. Describing how events are described

Current work in artificial intelligence, a branch of computing, deals with the representation of (specialist) knowledge: representation that requires a set of conventions about how to describe a class of the abstract or concrete. The notion of representation is closely intertwined, through open questions in philosophy and cognate subjects, with the notions of meaning, intention and other open-ended conundra. Schank and colleagues have been at the forefront of this ambitious enterprise and have attempted to present methods for building computer programs that can summarize a collection of sentences, answer questions about the content of the sentences, and perhaps eventually translate the collection from one language to another (see Hardt (1992) for a review of conceptual dependency).

Schank & Abelson’s (1977) Conceptual Dependency Theory (CDT) was developed as part of a natural language comprehension project and can perhaps be regarded as one of the precursors to the debate on whether event structure carries a distinct kind of information or whether this information is part of a more general conceptual or logical semantic representation. CDT has succeeded where many other theories have not, and has been applied to early virtual reality systems. CDT can represent action: the staple of virtual reality systems is things moving, objects colliding, people communicating, and objects and people in various states of being.

Schank’s claim was that sentences could be translated into basic concepts expressed as a small set of semantic primitives. Conceptual dependency allows these primitives, which signify meanings, to be combined to represent more complex meanings. Schank calls the meaning propositions underlying language “conceptualisations”. The conceptualisations can be either active or stative; the former comprise actors, actions, objects, source and destination. The stative conceptualisations can indicate state changes through an arbitrary scale ranging from –10 to +10. The stative conceptualisations of health, anticipation, mental and physical states, and awareness have been ‘computed’, that is, a computer program has attempted to infer the ‘meaning’ of an underlying proposition by interpreting the scales. The statement Bill shot Bob in the heart repeatedly until Bob was no more will be interpreted by CDT as ‘Bob: State → Health ≡ -10’; the statement John thought Mary found discussions about meaning make her unhappy will be computed as ‘Mary: State → Mental State ≡ -5’. The world of action-oriented abstracts and concretes is a complex one and CDT approached it bravely. We intend to follow this approach and will attempt to focus on how to infer meaning or intent from examining the lexical content of a sentence or a collection of sentences.
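The two stative examples above can be written down as a small data structure. This is only an illustrative sketch, not Schank's notation or implementation: the class name StativeConceptualisation, its fields and the render format are assumptions made here, and only the scale from -10 to +10 and the two example values come from the text.

```python
from dataclasses import dataclass

# Bounds of the arbitrary scale mentioned in the text.
SCALE_MIN, SCALE_MAX = -10, 10


@dataclass(frozen=True)
class StativeConceptualisation:
    """One state assignment, e.g. 'Bob: State -> Health = -10' (hypothetical class)."""
    entity: str
    state: str
    value: int

    def __post_init__(self):
        # Reject values that fall outside the scale.
        if not SCALE_MIN <= self.value <= SCALE_MAX:
            raise ValueError(
                f"value {self.value} is outside [{SCALE_MIN}, {SCALE_MAX}]"
            )

    def render(self) -> str:
        # Plain-text rendering loosely modelled on the examples in the text.
        return f"{self.entity}: State -> {self.state} = {self.value:+d}"


# The two examples discussed in the text.
bob = StativeConceptualisation("Bob", "Health", -10)
mary = StativeConceptualisation("Mary", "Mental State", -5)
```

A fuller treatment would also need the active conceptualisations (actors, actions, objects, source, destination), which this sketch leaves out.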

In order to learn about the state of the contents, we have adopted a corpus-based approach: instead of relying on postulates about meaning, encoded as rules of syntax and semantics, we rely on evidence based almost entirely on the frequency of lexical items. For us, frequency correlates with acceptability. For instance, if there is only one occurrence of John thought Mary found… in a corpus of 100,000 sentences then, statistically, whatever John thought about Mary is in the realm of statistical outliers. Any inference drawn from outliers has to be heavily qualified. However, if the frequency of the construct John thought Mary… is, say, 1 in 1000 sentences, then it would be safer to draw inferences from this sample than from the one previously discussed.
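The arithmetic behind this comparison is simple enough to sketch. The function names and, in particular, the cut-off value below are assumptions introduced for illustration; the text gives the two example frequencies but no threshold.

```python
def relative_frequency(occurrences: int, corpus_size: int) -> float:
    """Occurrences of a construct per sentence in the corpus."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return occurrences / corpus_size


def is_outlier(occurrences: int, corpus_size: int, threshold: float = 1e-4) -> bool:
    """Flag a construct as too rare to support inference.

    The default threshold is a hypothetical cut-off, not one taken from the paper.
    """
    return relative_frequency(occurrences, corpus_size) < threshold


# The two cases discussed in the text.
rare = relative_frequency(1, 100_000)   # 1 in 100,000 sentences
common = relative_frequency(1, 1_000)   # 1 in 1,000 sentences
```

With this hypothetical threshold the first case is flagged as an outlier and the second is not; a different cut-off could of course change that.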

3. Semantics of Events?

Some authors postulate a distinct and separate level of representation for event structure (Pustejovsky 1991) adopting the view that event structure information concerning time, space and causation has a different status from other kinds of thematic, conceptual or lexical information. Other authors assume that event structure information is part of, or is implicit in, a more general conceptual or logical semantic representation (Jackendoff 1990).

Pustejovsky (2000) has noted that ‘there has been a renewed interest in the explicit modelling of events in the semantics of natural language’. Events in this kind of work ‘are associated with the tensed matrix verb of a sentence and sometimes with event-denoting nominal expressions, such as war and arrival’ (Pustejovsky 2000: 445).

Here the claim is that if we had a well-developed system through which we could process lexical semantic relations and a good grammatical description of how nouns behave, then we could describe “events as grammatical objects” (Tenny & Pustejovsky 2000).

Lakoff & Johnson (1999) deal at length with causation. For them, states are locations, whereby one can say things like the Japanese economy is out of depression and the US economy is in deep depression. Then there is a discussion of ‘changes’: ‘a change of state [is a] movement from one bounded region in space to another bounded region in space’ (1999: 183). The description of such bounded movement involves the verbs and prepositions of motion like go, come, enter, from, to, into and between. The changes could be continuous or graded. So the financial metaphor would be the revitalisation of a company or an economy, i.e., a movement from a state of poor health to a state of relative well-being. This so-called location event-structure metaphor involves forces that are responsible for causes, and forced movement, which affects causation. Note that the notion of force, which in our times is related almost exclusively to concrete objects, is now being applied to fairly abstract concepts like an economy or the state of an organisation.

Knowles (1996) has noted that financial journalists are keen on health metaphors – anaemic, ailing, debilitating, fatal, feverish, haemorrhaging – to describe a failing economy, or a falling currency, or a poorly performing bond or a crashing share. And, when a financial instrument is buoyant the journalists appear to celebrate the well-being of the market by using metaphorical terms like immunity, revitalisation, appetite, strength and so on.

The works of Lakoff & Johnson, of Pustejovsky, and the empirical observations of Knowles allow us to build a framework for analysing financial news stories, reports and learned articles. Metaphors will pose considerable challenges to the current systems for information extraction that deal with news stories, reports and learned articles in finance.

4. Events and sentiments: A case study

Our work focuses on movements of financial instruments including weighted indices of national stock exchanges like the UK Financial Times (London) Stock Exchange 100 (top companies) Index, better known as the FTSE and pronounced footsie. We report on some initial work that attempts to compare changes in the FTSE100 with changes in ‘market sentiment’ as expressed in news reports about the UK economy specifically, and in reports about Wall Street indices. The latter have a substantial influence on the UK economy. In addition to sophisticated metaphors, there are a number of verbs and adjectives that describe the sentiment of traders with respect to the market they trade in. There are fairly literal words that express sentiment about the markets, as reported in the news wires: financial instruments rise and fall, markets boom and go bust, there are gains and losses within the markets, economies slow down and suffer downturns, and whole industry sectors may be hard pressed. Below are some examples of news that may express good (or positive) sentiment and bad (or negative) sentiment:

Date         Time   Left context                      Positive sentiment  Right context
17 Sep 2002  09:54  insurance premiums, no passenger  growth              in uk trains and fare
17 Sep 2002  11:58  tesco’s                           growth              has slowed from seven percent
18 Sep 2002  09:22  said, with the greatest           growth              in widebody freighters such as
25 Sep 2002  10:59  but smiths said it expected       growth              in military aerospace, medical
27 Sep 2002  10:56  see some acceleration in output   growth              , particularly in services

Date         Time   Left context                  Negative sentiment  Right context
02 Sep 2002  15:54  consumer spending intentions  fell                five points to - 8
03 Sep 2002  16:52  source of future growth,      fell                for the first time since
05 Sep 2002  13:12  percent, while prudential     fell                five percent after it said
09 Sep 2002  12:46  in troubled music firm emi    fell                five percent to 159 -
13 Sep 2002  17:28  consumer sentiment            fell                for a fourth straight month
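Concordance lines of this kind, with a sentiment word flanked by its left and right context, can be produced with a few lines of code. The sketch below is a generic keyword-in-context routine written for illustration; it is not the System Quirk implementation, and the function name kwic, the whitespace tokenisation and the context width of five tokens are all assumptions.

```python
def kwic(text: str, keyword: str, width: int = 5):
    """Return (left context, keyword, right context) for each hit of keyword.

    Tokenisation is a plain whitespace split on lower-cased text; punctuation
    attached to a token is ignored only when matching the keyword.
    """
    tokens = text.lower().split()
    rows = []
    for i, token in enumerate(tokens):
        if token.strip(".,;:!?\"'()") == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


# One of the fragments quoted above, used as input.
rows = kwic("Consumer sentiment fell for a fourth straight month", "fell")
```

Date and time stamps, as shown in the tables, would have to be carried along from the metadata of each news item; the sketch handles the text only.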

These news reports are written in free natural language and we expected, and found, some ‘misleading’ sentences like New Zealand captain Stephen Fleming (79 not out) and debutant opener Lou Vincent (86 not out) reached their half-centuries in an unbeaten 171-run partnership, adding 103 runs without loss for the middle session after resuming at 87 for two. Here the register has changed from finance to sports and this we endeavour to guard against.

Here are the results of an analysis carried out automatically on 980 financial news items, comprising over 400,000 words, published by Reuters daily (except Saturday and Sunday) in November 2002. Our system, based on System Quirk, analysed 70 sentiment words in this news corpus, divided equally between words expressing positive sentiment and words expressing negative sentiment. The frequency of positive sentiment words (normalised values starting at c. 0.2) is plotted against the closing value of the FTSE 100 for the month of November 2002. The correlation is encouraging.

[Figure: normalised frequency of positive sentiment words and FTSE 100 closing values, November 2002]
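A comparison of this kind can be sketched as a daily count of sentiment words set against a series of closing values. Everything specific in the sketch is an assumption: the four-word lexicon is a hypothetical subset drawn from words mentioned earlier in the paper (the full list of 70 is not given), normalising by token count is one possible choice since the paper does not state its normalisation, and Pearson's coefficient is used because the text does not say how the correlation was assessed.

```python
from math import sqrt

# Hypothetical subset of positive sentiment words.
POSITIVE = frozenset({"growth", "rise", "gains", "boom"})


def positive_frequency(text: str, lexicon=POSITIVE) -> float:
    """Share of tokens in text that belong to the lexicon."""
    tokens = [t.strip(".,;:!?\"'()") for t in text.lower().split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Toy input only: invented sentences, not items from the Reuters corpus.
daily_texts = ["Markets boom as gains rise", "Shares fall and losses mount"]
daily_frequency = [positive_frequency(t) for t in daily_texts]
```

In use, one frequency value per trading day would be paired with that day's closing value and the two lists passed to pearson; the closing values themselves must come from market data and are not reproduced here.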

5. Afterword and work in progress

The above analysis and the concomitant results are of a tentative nature in that work is progressing in four major directions. First, we are looking at how accurately our chosen single sentiment word can convey market sentiment; initial results are encouraging in that only a few of the 70 sentiment words are used most frequently and these frequently used sentiment words occur in a restricted set of phrase structures.

Second, we are exploring the use of financial terminology, based on a commercial Web-based financial terminology system (http://www.investorwords.com), for categorising the financial news into different categories – stocks, currencies, investment banking – with a view to determining sentiments relating to a given instrument. Third, it has been estimated that over 30% of a financial news item comprises proper nouns – names of organisations and people – and identifying the proper nouns may lead to the attribution of sentiment to (a group of) people or organisations. Fourth, and perhaps most importantly, we have described a time series relating to sentiments. Usually, a time series is about cardinal numbers related to a concrete entity – temperature, pressure, money supply, units of goods produced – which are measured using instruments of one kind or another or merely counted by human beings. It is true that opinion polls about politicians or political parties convey sentiments expressed by people, and quantifying the expression of such opinions over time is a novel time-series construct. We are in the process of investigating the extent to which we can assemble a time series on the basis of indirect observations and what it means to correlate such a series with other more quantitative series like the FTSE100.

Ours is a deliberately lexical approach to making inferences from texts. We are guided by major work in knowledge representation, especially the representation of events, and work in semantics and lexical semantics. Computers currently can process lexical information with some success. It can be argued that the more recent developments in the study of language, both special and general language, and statistically-oriented corpus linguistics are entirely dependent on the abilities of computers to store and (repeatedly) search for lexical patterns in large volumes of texts. The study of how events can be described can benefit automatic extraction of information, both theoretically and practically, from a lexical, corpus-based approach.

Acknowledgements

The work was carried out under the partial sponsorship of the EU’s 5th Framework Programme on Information Societies project GIDA (Project No. IST 2000-31123). The research team at the University of Surrey is led by Khurshid Ahmad and includes Saif Ahmed, David Cheng, Tony Chiu, Pensiri Manomaisupat, Paulo C.F. de Oliveira, and Tugba Taskaya. Lee Gillam is the Technical Manager of the GIDA project at Surrey and Matthew Casey is the Research Supervisor.

References

Hardt, S.L. 1992. Conceptual Dependency. In S.S. Shapiro (ed.). Encyclopaedia of Artificial Intelligence: 259-262.

Jackendoff, R. 1990. Semantic Structures. Cambridge: MIT Press.

Knowles, F. 1996. Lexicographical Aspects of Health Metaphors in Financial Texts. In M. Gellerstam et al. (eds.). Euralex’96 Proceedings (Part II): 789-796. Gothenburg: Göteborg University.

Lakoff, G. and M. Johnson. 1999. Philosophy in the Flesh. New York: Basic Books.

Pustejovsky, J. 1991. The Syntax of Event Structure. Cognition 41: 47-81.

Pustejovsky, J. 2000. Events and the Semantics of Opposition. In C. Tenny and J. Pustejovsky (eds.): 445-482.

Schank, R.C. and R.P. Abelson. 1977. Scripts, Plans, Goals and Understanding. Hillsdale: Erlbaum.

Tenny, C. and J. Pustejovsky (eds.). 2000. Events as Grammatical Objects. Stanford: CSLI Publications.
