
Working with Rules

From the document User's Guide (pages 124-127)

The WordStat Rules editor may be used to define complex coding rules that specify the conditions under which a particular item or category of items should be coded. Such a feature is useful for differentiating between several meanings of a single word (disambiguation). For example, one may limit the coding of the word "bank" to situations where the word refers to the financial institution. This can be done by restricting the coding of "bank" to documents containing vocabulary related to monetary or financial transactions ("cash," "money," "mortgage," "investment," etc.) or by excluding alternate meanings, such as when "bank" appears in close proximity to words like "river" or "canoe."

Rules may also be used to capture the various forms of a phrase. For example, the idiom "TURN OFF" may be expressed in many different ways ("turn it off," "turned off," "turned this off," "turned his radio off"). While listing all the possible forms of such an idiom may be very difficult, if not impossible, a single coding rule that looks for the word pattern "TURN*" followed by "OFF" within the same sentence could cover most of those situations. Rules can also take into account the presence of words that may alter the strength of an adjective, such as negations or qualifiers like "rarely," "numerous," "few," etc. Rules may even be used to identify sequences of events or complex actions.
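To make the "TURN*" followed by "OFF" idea concrete, the following Python sketch looks for a wildcard pattern that is later followed by a given word within the same sentence. It is an illustration only and not WordStat's implementation: the function names, the naive sentence splitter, and the tokenizer are all assumptions made for this example.

```python
import re
from fnmatch import fnmatchcase

def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Upper-case words only; punctuation is dropped.
    return re.findall(r"[A-Za-z']+", sentence.upper())

def pattern_followed_by(text, pattern, follower):
    """Return the sentences in which a word matching `pattern`
    is followed, later in the same sentence, by `follower`."""
    hits = []
    for sentence in split_sentences(text):
        words = tokenize(sentence)
        for i, word in enumerate(words):
            if fnmatchcase(word, pattern) and follower in words[i + 1:]:
                hits.append(sentence)
                break
    return hits

text = "He turned his radio off. Please turn it off! Take the next turn. Off we go."
print(pattern_followed_by(text, "TURN*", "OFF"))
# ['He turned his radio off.', 'Please turn it off!']
```

The last two sentences are not matched because "turn" and "off" fall in different sentences, which is the effect of restricting the rule to a single sentence.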

In WordStat, a rule can refer to individual words, word patterns, or phrases, or it may also refer to several items belonging to a content category of the current dictionary. A reference to a category is always preceded by the number or pound ('#') character. For example, in the rule:

SATISFIED NEAR #PROFESSOR

the first item, SATISFIED, refers to a single word while #PROFESSOR will match any item found in the PROFESSOR content category.

Just like words or phrases, rules may be stored anywhere in a categorization dictionary. A rule consists of a target item and one to four conditions, each condition consisting of another item linked to the target item by a Boolean operator (AND, NOT) or a proximity operator (NEAR, BEFORE, AFTER, or their negative forms NOT NEAR, NOT BEFORE, NOT AFTER). The context in which each condition is tested must also be specified: the entire document, a single paragraph, or a single sentence. When a proximity operator is used, one also has to specify the maximum distance, in number of words, that may separate the two items for the proximity condition to hold.

To create a rule, click the button and select the RULES menu item. A dialog box similar to the one below will appear:

The minimum requirements for a rule to be valid are:

· A unique name

· At least one statement consisting of a target item, a Boolean or proximity operator and a second item.

· The conditions under which this rule should be tested

· The weight given to items meeting those conditions

· The content category where the rule will be stored
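The requirements above can be sketched as a small data structure with a validation function. This is a hypothetical model written for illustration, not WordStat's internal format: the class names, field names, the default weight of 1.0, and the error messages are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

BOOLEAN_OPS = {"AND", "NOT"}
PROXIMITY_OPS = {"NEAR", "BEFORE", "AFTER", "NOT NEAR", "NOT BEFORE", "NOT AFTER"}
CONTEXTS = {"document", "paragraph", "sentence"}

@dataclass
class Condition:
    operator: str                       # Boolean or proximity operator
    item: str                           # word, word pattern, phrase or #CATEGORY
    context: str = "document"           # where the condition is tested
    max_distance: Optional[int] = None  # needed for proximity operators only

@dataclass
class Rule:
    name: str
    target: str
    conditions: List[Condition]
    category: str                       # content category where the rule is stored
    weight: float = 1.0                 # assumed default weight

def validate(rule, existing_names=()):
    """Return a list of problems; an empty list means the rule looks valid."""
    problems = []
    if not rule.name or rule.name in existing_names:
        problems.append("name must be non-empty and unique")
    if not rule.target:
        problems.append("a target item is required")
    if not 1 <= len(rule.conditions) <= 4:
        problems.append("a rule needs one to four conditions")
    for c in rule.conditions:
        if c.operator not in BOOLEAN_OPS | PROXIMITY_OPS:
            problems.append(f"unknown operator: {c.operator}")
        if c.context not in CONTEXTS:
            problems.append(f"unknown context: {c.context}")
        if c.operator in PROXIMITY_OPS and not c.max_distance:
            problems.append(f"{c.operator} needs a maximum distance")
    if not rule.category:
        problems.append("a storage category is required")
    return problems

rule = Rule(
    name="SATISFIED_PROFESSOR",
    target="SATISFIED",
    conditions=[
        Condition("AND", "#PROFESSOR", context="paragraph"),
        Condition("NOT AFTER", "#NEGATIONS", context="sentence", max_distance=5),
    ],
    category="POSITIVE_FEELING",
)
print(validate(rule))  # []
```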

The at sign ("@") is used as a prefix to denote a rule and is added automatically to the rule name. In the above dialog box, the rule item @SATISFIED_PROFESSOR will be stored under the content category POSITIVE_FEELING and will be considered true if the word SATISFIED occurs in the same paragraph as one of the items in the content category #PROFESSOR and if there is no item in the #NEGATIONS category in the same sentence and within five words before the target word (i.e. SATISFIED).
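The logic of the @SATISFIED_PROFESSOR rule can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions: the contents of the PROFESSOR and NEGATIONS categories are invented for the example, the sentence splitter is naive, and counting one hit per qualifying occurrence of SATISFIED is a guess about how matches are tallied rather than documented WordStat behavior.

```python
import re

# Hypothetical category contents, for illustration only.
PROFESSOR = {"PROFESSOR", "TEACHER", "INSTRUCTOR"}
NEGATIONS = {"NOT", "NEVER", "NO", "HARDLY"}

def sentences_of(paragraph):
    # Split into sentences, then into upper-case word lists.
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [re.findall(r"[A-Z']+", p.upper()) for p in parts if p]

def satisfied_professor(paragraph, window=5):
    """Count occurrences of SATISFIED that meet both conditions."""
    sentences = sentences_of(paragraph)
    # Condition 1: a #PROFESSOR item occurs somewhere in the same paragraph.
    if not any(w in PROFESSOR for words in sentences for w in words):
        return 0
    count = 0
    for words in sentences:
        for i, word in enumerate(words):
            if word != "SATISFIED":
                continue
            # Condition 2: no #NEGATIONS item among the `window` words
            # that precede SATISFIED in the same sentence.
            preceding = words[max(0, i - window):i]
            if not any(p in NEGATIONS for p in preceding):
                count += 1
    return count

print(satisfied_professor("I was satisfied with the course. The professor was clear."))  # 1
print(satisfied_professor("I was not satisfied with the professor."))                    # 0
```

A negation in an earlier sentence does not block the match, because the second condition is tested within a single sentence.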

To enter a specific word, word pattern or phrase, simply type the desired item. Spaces between words are automatically converted to underscore characters. To enter a content category, type the number or pound ('#') character immediately followed by the name of the category. An existing category may also be selected from a drop-down list by clicking the down arrow located to the right of the edit box and clicking the appropriate category name, listed in alphabetical order.
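The entry conventions described above (underscores in place of spaces, '#' as the category prefix) can be mimicked as follows. The function name and the returned tuple are hypothetical, and collapsing several consecutive spaces into one underscore is an assumption.

```python
def normalize_entry(text):
    """Classify a typed entry and apply the space-to-underscore conversion."""
    text = text.strip()
    if text.startswith("#"):
        # A leading '#' marks a reference to a content category.
        return ("category", text[1:])
    # Otherwise it is a word, word pattern or phrase.
    return ("item", "_".join(text.split()))

print(normalize_entry("turned off"))   # ('item', 'turned_off')
print(normalize_entry("#PROFESSOR"))   # ('category', 'PROFESSOR')
```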

The following operators may be used in a rule:

RULE CONDITION IS TRUE IF...

item1 AND item2 ...both items occur in the same document, paragraph or sentence.

item1 NOT item2 ...the first item occurs in the document, paragraph or sentence but not the second one.

item1 NEAR item2 ...both items occur in the same document, paragraph or sentence, and are no more than n words apart.

item1 BEFORE item2 ...both items occur in the same document, paragraph or sentence, and the second item appears after the first one within the next n words.

item1 AFTER item2 ...both items occur in the same document, paragraph or sentence and the first item appears after the second one within the next n words.

item1 NOT NEAR item2 ...the first item occurs in a document, paragraph or sentence, and is not found within n words of the second item.

item1 NOT BEFORE item2 ...the first item occurs in a document, paragraph or sentence, and is not followed within n words by the second item.

item1 NOT AFTER item2 ...the first item occurs in a document, paragraph or sentence, and does not occur within n words after the second item.
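The operator definitions above can be expressed as a single function over the list of words that makes up one context unit (a document, paragraph or sentence). This is an interpretive sketch, not WordStat's algorithm. Two points are assumptions: distance is measured as the difference between word positions, and a negative proximity operator is treated as true when at least one occurrence of the first item has no qualifying second item nearby.

```python
def positions(words, item):
    return [i for i, w in enumerate(words) if w == item]

def holds(words, item1, operator, item2, n=0):
    """Test `item1 operator item2` inside one context unit given as a word list."""
    p1, p2 = positions(words, item1), positions(words, item2)
    if operator == "AND":
        return bool(p1) and bool(p2)
    if operator == "NOT":
        return bool(p1) and not p2

    negated = operator.startswith("NOT ")
    base = operator[4:] if negated else operator
    tests = {
        "NEAR":   lambda a, b: a != b and abs(a - b) <= n,  # either side
        "BEFORE": lambda a, b: 0 < b - a <= n,              # item2 follows item1
        "AFTER":  lambda a, b: 0 < a - b <= n,              # item2 precedes item1
    }
    if base not in tests:
        raise ValueError(f"unknown operator: {operator}")
    close = tests[base]

    if negated:
        # True if some occurrence of item1 has no item2 in range.
        return any(not any(close(a, b) for b in p2) for a in p1)
    return any(close(a, b) for a in p1 for b in p2)

words = "THE BANK NEAR THE RIVER WAS CLOSED".split()
print(holds(words, "BANK", "NEAR", "RIVER", n=3))      # True
print(holds(words, "BANK", "NOT NEAR", "RIVER", n=2))  # True
print(holds(words, "BANK", "AFTER", "RIVER", n=3))     # False
```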

By default, operators are set to <none>. To add an additional criterion, set its operator to a valid Boolean or proximity operator. To remove criteria, set the operator immediately below the last desired criterion to <none>. When more than one condition is set, you will be asked to specify whether you want to match all criteria or match any one of those.

Once the rule has been properly defined, click the button located in the lower right-hand corner of the dialog box to append the rule definition to the selected content category and to clear the form. Once you have finished entering rules, click the close button to quit this dialog box and return to the WordStat main screen.

Please note that, in order to prevent any recursion or cross-reference problems in rules, a reference to a content category only takes into account the words, word patterns or phrases stored in that category and ignores any rules it contains. For example, if a category named #SATISFACTION contains 10 word patterns and three rules, any reference to this category in a rule will take into account those 10 word patterns and will ignore instances where any one of the three rules has been found to be true.
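The restriction described above amounts to filtering out rule entries when a category reference is expanded. The sketch below assumes a dictionary stored as a plain Python mapping from category names to lists of entries, with rules identified by their "@" prefix. That layout, the function name, and the sample entries are hypothetical.

```python
def items_for_reference(dictionary, reference):
    """Expand a #CATEGORY reference to its words, patterns and phrases,
    skipping any rules (entries prefixed with '@') stored in the category."""
    name = reference.lstrip("#")
    return [entry for entry in dictionary.get(name, []) if not entry.startswith("@")]

dictionary = {
    "SATISFACTION": ["SATISF*", "PLEASED", "VERY_HAPPY", "@SATISFIED_PROFESSOR"],
}
print(items_for_reference(dictionary, "#SATISFACTION"))
# ['SATISF*', 'PLEASED', 'VERY_HAPPY']
```

Because rules are never followed when a category is expanded, a rule cannot end up referring to itself through a category.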
