
Monitoring and Customizing Substitutions

From the User's Guide (pages 135-139)

The substitution process may be used for automatic spelling correction or for lemmatization. Since WordStat does not rely on prior part-of-speech tagging of words to perform lemmatization, but rather on suffix substitution rules and lists, some improper word substitutions may occur. In specific situations, a substitution may be linguistically conceivable yet semantically invalid. For example, the noun "ground," referring to the solid part of the earth's surface, may be erroneously taken as the past participle of the verb "grind" and be replaced with this infinitive form. WordStat offers a way to monitor all substitutions performed by the substitution routine and to override them where necessary by creating a list of custom substitutions, or exceptions. The same tool may also be used to review and edit substitution entries that were entered previously.
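To make this failure mode concrete, here is a minimal Python sketch of lemmatization driven only by a word list and suffix rules, with no part-of-speech information. The rule table, the irregular-form list, and the function name `lemmatize` are illustrative assumptions and are not WordStat's actual rules or code.

```python
# Hypothetical lists for illustration only; WordStat's real rules are not shown in this guide.
IRREGULAR_FORMS = {"ground": "grind", "went": "go", "mice": "mouse"}
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]  # checked in order


def lemmatize(word):
    """Return a guessed base form: list lookup first, then suffix rules."""
    lower = word.lower()
    if lower in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[lower]
    for suffix, replacement in SUFFIX_RULES:
        # Require a stem of at least three letters to limit damage on short words.
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: len(lower) - len(suffix)] + replacement
    return lower


for token in ["ground", "studies", "walked", "morning"]:
    print(token, "->", lemmatize(token))
```

Because the sketch cannot tell a noun from a verb, the noun "ground" becomes "grind" and the noun "morning" loses its ending, which is the kind of substitution the monitoring tool is meant to catch.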

To review or edit manually defined substitutions:

· Enable the substitution process on the Dictionary page by putting a check mark in the box found to the left of the Substitution list box.

· Apply it to your text collection by moving to the Frequencies page.

· Move back to the Dictionary page.

· Click the button to the right of the substitution list to review all substitutions performed. A dialog box similar to this one will appear:

The Substitution Process page of this dialog box provides information about the internal process (if any) involved in the lemmatization or substitution routine as well as a list of all manual substitutions.

To remove a substitution rule:

· Select the rule and click the button.

To add a new substitution rule:

· Click the button. The following dialog box appears:

· Type in the Original edit box the word you would like to replace, then type the replacement word in the Replace with edit box and click OK to create the new substitution rule. This new rule is automatically added to the substitution process.

To cancel a modification:

· All changes are automatically saved to disk. To cancel any change made to the list of manual substitutions during the current WordStat session, click the button. A list of all changes performed in this substitution process will be displayed.

· Select the modification you would like to cancel and click the Undo button.

To review all performed substitutions:

· Move to the Substituted Words page of the dialog box.

The table presents all performed substitutions in alphabetical order, along with other information: the substituted word, how frequently the substitution has been made, the length of the original word, and the inverted (reversed) version of the original word. Clicking any column header sorts the table in ascending order of the data in that column. Clicking the same column header a second time sorts it in descending order.

Because of the algorithm used, lemmatization errors are more likely to occur on shorter words. Sorting on the Length column allows one to focus more specifically on those short words. Also, suffix substitution may be more problematic for some suffixes. For example, lemmatizing words ending with “ING” may introduce confusion between noun, adjective and verb forms. Sorting on the Inverted column allows one to quickly review all substitutions made to words sharing the same ending.
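The effect of sorting on the Length and Inverted columns can be sketched in a few lines of Python. The substitution pairs and the field names below are made-up examples chosen for illustration; they are not output from WordStat.

```python
from collections import Counter

# Hypothetical (original, substitute) pairs, one per occurrence in the text.
performed = [
    ("running", "run"), ("morning", "morn"), ("running", "run"),
    ("was", "be"), ("building", "build"), ("bees", "bee"),
]

counts = Counter(performed)
rows = [
    {
        "original": original,
        "substitute": substitute,
        "frequency": frequency,
        "length": len(original),
        "inverted": original[::-1],  # reversed spelling, so endings come first
    }
    for (original, substitute), frequency in counts.items()
]

# Shortest words first: the substitutions most likely to be wrong.
by_length = sorted(rows, key=lambda row: (row["length"], row["original"]))
# Sorting on the reversed spelling groups words that share an ending.
by_ending = sorted(rows, key=lambda row: row["inverted"])

print([row["original"] for row in by_length])
print([row["original"] for row in by_ending])
```

In the second ordering, "building", "running" and "morning" end up next to each other because their reversed spellings all start with "gni", so every substitution involving that suffix can be reviewed in one pass.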

To correct an invalid substitution:

· Select the row with the substitution you would like to override.

· Right-click the row and select either Keep Invariant, to instruct WordStat to keep the word in its original form, or Substitute With, to specify the word that will replace the selected word. When this last option is selected, the following dialog box appears:

· The initial word is automatically entered in the Original edit box. Type the replacement word in the Replace with edit box and click OK to create the rule. This new substitution rule is automatically added to a list of exceptions and will be applied whenever the currently selected lemmatization routine is used.
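One way to picture how an exception list overrides an automatic routine is the following Python sketch. The function names, the toy routine, and the dictionary layout are assumptions made for illustration; the guide does not describe how WordStat stores or applies exceptions internally.

```python
def apply_substitution(word, exceptions, routine):
    """Consult the exception list first; fall back to the automatic routine."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return routine(key)


def toy_routine(word):
    # Stand-in for an automatic lemmatizer (hypothetical and deliberately naive).
    irregular = {"ground": "grind"}
    if word in irregular:
        return irregular[word]
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


exceptions = {
    "ground": "ground",  # like Keep Invariant: the word maps to itself
    "data": "datum",     # like Substitute With: a user-chosen replacement
}

for token in ["ground", "Data", "books"]:
    print(token, "->", apply_substitution(token, exceptions, toy_routine))
```

Checking the exceptions before the routine means a single entry is enough to silence a bad rule for one word while leaving the rule in force for every other word.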

To export the table to disk:

· Click the button. A Save File dialog box will appear.

· In the Save As Type list box, select the file format under which to save the table. The following formats are supported: ASCII file (*.TXT), Tab delimited file (*.TAB), Comma delimited file (*.CSV), MS Word (*.DOC), HTML file (*.HTM; *.HTML), XML file (*.XML), Excel spreadsheet file (*.XLS), SPSS data file (*.SAV), and STATA data file (*.DTA).

· Type a valid file name with the proper file extension.

· Click the SAVE button.
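For readers who post-process the exported table, the sketch below shows what the tab-delimited and comma-delimited variants of such a table look like when produced with Python's standard `csv` module. The column names and rows are hypothetical, and the exact layout WordStat writes may differ.

```python
import csv
import io

DELIMITERS = {".csv": ",", ".tab": "\t"}  # only the two delimited text formats


def format_table(rows, extension):
    """Serialize rows (a list of dicts) as delimited text for the given extension."""
    delimiter = DELIMITERS[extension.lower()]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), delimiter=delimiter)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"original": "running", "substitute": "run", "frequency": 2},
    {"original": "bees", "substitute": "bee", "frequency": 1},
]
print(format_table(rows, ".tab"))
```

To save the result, the returned string can be written to a file opened with `open(path, "w", newline="")`, so that the line endings produced by the `csv` module are not translated a second time.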

To append a copy of the table in the Report Manager:

· Click the button. A descriptive title will be provided automatically for the table. To edit this title or to enter a new one, hold down the SHIFT keyboard key while clicking this button (for more information on the Report Manager, see page 191).

To print the table:

· Click the button.

To leave this dialog box:

· Click the button. If modifications have been made to the list but have not been saved, you will be asked whether to save them. Choosing NO discards all changes made to the list since you entered this dialog box or since the last time the modifications were saved.
