
In many web shops, customers can review products both in free text and formally on a rating scale. Sites like Amazon also allow the reviewing of these reviews to assure the quality of the reviews submitted to the site. Readers of a review are asked “Was this review helpful to you?” with the answer choices “Yes” or “No”. These ratings are very explicit and very standardized, as each reviewer is asked to rate the review along the same quality dimension, namely its helpfulness. Kim’s system then makes use of the explicit content of the review in order to predict its helpfulness with great accuracy. However, such a system is not easily transferable to the domain at hand for a number of reasons. The ratings a Machine Teaching System operates on cannot be assumed to concern the same quality dimension; helpfulness is merely one of the choices rather than the sole dimension of quality. Additionally, the system presented in [KPCP06] gains much of its performance from the explicit content of a review, such as the numerical rating of the product discussed in the review. Clearly, such information cannot be assumed to be available in all Machine Teaching instances.

The study [KPCP06] does, however, show that such structured data is very useful and should be used whenever possible.

Summary: The related work suggests that the task of rating texts is feasible, as it has been performed by human-designed systems before, albeit with an inherently limited scope when compared to the aim of the approach presented in this chapter. The rating data underlying a machine learning approach can be assumed to be consistent within the community, albeit the available data on that point is sparse. Lastly, we have briefly discussed other machine learning based approaches operating on similar data to provide us with feature ideas for the task at hand.

The next step in investigating the feasibility of a General Feedback Machine Teaching system in the text domain is to present the features developed for the system in Section 3.3 before presenting the evaluation procedure and results in Sections 3.4 and 3.5.

3.3 Feature Engineering

As mentioned earlier, feature engineering is the crucial step for the success of a supervised machine learning application: while we can resort to an off-the-shelf supervised machine learning algorithm, feature engineering needs to be done for each application domain separately.

In this section, the feature extraction procedures developed for the Machine Teaching system are described. As introduced in Section 2.4, the input data, which consists of unstructured text, needs to be converted into vectors. This process is commonly referred to as feature extraction. Designing and implementing features for this task is mainly a manual process, guided by prior work, experience, and intuition. Thus, feature engineering adds a systematic bias to the machine learning process at large, but one that is believed to aid in the learning task. For the system at hand, feature extractors from five different classes have been built: Surface, Lexical, Syntactic, Forum specific and Similarity features. The features and their extraction procedures are now described in detail.

3.3.1 Surface Features

The first class of features deals with properties of the text that are extractable on the character level of the posts.

Length: It is hypothesized that the length of a post can have an influence on its quality according to community standards. Thus, this feature captures the number of tokens as reported by the tokenizer supplied by the Java SDK.

Question Frequency: The fraction of sentences ending with a question mark “?”. Depending on the community, the presence or absence of questions and their frequency may be indicative of the perceived quality of the post.

Exclamation Frequency: The fraction of sentences ending with an exclamation mark “!”. Frequent use of exclamation marks is often considered rude in web forums.

Capitalized Word Frequency: The fraction of words spelled all CAPITALIZED. Words spelled like this are commonly associated with shouting in the conversation and are thus indicative of rude behavior.
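The four surface features above can be sketched as follows. This is an illustrative reimplementation, not the original system: the thesis used the Java SDK tokenizer, whereas here whitespace tokenization and a simple punctuation-based sentence split stand in for it.

```python
import re

def surface_features(text):
    """Sketch of the surface features: length, question frequency,
    exclamation frequency, and capitalized word frequency."""
    tokens = text.split()
    # Crude sentence split: break after terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    n_sent = max(len(sentences), 1)
    n_tok = max(len(tokens), 1)
    return {
        "length": len(tokens),
        "question_freq": sum(s.endswith("?") for s in sentences) / n_sent,
        "exclamation_freq": sum(s.endswith("!") for s in sentences) / n_sent,
        # Words of more than one letter spelled entirely in capitals ("shouting").
        "capitalized_freq": sum(t.isupper() and len(t) > 1 for t in tokens) / n_tok,
    }
```

For the post “THIS is BAD! Why? Because.”, the sketch yields a length of 5 tokens, a question frequency of 1/3, and a capitalized word frequency of 2/5.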

3.3.2 Lexical Features

This class of features is concerned with the actual wording of the posts.

Spelling Error Frequency: It is commonly agreed that texts with a high fraction of misspelled words are considered bad. Thus, this feature captures the percentage of words that are not spelled correctly. In the experiments, the Jazzy spell checking engine⁵ was used together with an English dictionary, as only English texts were analyzed.

Swear Word Frequency: This feature extractor stems from the same line of thought as the extractors for the exclamation frequency and the capitalized word frequency: rudeness in the text might indicate poor quality according to the community standards. Here, rudeness is detected rather directly by determining the percentage of words that appear on a list of swear words. The list was compiled from public resources like WordNet and Wikipedia. It contains more than eighty words like “asshole”, but also common transcriptions like “f*ckin” which occur frequently in web forum posts.

⁵ http://jazzy.sourceforge.net
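Both lexical features reduce to a dictionary lookup over the post's words. The sketch below illustrates this with plain set membership; the original system instead relied on the Jazzy spell checker and its curated swear word list, so the `dictionary` and `swear_words` arguments here are placeholders for those resources.

```python
def lexical_features(tokens, dictionary, swear_words):
    """Sketch of the lexical features: fraction of misspelled words and
    fraction of swear words, both via simple set lookups."""
    # Normalize: lowercase and strip surrounding punctuation.
    words = [t.lower().strip(".,!?\"'") for t in tokens]
    words = [w for w in words if w]
    n = max(len(words), 1)
    return {
        "spelling_error_freq": sum(w not in dictionary for w in words) / n,
        "swear_word_freq": sum(w in swear_words for w in words) / n,
    }
```

A post of five words with two out-of-dictionary words would thus receive a spelling error frequency of 0.4.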


3.3.3 Syntactic Features

This class of feature extractors is concerned with the syntactic level of the texts analyzed. To capture it, the texts are annotated with part-of-speech tags as defined in the PENN Treebank tag set, see [MSM94].

For the tagging, the TreeTagger [Sch95] was used, parametrized with the parameter files for English texts supplied with it. The fraction of each part-of-speech is then stored as one dimension in the resulting feature vector.
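Given the tag sequence produced by the tagger, turning it into a fixed-length vector of per-tag fractions is straightforward. The sketch below assumes the tags have already been produced (by the TreeTagger in the original system) and uses a small illustrative subset of the PENN Treebank tag inventory; the real feature vector has one dimension per tag in the full set.

```python
from collections import Counter

# Illustrative subset of the PENN Treebank tag set; the real system
# uses one dimension per tag produced by the TreeTagger.
TAGSET = ["NN", "VB", "JJ", "RB", "DT", "IN"]

def pos_fractions(tags):
    """Map a sequence of part-of-speech tags to a fixed-length vector
    holding the fraction of each tag in the post."""
    counts = Counter(tags)
    n = max(len(tags), 1)
    return [counts[t] / n for t in TAGSET]
```

For instance, a four-token post tagged NN NN VB DT yields the vector (0.5, 0.25, 0, 0, 0.25, 0) over this tag subset.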

3.3.4 Forum Specific Features

The texts analyzed here stem from web forums, a genre of text that exhibits certain features not present in other forms of text. The presence or absence of these specific features may have an influence on the quality of the posts as perceived by the fellow users of the same forum. The following features were extracted:

IsHTML: The users of a forum are usually offered some means to style their posts. In the case of the Nabble data used below, this is done using standard HTML markup. This feature thus encodes whether or not the author of the post made use of this offering.

IsMail: Nabble also bridges mailing lists into web forums and vice versa. This feature captures the origin of a specific post, namely whether it originally was an email. If not, the post was entered through the web interface of Nabble.com.

Quote Fraction: When authoring a post, the user may choose to quote another post, e. g. to answer a specific question raised in that other post. This is often considered good style. However, some posts quote extensively without an obvious benefit to the post. This feature thus captures the fraction of characters within quotes of a post to allow the machine learning model to capture this property.

Path and URL Counts: In forums where users help one another, a direct pointer to further information may be considered a good aspect of a post. In the experiments, two special kinds of pointers are considered: UNIX path names and URLs. Their number is counted and forms a feature in the feature vector.
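The quote fraction and the pointer counts can be sketched as below. The quote character offsets are assumed to come from the forum's markup parser (how quotes are delimited depends on the forum software), and the two regular expressions for URLs and UNIX paths are illustrative stand-ins for whatever patterns the original system used.

```python
import re

def forum_features(post, quote_spans):
    """Sketch of three forum specific features: quote fraction plus URL
    and UNIX path counts. quote_spans are (start, end) character offsets
    of quoted regions, assumed to be supplied by the forum parser."""
    quoted_chars = sum(end - start for start, end in quote_spans)
    # URLs: an http(s) scheme followed by non-whitespace.
    urls = len(re.findall(r'https?://\S+', post))
    # UNIX paths: a leading "/" not preceded by other text, then at least
    # two path components (avoids matching the "//" inside URLs).
    paths = len(re.findall(r'(?<!\S)/(?:[\w.-]+/)+[\w.-]+', post))
    return {
        "quote_fraction": quoted_chars / max(len(post), 1),
        "url_count": urls,
        "path_count": paths,
    }
```

In the sample post “See http://example.com and /usr/local/bin/tool for details.”, the sketch counts one URL and one path, while the URL's own slashes are not miscounted as a path.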

3.3.5 Similarity Features

Web forums are typically organized by topic. Posts which do not match the topic are called “off topic” and are usually considered to be bad posts. In order to capture the relatedness of a post to the topic it is posted in, the cosine between the word vector of the post and the word vector of the topic is used as an additional feature.
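The on-topic feature described above is a standard cosine similarity over word vectors. The sketch below uses raw term frequencies and whitespace tokenization for illustration; the original system's exact tokenization and term weighting are not specified here, so tf-idf or another weighting may equally apply.

```python
import math
from collections import Counter

def cosine_similarity(post_text, topic_text):
    """Cosine between the term-frequency vectors of a post and its topic;
    values near 1 suggest an on-topic post, near 0 an off-topic one."""
    a = Counter(post_text.lower().split())
    b = Counter(topic_text.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

A post sharing two of its three words with the topic description scores 2/√6 ≈ 0.82, while a post sharing no words scores 0.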

Note that this list of features is just an example of the feature engineering process required in the application of the Machine Teaching Process for general feedback. In every application, the set of features to extract needs to be re-evaluated.

Next steps: The remainder of this chapter will focus on the evaluation of these features for the task of rating web forum posts, starting with a description of the evaluation procedure in Section 3.4. The following section will then present results from that evaluation and draw conclusions from them.