Homework 5: Naive Bayes

Dr. Benjamin Roth, Marina Sedinkina
Symbolische Programmiersprache
Due: Thursday, December 05, 2019, 16:00

In this exercise we will implement a multi-class Naive Bayes classifier that will be trained on the 20 Newsgroups dataset to distinguish 20 different text categories.

Take a look at the file hw05_naive_bayes/text_categorization.py. In this exercise you will have to complete some methods to make the classification work. Get the code for this exercise from your team git project (use git pull).

To install sklearn: pip3 install sklearn
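
For orientation only: the 20 Newsgroups corpus can be fetched through sklearn as sketched below. text_categorization.py presumably handles the download itself (see the note in Exercise 6), so this is not required for the homework.

# Sketch: fetching the 20 Newsgroups training split via sklearn.
from sklearn.datasets import fetch_20newsgroups

# Downloads the corpus on first use and caches it locally.
train = fetch_20newsgroups(subset="train")
print(len(train.data), "documents,", len(train.target_names), "categories")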

To test your code: python3 -m unittest -v hw05_naive_bayes/test_naive_bayes.py

Exercise 1: Creating the instances [0 points]

Complete the method DataInstance.from_list_of_feature_occurrences(...).
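
A minimal sketch of what this classmethod could look like, assuming a DataInstance stores a label and a dict of feature counts (the exact attributes and signature in the starter code may differ):

from collections import Counter

class DataInstance:
    # Simplified stand-in; the real class lives in the starter code.
    def __init__(self, feature_counts, label):
        self.feature_counts = feature_counts  # feature -> number of occurrences
        self.label = label

    @classmethod
    def from_list_of_feature_occurrences(cls, feature_list, label):
        # Count how often each feature (e.g. each word) occurs in the list
        # and build an instance from those counts.
        return cls(dict(Counter(feature_list)), label)

# Hypothetical usage:
instance = DataInstance.from_list_of_feature_occurrences(
    ["nasa", "space", "space"], "sci.space")
print(instance.feature_counts)  # {'nasa': 1, 'space': 2}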

Exercise 2: Constructing/training the Classifier [4 points]

Complete the classmethod NaiveBayesClassifier.for_dataset(cls, dataset, smoothing=1.0). To do so, you should be familiar with Python's @classmethod concept. The method serves as an alternative constructor that builds a NaiveBayesClassifier from a Dataset.
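
One possible sketch, assuming the Dataset exposes an iterable of instances with .label and .feature_counts attributes (an assumption; the real interface is defined in the starter code). It estimates log priors from label frequencies and per-class word log-likelihoods with Laplace-style smoothing controlled by the smoothing parameter:

import math
from collections import defaultdict

class NaiveBayesClassifier:
    def __init__(self, log_priors, log_likelihoods, vocabulary):
        self.log_priors = log_priors            # label -> log P(label)
        self.log_likelihoods = log_likelihoods  # label -> {word -> log P(word | label)}
        self.vocabulary = vocabulary

    @classmethod
    def for_dataset(cls, dataset, smoothing=1.0):
        label_counts = defaultdict(int)
        word_counts = defaultdict(lambda: defaultdict(int))
        vocabulary = set()

        for instance in dataset.instances:  # assumed attribute name
            label_counts[instance.label] += 1
            for word, count in instance.feature_counts.items():
                word_counts[instance.label][word] += count
                vocabulary.add(word)

        num_instances = sum(label_counts.values())
        log_priors = {label: math.log(n / num_instances)
                      for label, n in label_counts.items()}

        log_likelihoods = {}
        for label in label_counts:
            counts = word_counts[label]
            total = sum(counts.values()) + smoothing * len(vocabulary)
            log_likelihoods[label] = {
                word: math.log((counts[word] + smoothing) / total)
                for word in vocabulary}

        return cls(log_priors, log_likelihoods, vocabulary)

The @classmethod decorator makes for_dataset an alternative constructor: it receives the class itself as cls and returns a new instance via cls(...).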

Exercise 3: Predicting [6 points]

Complete the method prediction(self, feature_counts). This method should return the predicted class label (a string). You need to understand the method log_probability first.
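
Continuing the sketch above, prediction just returns the label that maximizes log_probability. The log_probability shown here is only a guess at what the provided method does, and its assumed signature log_probability(label, feature_counts) may not match the starter code; read the real implementation first.

def log_probability(self, label, feature_counts):
    # log P(label) + sum over words of count * log P(word | label);
    # words outside the training vocabulary are simply skipped here.
    logprob = self.log_priors[label]
    for word, count in feature_counts.items():
        if word in self.log_likelihoods[label]:
            logprob += count * self.log_likelihoods[label][word]
    return logprob

def prediction(self, feature_counts):
    # Maximum a-posteriori decision: return the label (a string) with the
    # highest log-probability for this bag of feature counts.
    return max(self.log_priors,
               key=lambda label: self.log_probability(label, feature_counts))

(Both functions are meant as methods of the classifier sketched in Exercise 2.)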

Exercise 4: Evaluating [4 points]

Complete the method prediction_accuracy(self, dataset). This method should iterate over a labelled Dataset, predict labels for all samples and return the accuracy.
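
A straightforward sketch, again assuming the dataset exposes instances with .label and .feature_counts as in the earlier sketches:

def prediction_accuracy(self, dataset):
    # Fraction of instances whose predicted label matches the gold label.
    correct = 0
    total = 0
    for instance in dataset.instances:
        total += 1
        if self.prediction(instance.feature_counts) == instance.label:
            correct += 1
    return correct / total if total else 0.0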

Exercise 5: Finding the best features [6 points]

Complete the method log_odds_for_word(self, word, category) that computes the log-odds $\log \frac{P(\text{category} \mid \text{word})}{1 - P(\text{category} \mid \text{word})}$.
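
One way to compute this, under the same assumed representation as in the earlier sketches: obtain P(category | word) by applying Bayes' rule to a one-word document (normalize P(word | c) P(c) over all categories c), then take the log-odds.

import math

def log_odds_for_word(self, word, category):
    # Assumes word is in the training vocabulary of every class.
    scores = {c: self.log_priors[c] + self.log_likelihoods[c][word]
              for c in self.log_priors}
    max_score = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {c: math.exp(s - max_score) for c, s in scores.items()}
    posterior = exp_scores[category] / sum(exp_scores.values())
    # log-odds: log( P(category | word) / (1 - P(category | word)) )
    return math.log(posterior / (1.0 - posterior))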

Exercise 6: Using the classifier [bonus]

Once you have implemented all missing functionality, you can have a look at text_categorization.py to see how to use Naive Bayes in practice. Run the code with:

python3 -m hw05_naive_bayes.text_categorization

Info: the download server might be slow.
