• Keine Ergebnisse gefunden

homework 7

N/A
N/A
Protected

Academic year: 2022

Aktie "homework 7"

Copied!
2
0
0

Wird geladen.... (Jetzt Volltext ansehen)

Volltext

(1)

Homework 7:

Unsupervised Learning: Kmeans

Benjamin Roth, Marina Sedinkina Symbolische Programmiersprache Due: Thursday December 19, 2019, 16:00

In this exercise you will:

• implement kmeans clustering Exercise 1: Kmeans [7 points]

On the course homepage, you can find the file courses.txt, containing several LMU courses of studies per line. Download it into the data/ folder of your project. Use the NLTK Kmeans clusterer to cluster LMU courses.

Take a look at hw07_kmeans/kmeans.py. In this exercise you will have to implement some methods to perform the clustering.

This homework will be graded using unit tests by running: python3 -m unittest -v hw07_kmeans/test_kmeans.py

Note: Some tests may require more than one of the methods to be implemented before they can pass.

1. Implement the Reader class methodget_lines(self). This method should return the list of courses from the file courses.txt with listed courses, i.e. one course per line.

2. Implement the Reader class methodnormalized_word(self, word). This method should normalize the word by making it lower case and deleting punctuation marks from it. Hint: you can use string.punctuation to access a list of punctuation characters

3. Implement the Reader class methodget_vocabulary(self). This method should return the vocabulary: the list of unique normalized words from the file, sorted al-

phabetically. Note: words in the vocabulary should be normalized, usenormalized_word(self, word) to do this

1

(2)

4. Implement the Reader class method vectorspaced(self,course). This method should represent each course via an one-hot vector: a vector filled with 0s, except for a 1 at the position associated with the word in the vocabulary. Note: the length of vector should be equal to the vocabulary size

5. Implement the KMeans class method distance(self,x,y). This method should calculate the euclidean distance between two given vectors.

6. Implement the KMeans class methodvector_mean(self,vectors). This method should calculate the mean of the vectors. Hint: you can use thenumpylibrary 7. Implement the KMeans class methodclassify(self,input). This method should

assign the cluster to the given vector. To achieve this, first, calculate the euclidean distances between the input vector and the means, and second, return the mean index closest to the input.

8. Once you have implemented all of the missing functionality, have a look at the train(self,vectors) method where you call KMeans’ implemented functions.

Then, you can have a look atrun_kmeans.pyto see how to use Kmeans in practice.

Run the code with:

python3 -m hw07_kmeans.run_kmeans

2

Referenzen

ÄHNLICHE DOKUMENTE

I need to find out the subject of the second panel on the building in the painting in order to see how it relates to the Forge of Vulcan on the right hand panel.

refrigerator frying pan dishes stove cupboard bowl. cup

clotheshorse deck-chair sunshade clothes-peg slide.

Effects of some soil management systems on soil physical properties, microbial biomass and nutrient distribution under rainfed maize production in a humid rainforest

The United States needs to help the Philippines develop its own set of “anti-access/area denial” capabilities to counter China’s growing power projection capabilities..

[r]

Gefordert ist eine Theorie, die zwischen den hermeneutischen Figuren einer vorgän- gigen Schöpfer-Autorität und der Genügsamkeit poststrukturalistischer und diskurs-

The concept of “world literature” introduced by Goethe at the beginning of the 19th century is based on the assumption that there are certain basic conflicts