
Institut für Informatik
Prof. Dr. Heiko Röglin
Magdalena Aretz

Lab Efficient Algorithms for Selected Problems: Design, Analysis and Implementation Winter 2013/2014

Tasklist 2

Due Date: 18.11.2013

Basics

1. Debugging Java in Eclipse:

Go through lessons 1 – 7 of Mark Dexter’s Using the Debugger tutorial:

http://eclipsetutorial.sourceforge.net/debugger.html.

2. GNU R:

You do not have to understand every single detail of the following material. Just program along with the examples to get a rough feeling for how R works.

• Get familiar with the R language. Browse through the examples provided in the sections Basics, Reading Files and Graphs of R by example by Ajay Shah:

http://www.mayin.org/ajayshah/KB/R/index.html.

• For a deeper understanding of the R language, have a look at the official manual:

http://cran.r-project.org/doc/manuals/R-intro.html.

Make sure to cover at least the following chapters: 1 – 3, 7, 8 and 12.1.

• Look at A Sample Session of R:

http://cran.r-project.org/doc/manuals/R-intro.html#A-sample-session.

Implement

1. In order to be able to test your code, you should download the following datasets from the UCI Repository: iris, glass, wine, cloud, seeds, abalone.

Note: The cloud data consists of two parts. Store them both separately.

Make sure to store the data in your SVN folder so that it can be referenced by relative paths.

2. Implement and test a method that reads .csv files, parses the content and stores it in a two-dimensional array.

Note: Most of the datasets we use contain some dimensions that are not meaningful for clustering (IDs, categorical values, etc.). Make sure to either delete those by hand or ignore them when reading in the data.
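For task 2, a reader along the following lines may serve as a starting point. It is only a sketch under stated assumptions: the class name CsvReader, its method names and the skipColumns parameter are made up for illustration, and the parser assumes comma-separated numeric fields with no header row and no quoted fields. Files that use other separators would need a different split pattern.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; names are illustrative, not prescribed by the task.
final class CsvReader {

    /** Reads all lines of the file at the given (relative) path and parses them. */
    static double[][] readFile(String path, int... skipColumns) throws IOException {
        return parse(Files.readAllLines(Paths.get(path)), skipColumns);
    }

    /**
     * Parses comma-separated lines into a two-dimensional array.
     * Columns listed in skipColumns (e.g. IDs or class labels) are ignored,
     * and blank lines are skipped.
     */
    static double[][] parse(List<String> lines, int... skipColumns) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] fields = line.split(",");
            double[] row = new double[fields.length];
            int kept = 0;
            for (int col = 0; col < fields.length; col++) {
                if (!isSkipped(col, skipColumns)) {
                    row[kept++] = Double.parseDouble(fields[col].trim());
                }
            }
            // Trim the row to the number of columns that were actually kept.
            rows.add(Arrays.copyOf(row, kept));
        }
        return rows.toArray(new double[0][]);
    }

    private static boolean isSkipped(int col, int[] skipColumns) {
        for (int skipped : skipColumns) {
            if (skipped == col) {
                return true;
            }
        }
        return false;
    }
}
```

A call such as CsvReader.readFile("data/iris.data", 4) would then drop the fifth column; the path and the column index here are placeholders that depend on where and how you stored the data.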

3. Implement at least two different methods for initializing the centers for the k-Means algorithm. For example:

• Take a random subset of the data points,

• select random points from the input space,

• generate a random initial assignment of the points,

• implement the k-means++ algorithm as described in http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf.
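Two of the options listed above, the random subset and the k-means++ seeding, could be sketched as follows. The class name CenterInit and the method signatures are assumptions for illustration. The k-means++ method picks the first center uniformly at random and every further center with probability proportional to its squared distance to the nearest center chosen so far. Passing in the Random instance makes runs reproducible when a fixed seed is used.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper class; names are illustrative, not prescribed by the task.
final class CenterInit {

    /** Picks k distinct data points uniformly at random (partial Fisher-Yates shuffle). */
    static double[][] randomSubset(double[][] points, int k, Random rng) {
        int n = points.length;
        int[] index = new int[n];
        for (int i = 0; i < n; i++) {
            index[i] = i;
        }
        double[][] centers = new double[k][];
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(n - i);
            int tmp = index[i];
            index[i] = index[j];
            index[j] = tmp;
            centers[i] = points[index[i]].clone();
        }
        return centers;
    }

    /** k-means++ seeding: samples each new center proportionally to squared distance. */
    static double[][] kMeansPlusPlus(double[][] points, int k, Random rng) {
        int n = points.length;
        double[][] centers = new double[k][];
        centers[0] = points[rng.nextInt(n)].clone();
        // minDist[i] holds the squared distance of point i to its nearest chosen center.
        double[] minDist = new double[n];
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);
        for (int c = 1; c < k; c++) {
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(points[i], centers[c - 1]));
                total += minDist[i];
            }
            // Walk the cumulative sum until it exceeds a uniformly drawn threshold.
            double threshold = rng.nextDouble() * total;
            double cumulative = 0.0;
            int chosen = n - 1; // fallback if all remaining distances are zero
            for (int i = 0; i < n; i++) {
                cumulative += minDist[i];
                if (cumulative > threshold) {
                    chosen = i;
                    break;
                }
            }
            centers[c] = points[chosen].clone();
        }
        return centers;
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            sum += diff * diff;
        }
        return sum;
    }
}
```

Both methods copy the selected points, so later updates to the centers do not modify the input data.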

4. Make sure that your implementation of the k-Means method is working by testing it on the different data sets. If you encounter any problems, solve them by using the debugger (as described in Mark Dexter’s tutorial).

5. Implement a toString() method that outputs the state of all crucial variables after each iteration (e.g. number of steps, current value of the error function, size of the clusters, ...).
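One possible shape for task 5 is a small snapshot object that is created after each iteration and printed. The class KMeansState, its fields and the output format are assumptions for illustration. The error function is taken here to be the sum of squared distances between each point and its assigned center.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical snapshot of the algorithm state after one iteration.
final class KMeansState {
    private final int iteration;
    private final double error;       // sum of squared distances to assigned centers
    private final int[] clusterSizes; // number of points assigned to each center

    KMeansState(int iteration, double[][] points, double[][] centers, int[] assignment) {
        this.iteration = iteration;
        this.clusterSizes = new int[centers.length];
        double sum = 0.0;
        for (int i = 0; i < points.length; i++) {
            double[] center = centers[assignment[i]];
            for (int j = 0; j < center.length; j++) {
                double diff = points[i][j] - center[j];
                sum += diff * diff;
            }
            clusterSizes[assignment[i]]++;
        }
        this.error = sum;
    }

    @Override
    public String toString() {
        // Locale.ROOT keeps the decimal separator independent of system settings.
        return String.format(Locale.ROOT, "iteration=%d error=%.4f clusterSizes=%s",
                iteration, error, Arrays.toString(clusterSizes));
    }
}
```

Printing such an object at the end of every iteration makes it easy to see whether the error decreases and whether any cluster has become empty.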

6. Make sure your code contains an adequate amount of comments.

7. Implement a method that checks after each iteration of the k-Means method whether there are empty clusters and solves this problem (e.g. by resetting the respective centers to randomly chosen points from the data).
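The check in task 7 could look roughly like the sketch below, which uses the strategy suggested in the task: every center without assigned points is reset to a randomly chosen data point. The class and method names are assumptions, and the method expects assignment[i] to hold the index of the center that point i currently belongs to.

```java
import java.util.Random;

// Hypothetical helper class; names are illustrative, not prescribed by the task.
final class EmptyClusterRepair {

    /**
     * Resets the center of every empty cluster to a random data point.
     * Returns the number of centers that were reset.
     */
    static int resetEmptyClusters(double[][] points, double[][] centers,
                                  int[] assignment, Random rng) {
        int[] sizes = new int[centers.length];
        for (int cluster : assignment) {
            sizes[cluster]++;
        }
        int resetCount = 0;
        for (int c = 0; c < centers.length; c++) {
            if (sizes[c] == 0) {
                // Copy the point so that later center updates do not alter the data.
                centers[c] = points[rng.nextInt(points.length)].clone();
                resetCount++;
            }
        }
        return resetCount;
    }
}
```

After a reset the assignment is stale, so the points should be reassigned before the next center update.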

8. If you want to, you can already start implementing a method that scales the data before running k-Means on it. The most common way of doing this is to standardize the content of each dimension.
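Standardizing a dimension usually means subtracting its mean and dividing by its standard deviation. The sketch below does this per column. It assumes the sample standard deviation (division by n - 1) and maps constant columns to zero; both choices, as well as the class name Standardizer, are assumptions rather than requirements of the task.

```java
// Hypothetical helper class; names are illustrative, not prescribed by the task.
final class Standardizer {

    /** Returns a copy of the data in which every column has mean 0 and sample std 1. */
    static double[][] standardize(double[][] data) {
        int n = data.length;
        int d = n == 0 ? 0 : data[0].length;
        double[][] scaled = new double[n][d];
        for (int col = 0; col < d; col++) {
            double mean = 0.0;
            for (double[] row : data) {
                mean += row[col];
            }
            mean /= n;
            double sumSq = 0.0;
            for (double[] row : data) {
                sumSq += (row[col] - mean) * (row[col] - mean);
            }
            // Sample standard deviation; undefined for fewer than two rows.
            double std = n > 1 ? Math.sqrt(sumSq / (n - 1)) : 0.0;
            for (int i = 0; i < n; i++) {
                // Constant columns carry no information and are set to zero.
                scaled[i][col] = std > 0.0 ? (data[i][col] - mean) / std : 0.0;
            }
        }
        return scaled;
    }
}
```

Returning a new array keeps the raw data available, which helps when comparing clusterings with and without scaling.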

9. R: Download StatET, a plugin with which you can run R in Eclipse. If you encounter any questions or problems combining Eclipse and R, have a look at:

http://www.splusbook.com/RIntro/R_Eclipse_StatET.pdf.

You will probably have to download the rj packages. For this, follow the instructions at:

http://www.walware.de/it/statet/installation.mframe?jump=install-rj-rpkg.

10. R: Experiment with functions in R that read data and depict the general structure and distribution. For this, get used to working with the following functions:

read.csv, read.table, summary, plot, scatterplot3d, boxplot.

Note: You do not have to check in any R code yet. But be sure you get used to its main functionalities. We will use them later on for the visualization of our experimental outputs.
