
Academic year: 2022


Fakultät für Physik

Institut für Experimentelle Kernphysik

Prof. Dr. G. Quast, Prof. Dr. M. Feindt, Dr. A. Zupanc

Exercise groups: G. Sieber, B. Kronenbitter, A. Heller    Issued: 05.07.2012

Computer exercise for the lecture Moderne Methoden der Datenanalyse
Exercise 9: Data Mining Cup: NeuroBayes®

This is the last part of the Data Mining Cup exercise. In the previous exercise we used a neural network to classify customers. Anyone who tried to train a neural network for this task will have noticed that it is not easy: the network can be overtrained, and the training may fail to find the global minimum (keep in mind that training a neural network means minimizing a loss function in a multidimensional space). In this exercise we continue with neural networks and present techniques that help with the training.
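The problem of missing the global minimum can be illustrated with a minimal sketch. The one-dimensional function `loss`, the step size and the number of steps below are assumptions chosen purely for illustration; they are not taken from NeuroBayes or from the exercise code.

```cpp
#include <cassert>
#include <cmath>

// Toy one-dimensional "loss" with two minima (an illustrative assumption,
// not a real network loss).
double loss(double x) { return x * x * x * x - 3.0 * x * x + x; }

// Analytic derivative of the toy loss.
double lossGradient(double x) { return 4.0 * x * x * x - 6.0 * x + 1.0; }

// Plain gradient descent: repeatedly step against the gradient,
// starting from `start`, and return the final position.
double descend(double start, double learningRate = 0.01, int nSteps = 2000) {
    assert(learningRate > 0.0 && nSteps > 0);
    double x = start;
    for (int i = 0; i < nSteps; ++i) {
        const double step = learningRate * lossGradient(x);
        x -= step;
        if (std::fabs(step) < 1e-12) break;  // converged
    }
    return x;
}
```

Calling `descend(2.0)` ends in the minimum near x ≈ 1.13, while `descend(-2.0)` ends near x ≈ -1.30, where the loss is lower. The same minimizer therefore gives a different result depending only on the starting point, which is why a single training run gives no guarantee of having found the global minimum.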

For this purpose we will use the NeuroBayes package, which has also been used successfully by many diploma students in the Data Mining Cups of recent years. Type the following command to set up the package:

source /home/staff/zupanc/Datenanalyse/DMC/NB/setup_neurobayes.sh

or

source /home/staff/zupanc/Datenanalyse/DMC/NB/setup_neurobayes.csh

for C shell users. You can find the documentation of the package at $NEUROBAYES/doc/.


• Exercise 9.1: The source files for performing a network training and applying an expertise are provided in

/home/staff/zupanc/Datenanalyse/DMC/NB/NBtraining

Copy it into your working directory. The file src/nb_training.cc sets up the NeuroBayes settings and defines which variables are used in the training. The network input is filled in src/readInput.cc. To perform the network training, compile (with make) and run

./train.bash

Look at the plots in analysis*.gz and try to understand them. To apply an expertise to any given sample, run

./expert.bash

The sample to which the expertise is applied is defined in src/nb_expert.cc. This will create a new root file, which is a copy of the original with an additional network output variable (called nnout).

• Exercise 9.2:

Use calcScore.C to find out which cut value on the nnout variable gives you the best score.
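The idea of scanning cut values can be sketched as follows. This is not the content of calcScore.C: the `Event` struct, the `score` function with its `gain` and `penalty` parameters, and `bestCut` are hypothetical names, and the scoring rule is a placeholder assumption. The real score definition is the one implemented in calcScore.C.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical event record: network output plus the known true class.
struct Event {
    double nnout;   // network output for this event
    bool   isGood;  // true if the customer is known to be good
};

// Placeholder score (an assumption): every event above the cut is accepted;
// an accepted good customer earns `gain`, an accepted bad one costs `penalty`.
double score(const std::vector<Event>& events, double cut,
             double gain = 1.0, double penalty = 2.0) {
    double s = 0.0;
    for (const Event& e : events) {
        if (e.nnout > cut) s += e.isGood ? gain : -penalty;
    }
    return s;
}

// Scan nSteps + 1 equidistant cut values in [lo, hi] and return the
// first cut with the highest score, together with that score.
std::pair<double, double> bestCut(const std::vector<Event>& events,
                                  double lo, double hi, int nSteps) {
    assert(nSteps > 0 && hi > lo);
    double bestC = lo;
    double bestS = score(events, lo);
    for (int i = 1; i <= nSteps; ++i) {
        const double cut = lo + (hi - lo) * i / nSteps;
        const double s = score(events, cut);
        if (s > bestS) { bestS = s; bestC = cut; }
    }
    return {bestC, bestS};
}
```

A call such as `bestCut(events, -1.0, 1.0, 200)` scans the cut in steps of 0.01. The scan range is passed in explicitly so that the sketch makes no assumption about the range of nnout.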

• Exercise 9.3:

A common problem of neural networks is the high number of degrees of freedom due to the many connections, which allows statistical noise to be stored. In the worst case, the network memorizes individual events that were used for the training. To avoid training on statistical fluctuations, one can regularize the network (remove insignificant connections).

To switch on regularization, comment out the line containing REG "OFF" in src/nb_training.cc. Run the teacher and the expert once more and compare the performance with that of the neural network without regularization.


• Exercise 9.4:

In the training of a neural network with low statistics, overtraining is rather probable. One way to check that the network is not overtrained is to split the training sample into N subsamples, train a neural network N times with N − 1 subsamples each, and apply the result to the subsample that was not used in the training. With this procedure, called cross validation, one gets N classified samples in which the classified events were not used in the training. In readInput.cc, modify readinput_train and readinput_expert so that, for example, every fifth event is excluded from the training but included in the expertise. In this way you can make, for example, 5 tests.
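The event selection for such a cross validation can be sketched as follows. The helper names `isHeldOut`, `Split` and `splitIndices` are hypothetical and not part of the provided sources; the actual signatures in readInput.cc are not reproduced here, so only the selection logic is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if event number `index` belongs to the held-out subsample `fold`
// (0 <= fold < nFolds). With nFolds = 5, every fifth event is held out.
bool isHeldOut(std::size_t index, int fold, int nFolds) {
    assert(nFolds > 1 && fold >= 0 && fold < nFolds);
    return index % static_cast<std::size_t>(nFolds) ==
           static_cast<std::size_t>(fold);
}

// Event indices used for the training and for the expertise in one test.
struct Split {
    std::vector<std::size_t> train;
    std::vector<std::size_t> test;
};

// Divide the events 0 .. nEvents-1 into a training part and a held-out part.
Split splitIndices(std::size_t nEvents, int fold, int nFolds) {
    Split s;
    for (std::size_t i = 0; i < nEvents; ++i) {
        (isHeldOut(i, fold, nFolds) ? s.test : s.train).push_back(i);
    }
    return s;
}
```

In the training input one would skip events for which `isHeldOut` is true, and in the expert input keep only those events. Running this for fold = 0, ..., 4 gives the five tests, and every event is classified exactly once by a network that did not see it during training.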

• Exercise 9.5:

Try to improve your network by changing your selection of input variables and/or by using individual preprocessing of them (see manual). This is your chance to play with the Network and make your best classification.

• Exercise 9.6:

Change to the subdirectory of your best network and use the file outputExpertClass *.root

to produce a text file with your prediction for the classification dataset like in the previous exercises. Give the text file to a tutor to obtain a score.

• Exercise 10.9: obligatory

Prepare a short (maximum 10 minutes) presentation in which you explain at least one of your solutions to your fellow students. You can use the blackboard or prepare slides in electronic format (e.g. with OpenOffice). A projector will be available for the presentations.

The presentations will take place in the last session and are a Scheinkriterium (a requirement for the course certificate). If you want to use slides in electronic format, save them as a PDF file named YourFirstName_YourLastName.pdf.

Send this file by email no later than Wednesday, 19.07.2012, to anze.zupanc@kit.edu, or put it at a world-readable location (e.g. your home directory on the computer pool with access rights set by chmod go+rx ~) and send only the location. Include your matriculation number and your full name in the email.
