SVN-revision: 50
Uwe Ziegenhagen
Institut für Statistik und Ökonometrie, Humboldt-Universität zu Berlin
http://www.uweziegenhagen.de
About this Course
Uwe Ziegenhagen (ziegenhagen@wiwi.hu-berlin.de)
Consultation hours: by appointment
80% Exam (90 minutes)
20% Presentation (15-20 minutes)
Presentation: practical data analysis
Moodle registration is required
Dates as announced in Moodle
What is R?
S language, developed by Becker and Chambers in 1984
GNU implementation of S, started in 1992 by R. Ihaka and R. Gentleman in New Zealand
Great variety of packages covering all fields of statistics
Versions for Win32, Linux/Unix, Mac OS
Course Overview
Introduction
R as a calculator
Exploratory Data Analysis
Graphics
Regression and Testing
Programming R
Emacs and ESS
If: you have no idea what Emacs is, skip this...
Else: download ESS, extract it to the Emacs home directory, and add the Lisp code (below) to .emacs
Run R with M-x R
(load "c:/emacs-22.1/ess-5.3.7/lisp/ess-site")
(setq inferior-R-program-name "c:/Programme/R/R
Getting Help...
help()                     # help on the help system itself
help(sin)                  # help for sin()
?sin                       # help for sin
help.start()               # HTML help
library()                  # show installed libraries
library(help="<package>")  # overview of a package
help.search("sin")         # search the help pages for "sin"
Basic R stuff
setwd("x:/myR")      # set working directory
getwd()              # get working directory
save.image()         # saves workspace to .RData
savehistory()        # save command history
load(".RData")       # loads workspace
loadhistory()        # load command history
source("myfile.r")   # read commands from file
sink("output.txt")   # write output to file
sink()               # output to screen again
Installing new packages
(may require administrator rights)
# download and install
install.packages("multtest")
# update local packages
update.packages()
# show installed libraries
library()
# load library
library(name, lib.loc="<location>")
# load library in functions
require(name)   # returns TRUE or FALSE
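The remark that require() returns TRUE or FALSE can be sketched as follows. The helper name has_package is hypothetical, "stats" is used only because it ships with every R installation, and the second package name is assumed not to exist.

```r
# Hypothetical helper: TRUE if the package can be loaded, FALSE otherwise.
has_package <- function(pkg) {
  # require() returns FALSE (with a warning) instead of stopping with an error,
  # which makes it convenient for checks inside functions
  suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
}

has_package("stats")                # ships with R, so this should be TRUE
has_package("noSuchPackage12345")   # assumed not to be installed: FALSE
```

library() would stop with an error in the second case, which is why require() is the usual choice when a function should degrade gracefully.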
Customizing R
# show global options
options()
# set option
options(prompt=":-) ")
# get specific option
getOption("prompt")
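A small sketch of changing an option temporarily and restoring it afterwards. The option digits is chosen here purely for illustration; the variable name old is arbitrary.

```r
old <- options(digits = 3)   # options() returns the previous values invisibly
getOption("digits")          # now 3
print(pi)                    # printed with 3 significant digits
options(old)                 # restore the previous setting
getOption("digits")          # back to the earlier value
```

Saving the return value of options() avoids having to remember what the setting was before.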
Customizing R
Unix/Linux: local .Rprofile
Windows: global Rprofile.site
Look & feel changes can be made in Rconsole.
Unix/Linux allows different .Rprofile files.
Basic Calculations with R
1+2
1*2
1/2
1-2
5 %/% 2   # 2; integer division
5 %% 2    # 1; modulo division
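Integer division and modulo belong together: quotient and remainder reconstruct the original number. A short sketch with arbitrarily chosen values:

```r
a <- 17
b <- 5
q <- a %/% b        # integer quotient: 3
r <- a %% b         # remainder: 2
q * b + r == a      # TRUE: quotient and remainder reconstruct a

# With a negative dividend the quotient is rounded down,
# so the remainder takes the sign of the divisor
(-17) %/% 5         # -4
(-17) %% 5          # 3
```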
Basic Calculations with R
<    # smaller
<=   # smaller or equal
>    # bigger
>=   # bigger or equal
!=   # unequal
==   # equal
&    # logical AND (vector)
|    # logical OR (vector)
&&   # logical AND (no vector)
||   # logical OR (no vector)
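The difference between the vector and the non-vector forms of AND and OR can be illustrated like this (the vectors x, y and v are made up for the example):

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

x & y    # elementwise: TRUE FALSE FALSE
x | y    # elementwise: TRUE TRUE TRUE

# && and || work on single values and evaluate the right-hand side
# only if the left-hand side does not already decide the result
v <- NULL
ok <- !is.null(v) && v[1] > 0   # FALSE; v[1] > 0 is never evaluated
ok
```

The non-vector forms are the ones typically used in if() conditions.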
Basic Calculations with R
2^2
sqrt(2)
sin(pi)   # cos, tan
acos(0)   # asin, atan, atan2
Basic Calculations with R
c <- 1:3 * pi
c            # [1] 3.141593 6.283185 9.424778
floor(c)     # [1] 3 6 9
ceiling(c)   # [1] 4 7 10
trunc(pi)    # 3
trunc(-pi)   # -3
floor(-pi)   # -4
round(pi)    # 3
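The four rounding functions differ mainly for negative numbers and for values exactly halfway between two integers. A sketch on a made-up vector v:

```r
v <- c(-2.7, -0.5, 0.5, 2.7)
floor(v)     # -3 -1  0  2   (towards minus infinity)
ceiling(v)   # -2  0  1  3   (towards plus infinity)
trunc(v)     # -2  0  0  2   (towards zero)
round(v)     # -3  0  0  3   (nearest; exact halves go to the even integer)
```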
Variable names
Names start with a letter, optionally followed by digits (alphabetic => numeric)
Valid names: a, b2, abc.de, a1, a1_23
Names are case sensitive: 'a123' is not equal to 'A123'
pi is a predefined constant and should not be used as a variable name (an assignment would mask it)
print(x) prints the content of x
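A brief sketch of case sensitivity and of what happens when pi is used as a variable name. R does accept the assignment; the user-defined value then hides the built-in constant until it is removed. The object names are arbitrary.

```r
a123 <- 1
A123 <- 2
a123 == A123   # FALSE: two different objects

pi <- 3        # accepted by R, but hides the built-in constant
pi             # 3
rm(pi)         # remove the user-defined copy
pi             # 3.141593 again (found in package base)
```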
Overview of Objects I
# show loaded packages and data
search()
ls(2)            # show functions of a specific package (search position 2)
ls()             # list objects
objects()        # like ls()
rm(name)         # removes name from workspace
rm(list=ls())    # removes all objects
R main structures
vectors: just vectors of length m, one type
matrices: m×n arrays, one type
data frames: usually read from files, different types
R classes for vectors
use class(object) for the type
character: vector of strings
numeric: vector of real numbers
integer: vector of signed integers
logical: vector of TRUE or FALSE
complex: vector of complex numbers
list: vector of R objects
factor: sets of labelled observations, pre-defined set of labels
NA: not available, missing value
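The classes listed above can be checked directly with class(). The values are made up for illustration:

```r
class("abc")                  # "character"
class(3.14)                   # "numeric"
class(1:3)                    # "integer"
class(TRUE)                   # "logical"
class(1i)                     # "complex"
class(list(1, "a"))           # "list"
class(factor(c("a", "b")))    # "factor"

x <- c(1, NA, 3)              # NA marks a missing value
is.na(x)                      # FALSE TRUE FALSE
```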
R main structures
matrix(’vector’) converts to matrix of m×1
matrix(’vector’, ncol=1) does the same
matrix(’vector’, nrow=1) converts to matrix of 1×n
as.data.frame(’matrix’) converts to data frame
as.matrix(’data frame’) converts to matrix
as.vector(’matrix’) converts to vector, if matrix has only one
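The conversions listed above can be tried on small made-up objects; v, m and df are names chosen only for this sketch.

```r
v <- 1:3
dim(matrix(v))              # 3 1: column vector
dim(matrix(v, nrow = 1))    # 1 3: row vector

m <- matrix(1:6, nrow = 2)
df <- as.data.frame(m)      # columns are named V1, V2, V3
class(df)                   # "data.frame"
class(as.matrix(df))        # contains "matrix"
as.vector(m)                # 1 2 3 4 5 6 (read column by column)
```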
Variables, Vectors and Matrices
a <- 2             # double number a = 2
b <- 1:3           # integer vector b = [1 2 3]
c <- 1:pi          # integer vector c = [1 2 3]
d <- c(1,2,3,4)    # double vector [1] 1 2 3 4
t(d)               # returns d as row vector (transposes d)
Variables, Vectors and Matrices
a = 1:3
b = 2:4
c(a, b)              # [1] 1 2 3 2 3 4
c(1, 1:3)            # [1] 1 1 2 3
seq(1, 3)            # [1] 1 2 3
seq(3)               # [1] 1 2 3
seq(1, 2, by=0.1)    # [1] 1.0 1.1 1.2 1.3 1.4 1.5 ...
seq(1, 3, 0.5)       # [1] 1.0 1.5 2.0 2.5 3.0
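Besides a step width, seq() also accepts a target length or a negative step. These variants are not on the slide and are shown only as a supplement:

```r
seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
seq(10, 1, by = -3)         # 10 7 4 1
seq_len(3)                  # 1 2 3
seq_len(0)                  # integer(0), whereas 1:0 would give 1 0
```

seq_len() is the safer choice in loops, because it yields an empty sequence when the length is zero.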
Variables, Vectors and Matrices
a <- letters[1:3]
a
b <- LETTERS[1:3]
b
c <- month.abb[1:6]
c
d <- month.name[1:12]
d
Variables, Vectors and Matrices
> matrix(1:12, nrow=3)
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> matrix(1:12, nrow=3, byrow=T)
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
Variables, Vectors and Matrices
> matrix(1:12, 3, 4)
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
Variables, Vectors and Matrices
> matrix(0, nrow=5, ncol=5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    0    0    0    0    0
[3,]    0    0    0    0    0
[4,]    0    0    0    0    0
[5,]    0    0    0    0    0
Variables, Vectors and Matrices
# Concatenation
> x = 1:3
> y = 4:6
> rbind(x, y)
  [,1] [,2] [,3]
x    1    2    3
y    4    5    6
> cbind(x, y)
     x y
[1,] 1 4
[2,] 2 5
[3,] 3 6
Variables, Vectors and Matrices
x <- matrix(1:12, 3, 4)
# extract the diagonal of a matrix
x[row(x) == col(x)]
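The logical mask row(x) == col(x) selects exactly the elements where row and column index agree. As a cross-check, the built-in diag() returns the same values, also for a non-square matrix:

```r
x <- matrix(1:12, 3, 4)
mask <- row(x) == col(x)   # TRUE at positions [1,1], [2,2], [3,3]
x[mask]                    # 1 5 9
diag(x)                    # 1 5 9, same result
```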
Variables, Vectors and Matrices
> k = matrix(1:10, 2, 5)
> k
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> k[1:2, 3:4]
     [,1] [,2]
[1,]    5    7
[2,]    6    8
Variables, Vectors and Matrices
diag(5)         # diagonal 5x5 matrix of '1'
diag(5, 7, 8)   # 7x8 matrix with 5 on the diagonal
Dimension names (matrices and arrays)
> b <- matrix(1:20, 4, 5)
> dimnames(b) <- list(letters[1:4], letters[1:5])
> b["b", "b"]
[1] 6
Matrix Size
x <- matrix(1:10, 2, 5)
dim(x)        # size of matrix x
col(x)        # column indices of ALL elements
row(x)        # row indices of ALL elements
x[<i>,<j>]    # extract the element in the i-th row and j-th column
Inf, NaN
x = 1:3
is.finite(x)
is.infinite(x)
Inf
NaN
is.nan(x)
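For x = 1:3 all three tests are uninformative, since every element is finite. A sketch with values that actually produce Inf and NaN (the division by zero is chosen only for illustration):

```r
x <- c(1, 0, -1) / 0   # Inf NaN -Inf
is.finite(x)           # FALSE FALSE FALSE
is.infinite(x)         # TRUE FALSE TRUE
is.nan(x)              # FALSE TRUE FALSE

is.na(NaN)             # TRUE: NaN also counts as missing
is.nan(NA)             # FALSE: NA is not NaN
```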
Sums and Products
> x = matrix(1:20, 4, 5)
> sum(x)
[1] 210
> prod(x)
[1] 2.432902e+18
matrix(1:10, nrow=2) -> a
colSums(a)
rowSums(a)
Sums and Products
> cumsum(1:10)
 [1]  1  3  6 10 15 21 28 36 45 55
> cumprod(1:5)
[1]   1   2   6  24 120
> cummin(c(3:1, 2:0, 4:2))
[1] 3 2 1 1 1 0 0 0 0
> cummax(c(3:1, 2:0, 4:2))
[1] 3 3 3 3 3 3 4 4 4
Sums and Products
> a = c(3:1, 2:0, 4:2)
> a
[1] 3 2 1 2 1 0 4 3 2
> cummin(a)
[1] 3 2 1 1 1 0 0 0 0
> cummax(a)
[1] 3 3 3 3 3 3 4 4 4
Replacing values
> x = 1:10
> x
 [1]  1  2  3  4  5  6  7  8  9 10
> replace(x, x<2, 3)
 [1]  3  2  3  4  5  6  7  8  9 10
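Note that replace() returns a modified copy and leaves x itself unchanged. A short sketch of this, together with the equivalent indexed assignment (the name y is arbitrary):

```r
x <- 1:10
y <- replace(x, x < 2, 3)   # returns a modified copy
x[1]                        # still 1
y[1]                        # 3

x[x < 2] <- 3               # indexed assignment changes x itself
all(x == y)                 # TRUE
```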
Matrix Calculation
If x, y are n×m matrices: $(x + y)[i,j] = x[i,j] + y[i,j]$ and $(x - y)[i,j] = x[i,j] - y[i,j]$
If x is n×m and y is m×p, then $(x \,\%*\%\, y)[i,k] = \sum_{j=1}^{m} x[i,j] \cdot y[j,k]$
Matrix Calculation
In expressions involving a matrix and a vector, the vector is interpreted as a row or column vector, whichever makes the multiplication work.
If x is a vector of length m and y is an m×p matrix, x %*% y is a vector of length p.
If x is an n×m matrix and y is a vector of length m, x %*% y is a vector of length n.
If x and y are vectors of length m, x %*% y is a scalar (i.e. a vector of length 1), representing the inner product $\sum_{i=1}^{m} x[i] \cdot y[i]$.
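The rules for %*% can be checked on small made-up matrices. Note that R returns the results as matrices with one row or one column rather than as plain vectors:

```r
x <- matrix(1:6, nrow = 2)    # 2 x 3
y <- matrix(1:12, nrow = 3)   # 3 x 4
z <- x %*% y                  # 2 x 4

# one entry computed by the summation formula
sum(x[1, ] * y[, 2])          # equals z[1, 2]

v <- 1:3
dim(v %*% y)                  # 1 4: v is treated as a row vector
dim(x %*% v)                  # 2 1: v is treated as a column vector
dim(v %*% v)                  # 1 1: inner product
```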
Matrix Inversion
# if a is an n x n matrix
solve(a)   # inverse of a
a^-1       # elementwise inverse
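The difference between the matrix inverse and the elementwise inverse, sketched on a small invertible matrix chosen for illustration:

```r
a <- matrix(c(2, 1, 1, 3), nrow = 2)
a_inv <- solve(a)                    # matrix inverse
all.equal(a %*% a_inv, diag(2))      # TRUE up to floating point error

a^-1                                 # elementwise reciprocals
all.equal(a^-1, 1 / a)               # TRUE
isTRUE(all.equal(a_inv, a^-1))       # FALSE: the two are different things
```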
Lists in R
> a <- c(3,2,1)
> b <- c(6,5,4)
> f <- list(a, b)
> f[1]
[[1]]
[1] 3 2 1
> f[[1]]
[1] 3 2 1
Lists in R
> a <- c(3,2,1)
> b <- c(6,5,4)
> f <- list(a, b)
> d <- c("a","b")
> e <- list(a, b, d)
> unlist(f)
> unlist(e)
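What single brackets, double brackets and unlist() return can be sketched with the same objects:

```r
a <- c(3, 2, 1)
b <- c(6, 5, 4)
d <- c("a", "b")
f <- list(a, b)
e <- list(a, b, d)

class(f[1])     # "list": single brackets keep the list wrapper
class(f[[1]])   # "numeric": double brackets extract the element itself

unlist(f)       # numeric vector 3 2 1 6 5 4
unlist(e)       # character vector: the numbers are converted to strings
```

Because a vector holds only one type, unlist(e) has to convert everything to character as soon as one element is a string.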
Finding Help
Google (or your other favorite search engine)
R-help mailing list
R-announce, R-packages, R-devel (for developer-specific topics)
R-sig-* special interest groups (e.g. r-sig-finance)
See http://r-project.org/mail.html for details
For Further Reading
An Introduction to R, R-intro.pdf
R Data Import/Export, R-data.pdf
The R Reference Index, fullrefman.pdf
Introductory Statistics with R, Peter Dalgaard
Data Analysis and Graphics Using R, Maindonald/Braun
R Graphics, Murrell