The four R lessons can be browsed from the STA6329 home page, Get started with R.

Lesson 1: Basic R: I/O, Array and matrix, conditioning and do loop

R is free statistical software that is very similar to Splus. It is available for many operating systems, including UNIX and Microsoft Windows. It can be downloaded from the website http://cran.r-project.org/.

On the departmental UNIX system, simply type R; you will get some messages and then the prompt >. On the LINUX system, go to Application => Other => R; you will get an R console and the prompt >. Everything starts from there.

Documentation can be opened by typing help.start(). You will get a documentation window. First try An Introduction to R. Help on a particular function is available through help(), e.g., help(scan).

Let us begin with the simplest input.
> a<-3.14159
> a
The first line assigns the value 3.14159 to a (using <- or =), and the second line asks R to print the value of a on the screen. The screen will show
[1] 3.14159
Or you can add descriptions by using cat():
> cat("The value of a is",a,".")
The value of a is 3.14159 .
Instructions can be separated by ; or by a new line. To quit R, type q() at the prompt. Note that R is case sensitive.

You may define a vector by

> x <- c(1,2,3,4,5,6,7,8)
> x[2]=3.5
> x[6]<- -9
> x
You will see
[1] 1.0 3.5 3.0 4.0 5.0 -9.0 7.0 8.0
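As a small sketch of the same idea, a vector can also be indexed by several positions at once or by a logical condition. The name v below is only illustrative and is not part of the lesson.

```r
# Illustrative vector (the name v is an assumption, not from the lesson)
v <- c(1, 2, 3, 4, 5, 6, 7, 8)
v[2] <- 3.5              # replace the second element
v[6] <- -9               # replace the sixth element

n     <- length(v)       # number of elements
some  <- v[c(1, 3)]      # elements 1 and 3
neg   <- v[v < 0]        # elements that satisfy a condition
tailv <- v[-1]           # everything except the first element

print(n); print(some); print(neg); print(tailv)
```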
A matrix is created with array() or matrix(). Let x be the vector just defined. Then
> A <- array(x,dim=c(4,2))   # or matrix(x,4,2)
> A
     [,1]  [,2]
[1,] 1.0     5
[2,] 3.5    -9
[3,] 3.0     7
[4,] 4.0     8
You can change the matrix values by assigning to elements of A, e.g., A[3,2]=-7. Then the 7 at position [3,2] will be changed to -7. Another example:
> B = array(1:8,dim=c(2,4))   # or matrix(1:8,2,4)
> B[1,1]=0
> B
     [,1]  [,2] [,3] [,4]
[1,]   0     3    5    7
[2,]   2     4    6    8
where 1:8 means from 1 to 8 in integer steps, 1,2,...,8. The values fill the matrix column by column. If 0:0 or 0 is used, then the whole matrix contains 0.
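The fill order can be checked with a short sketch. By default matrix() fills column by column; the byrow argument of matrix() switches to row-by-row filling. The names B1, B2 and Z are illustrative only.

```r
# Column-by-column fill (the default) versus row-by-row fill
B1 <- matrix(1:8, nrow = 2, ncol = 4)                # columns first
B2 <- matrix(1:8, nrow = 2, ncol = 4, byrow = TRUE)  # rows first

print(B1[2, ])   # second row of B1: 2 4 6 8
print(B2[2, ])   # second row of B2: 5 6 7 8

# A single value is recycled to fill the whole matrix
Z <- matrix(0, 2, 4)
print(dim(Z))    # 2 4
```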

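The dimension can be higher than 2. For example, w<-array(0:0, dim=c(3,4,2)) defines a 3x4x2 array of zeros. Here is how it is displayed. (Note that the argument is written dim=, not dim<-, and that the array is printed by typing w, not W, since R is case sensitive.)

> w<-array(0:0, dim=c(3,4,2))
> w[1,2,1]=1;w[3,2,2]=2
> w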
, , 1

     [,1] [,2] [,3] [,4]
[1,]    0    1    0    0
[2,]    0    0    0    0
[3,]    0    0    0    0

, , 2

     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    0    0    0    0
[3,]    0    2    0    0
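A hedged sketch of how such an array can be inspected: leaving an index empty selects everything along that dimension, so w3[, , 2] is the second 3x4 layer. The name w3 is illustrative.

```r
# A 3 x 4 x 2 array of zeros with two entries changed
w3 <- array(0, dim = c(3, 4, 2))
w3[1, 2, 1] <- 1
w3[3, 2, 2] <- 2

print(dim(w3))          # 3 4 2
layer2 <- w3[, , 2]     # the second layer, an ordinary 3 x 4 matrix
print(dim(layer2))      # 3 4
print(layer2[3, 2])     # 2
print(sum(w3))          # 3
```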

Remove Values

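Note that once B is defined as a 2x4 matrix, you may not assign to an element outside its dimensions, such as B[3,5]. Assigning a whole new value to B, e.g., B <- matrix(0,3,3), simply replaces the old matrix. To delete objects from the workspace altogether, use the remove operation: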
> rm(A,B,x)
Now A, B and x are removed from the memory.
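The effect of rm() can be checked with exists(), as in the following sketch. The object name demo_obj is hypothetical and chosen only to avoid clashing with the lesson's variables.

```r
# Create an object, replace it, then remove it
demo_obj <- 1:3
demo_obj <- matrix(0, 2, 2)        # reassignment replaces the old value
before <- exists("demo_obj")       # TRUE: the object is in the workspace

rm(demo_obj)
after <- exists("demo_obj")        # FALSE: the object is gone

print(before); print(after)
```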

Read data from files

Data can be read as a vector (1-D) or a table (2-D). The following time series data is saved as econ.dat.

Economical data with seasonal and linear trend 1981-1992
JAN  FEB  MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC 
254  118  132  129  121  135  148  148  136  119  104  278  
262  126  141  135  125  149  170  170  158  133  114  280  
297  150  178  163  172  178  199  199  184  162  146  305  
316  180  193  181  183  218  230  242  209  191  172  332  
348  196  236  235  229  243  264  272  237  211  180  358  
350  188  235  227  234  264  302  293  259  229  203  371  
397  233  267  269  270  315  364  347  312  274  237  405  
423  277  317  313  318  274  413  405  355  306  271  448  
458  301  356  348  355  422  465  467  404  347  305  471  
501  318  362  348  363  435  491  505  404  359  310  478  
It can be read as a vector as follows.
> x<-scan(file="econ.dat",skip=2)   # Read data from econ.dat as x
> x[14]
[1] 126
> x[120]
[1] 478
Here "#" is the comment symbol for R. This data can also be converted into a 2-D table, using matrix() and the matrix transpose function, t().
> y<- matrix(x,12,10)
> y[1,3]
[1] 297
> yt <- t(y)
> yt[3,4]
[1] 163
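The reading steps above can be tried without econ.dat using the sketch below, which writes a tiny stand-in file to a temporary location. The file contents and the names f, z and zt are made up for illustration. It also shows that matrix(..., byrow = TRUE) gives the year-by-month layout directly, as an alternative to filling by column and transposing.

```r
# Write a small stand-in data file: two header lines, then two "years"
f <- tempfile(fileext = ".dat")
writeLines(c("Title line",
             "JAN FEB MAR",
             "254 118 132",
             "262 126 141"), f)

# scan() reads the numbers row by row into one long vector
z <- scan(file = f, skip = 2, quiet = TRUE)
print(z[5])                 # 126: second value of the second row

# One row per year, one column per month
zt <- matrix(z, nrow = 2, ncol = 3, byrow = TRUE)
print(zt[2, 2])             # 126

unlink(f)                   # delete the temporary file
```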
Sometimes a table may contain more than just numbers. Then it has to be read with a column specification, "" for character columns and 0 for numeric columns. Suppose we have the following table (saved as time.dat).
 
 T X1  X2  X3   Data for time series
 1 5.5 2.4  T
 2 4.5 2.4  C
 3 5.1 2.4  C
 4 5.5 2.2  T
 5 5.7 2.1  T
 6 5.1 1.5  C
Then it can be read by
> ex1<-scan("time.dat",list(0,0,0,""),skip=1) 
> T<-ex1[[1]]           # T is the first column of ex1.
> X1<-ex1[[2]]; X2<-ex1[[3]]; X3<-ex1[[4]]
> M <- cbind(T,X1,X2)   # column binding function
> M
     T   X1   X2
[1,] 1  5.5  2.4
[2,] 2  4.5  2.4
[3,] 3  5.1  2.4
[4,] 4  5.5  2.2
[5,] 5  5.7  2.1
[6,] 6  5.1  1.5
> M[4,3]
[1] 2.2
> MTM=t(M)%*%M
> MTM
       T      X1     X2
T   91.0  110.90  42.70
X1 110.9  165.26  67.96
X2  42.7   67.96  28.78  

where %*% is matrix multiplication. To get rid of the labels and make it a "pure" matrix, use:

> A<-matrix(M,6,3)
> A
    [,1] [,2] [,3]
[1,]  1  5.5  2.4
[2,]  2  4.5  2.4
[3,]  3  5.1  2.4
[4,]  4  5.5  2.2
[5,]  5  5.7  2.1
[6,]  6  5.1  1.5
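As an alternative sketch (not the method used in this lesson), a mixed table like the one above can also be read with read.table(), which returns a data frame, and the labels can be dropped with unname(). The shortened file contents and the names f, d, M2 and P are assumptions made for illustration.

```r
# A shortened stand-in for time.dat, written to a temporary file
f <- tempfile()
writeLines(c("T X1 X2 X3   Data for time series",
             "1 5.5 2.4 T",
             "2 4.5 2.4 C",
             "3 5.1 2.4 C"), f)

# Skip the header line and give column names and types explicitly
d <- read.table(f, skip = 1,
                col.names  = c("T", "X1", "X2", "X3"),
                colClasses = c("numeric", "numeric", "numeric", "character"))

M2 <- as.matrix(d[, c("T", "X1", "X2")])   # numeric columns as a matrix
P  <- unname(M2)                           # same numbers, labels dropped

print(d$X3)              # "T" "C" "C"
print(dimnames(P))       # NULL
print(t(P) %*% P)        # 3 x 3 cross-product matrix

unlink(f)
```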

Output to a file

So far, all the outputs are on the screen. To output results in a file, use sink(). For example,

 
> sink("result.out")
> MTM

There is no MTM output on the screen. It is in the file result.out. To return to screen output, type sink() with no argument.
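The sink() round trip can be sketched as follows, using a temporary file instead of result.out. The names out and lines_out are illustrative. print() is called explicitly so that the example behaves the same way when run as a script.

```r
# Redirect output to a temporary file, then restore screen output
out <- tempfile(fileext = ".out")

sink(out)            # from here on, output goes to the file
print(1:3)
cat("done\n")
sink()               # back to the screen

lines_out <- readLines(out)   # read back what was written
print(lines_out)              # "[1] 1 2 3" "done"

unlink(out)                   # delete the temporary file
```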

For more complex data files, see R Data Import/Export on the documentation page.

Arithmetic and matrix operations and functions

The basic arithmetic operators and functions are:

+, -, *, /, ^(exponent), %/% (integer division); 
sqrt(), abs(), sin(), cos(), tan(), asin(), acos(), atan(),
exp(), log() (natural logarithm, base e), log10()
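A few of these operators in use, as a minimal sketch. The remainder operator %% is not in the list above but is the usual companion of %/%. The variable names are illustrative.

```r
quo <- 7 %/% 2        # integer division: 3
rem <- 7 %% 2         # remainder: 1
pw  <- 2^10           # exponent: 1024
l1  <- log(exp(1))    # natural logarithm: 1
l2  <- log10(1000)    # base-10 logarithm: 3
rt  <- sqrt(16)       # square root: 4

print(c(quo, rem, pw, l1, l2, rt))
```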

For matrix operations:

  Operations : multiplication:  %*%  ex:  x %*% y
  Functions  :
    rbind(,,), cbind(,,) : row, column concatenation
    svd(x)   : singular value decomposition (p. 28)
    t(x)     : transpose of matrix x
    diag(x)  : diagonal of a matrix x
    eigen(A) : eigenvalues and eigenvectors of a symmetric matrix A
       The eigenvalues are given in eigen(A)$values and
       the eigenvectors in eigen(A)$vectors (p. 27)
    solve(x) : inverse of the matrix x, or x^{-1} (p. 27)
    solve(a, b) : solution x of the equations a %*% x = b
    mean(x), median(x), range(x), var(x), quantile(x)
    var(x,y), cov(x,y) : covariance between x and y
    cor(x,y) : correlation between x and y
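Some of these functions can be tried on a small example. The following is only a sketch; the matrix S and the vectors b, u and v are made up for illustration.

```r
# A small symmetric matrix and a right-hand side
S <- matrix(c(2, 1, 1, 3), 2, 2)
b <- c(1, 2)

Sinv <- solve(S)        # inverse of S
xsol <- solve(S, b)     # solution of S %*% x = b
print(S %*% Sinv)       # identity matrix, up to rounding
print(xsol)             # 0.2 0.6

e <- eigen(S)
print(e$values)         # eigenvalues
print(e$vectors)        # eigenvectors, one per column

# var(u, v) and cov(u, v) both give the covariance; cor(u, v) the correlation
u <- c(1, 2, 3, 4)
v <- c(2, 4, 5, 9)
print(c(var(u, v), cov(u, v), cor(u, v)))
```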
 

Let us continue with the matrix M and M'M. First find the singular value decomposition of M.

> svd.out <- svd(M)
> svd.out$d
[1] 16.477988  3.642783  0.496028

> svd.out$u
           [,1]        [,2]        [,3]
[1,] -0.3374511  0.67201955  0.63064903
[2,] -0.3238290  0.32929693 -0.62988892
[3,] -0.3853863  0.17479933 -0.38821643
[4,] -0.4337413 -0.02406623  0.01664167
[5,] -0.4746014 -0.23604000  0.05824250
[6,] -0.4683598 -0.59422587  0.22614290

> svd.out$v
           [,1]       [,2]       [,3]
[1,] -0.5497879 -0.8199249 -0.1595506
[2,] -0.7742524  0.4285394  0.4657115
[3,] -0.3134747  0.3795750 -0.8704346
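The output means u'Mv = diag(d), or M = u diag(d) v', where diag(d) is the diagonal matrix with the singular values d on its diagonal. Note that the result of svd(M) is stored in svd.out, which contains three components d, u and v. The way to retrieve them is to use svd.out$u, etc. Similarly,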
> eigen.MTM<-eigen(MTM)
> eigen.MTM
$values
[1] 271.5240859  13.2698704   0.2460437

$vectors
          [,1]       [,2]       [,3]
[1,] 0.5497879  0.8199249  0.1595506
[2,] 0.7742524 -0.4285394 -0.4657115
[3,] 0.3134747 -0.3795750  0.8704346
Note that these eigenvectors are the same as the columns of svd.out$v up to sign, and the eigenvalues are the squares of svd.out$d. Can you prove this as a general theorem?
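The relation between the two decompositions can be checked numerically with the sketch below. It rebuilds the 6x3 matrix from the values shown earlier under the illustrative name M0, so that it runs on its own. all.equal() compares up to rounding error, and abs() is used because eigenvectors are only determined up to sign.

```r
# Rebuild the example matrix (values copied from the table above)
M0 <- cbind(T  = 1:6,
            X1 = c(5.5, 4.5, 5.1, 5.5, 5.7, 5.1),
            X2 = c(2.4, 2.4, 2.4, 2.2, 2.1, 1.5))

s <- svd(M0)
e <- eigen(t(M0) %*% M0, symmetric = TRUE)

# Eigenvalues of M'M are the squared singular values of M
ok_values <- all.equal(s$d^2, e$values)

# Eigenvectors agree with the columns of v up to sign
ok_vectors <- all.equal(abs(s$v), abs(e$vectors), check.attributes = FALSE)

# M is recovered as u diag(d) v'
ok_rebuild <- all.equal(s$u %*% diag(s$d) %*% t(s$v), M0,
                        check.attributes = FALSE)

print(c(ok_values, ok_vectors, ok_rebuild))
```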

Missing data in R must be denoted by NA. Any arithmetic that involves an NA gives NA, as the second column of the product below shows. For example:

> MISSN=c(1,2,3,NA,5,6,7,8,9,10,11,12)
> MISS<-matrix(MISSN,3,4)
> MISS
     [,1] [,2] [,3] [,4]
[1,]    1   NA    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> NEW<-M%*%MISS
> NEW
       [,1] [,2] [,3]  [,4]
  [1,] 19.2   NA 72.6  99.3
  [2,] 18.2   NA 71.6  98.3
  [3,] 20.4   NA 83.4 114.9
  [4,] 21.6   NA 91.8 126.9
  [5,] 22.7   NA 99.5 137.9
  [6,] 20.7   NA 96.3 134.1
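Missing values can be located with is.na(), and many summary functions accept na.rm = TRUE to skip them. A minimal sketch with an illustrative vector m:

```r
m <- c(1, 2, 3, NA, 5)

where_na <- is.na(m)                 # TRUE only at the missing position
s_all    <- sum(m)                   # NA: the missing value propagates
s_ok     <- sum(m, na.rm = TRUE)     # 11: the NA is skipped
avg_ok   <- mean(m, na.rm = TRUE)    # 2.75

print(where_na); print(s_all); print(s_ok); print(avg_ok)
```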

Do loop and conditioning

Conditions and loops are written as follows (p. 46). (Some commonly used symbols: == (equal), >, >=, <, <=, && (and), || (or).)

> xx<- c(1:2);
> if (M[2,3]>0 && M[1,2]<0) xx[1]=-1 else xx[1]=0;  # && is for "and"
> if (M[2,3]>0 || M[1,2]<0) xx[2]=-1 else xx[2]=0;  # || for "or"
> xx
[1]  0 -1
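The same tests can be written with braces, which is easier to read when a branch has several statements. The sketch below uses plain numbers a1 and a2 (illustrative names, set to the values of M[2,3] and M[1,2]) so that it runs on its own. It also shows the element-wise operator & and the function ifelse(), which work on whole vectors, whereas && and || are meant for single TRUE/FALSE values.

```r
a1 <- 2.4   # plays the role of M[2,3]
a2 <- 5.5   # plays the role of M[1,2]

# "and": both conditions must hold
if (a1 > 0 && a2 < 0) {
  r1 <- -1
} else {
  r1 <- 0
}

# "or": at least one condition must hold
if (a1 > 0 || a2 < 0) {
  r2 <- -1
} else {
  r2 <- 0
}
print(c(r1, r2))     # 0 -1

# Element-wise versions for vectors
both  <- c(TRUE, FALSE) & c(TRUE, TRUE)          # TRUE FALSE
signs <- ifelse(c(-1, 2, -3) < 0, "neg", "pos")  # "neg" "pos" "neg"
print(both); print(signs)
```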

Loops are written with for and while:

> w<-matrix(0,3,3)
> for(i in 1:3){for(j in 1:2){w[i,j]=1}}
> w
      [,1] [,2] [,3]
[1,]    1    1    0
[2,]    1    1    0
[3,]    1    1    0

> k=0
> while(k<=1) {k=k+1; w[1,k]=5;w[k,3]=-5}
> w
      [,1] [,2] [,3]
[1,]    5    5   -5
[2,]    1    1   -5
[3,]    1    1    0
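Two more small loops, as a sketch with illustrative names. The last lines show that the nested for loop above can also be replaced by a single assignment to a block of the matrix.

```r
# for: add up the integers 1 to 10
total <- 0
for (i in 1:10) total <- total + i
print(total)          # 55

# while: smallest k with 2^k >= 100
k <- 0
while (2^k < 100) k <- k + 1
print(k)              # 7

# Nested loop versus block assignment
w1 <- matrix(0, 3, 3)
for (i in 1:3) { for (j in 1:2) { w1[i, j] <- 1 } }
w2 <- matrix(0, 3, 3)
w2[, 1:2] <- 1        # fills the first two columns in one step
print(identical(w1, w2))   # TRUE
```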
Remark: R (like Splus) is not good for debugging. The error messages are usually very vague, such as

ERROR: syntax error.

One common error is using the wrong case. If you write IF(...), it is a syntax error. It should be if(...).
