Statistical Computing with R

Some Web resources

Files in parentheses are local copies.

Getting Started

Note that R, like Unix, is always case sensitive; for example, the correct command is "rowMeans", not "RowMeans" or "rowmeans". Most jobs will probably be done partly with a script and partly with commands typed directly. Running a script is the same as entering all of the commands in the script, and all or part of a script can be run at any time. The script might contain the core commands that are run many times, especially if many options are used or much label data is entered. The command window is used for shorter commands that check the results. See the web resources listed above to find some commands of interest. For help on a command, say the apply command, enter "help(apply)". For help on a general subject, say correlation, enter help.search("correlation"). These features and more can also be invoked under the help menu.
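
As a small illustration of the case-sensitivity and help points above, the following sketch checks which spellings of a name exist and lists the usual ways of opening help. The variable name hits is only illustrative, and the help calls are wrapped in interactive() on the assumption that they are meant to be typed at the console rather than run from a batch script.

```r
# Only the exact spelling of a name is recognized.
exists("rowMeans")    # TRUE in a standard R session
exists("rowmeans")    # FALSE, unless some loaded package defines it

# apropos() ignores case by default, so it can recover the correct spelling.
hits <- apropos("^rowmeans$")
print(hits)

# Help pages are best opened from an interactive session.
if (interactive()) {
  help(apply)                  # help on one command; ?apply is equivalent
  help.search("correlation")   # search by subject; ??correlation is equivalent
}
```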

An Example Simulation

The initial assignment is to generate 10000 replications of 12 random numbers and examine the means. This is done with 3 commands (which is sort of impressive). The first command creates a matrix called X with 10000 rows and 12 columns and fills it with standard normal random numbers. The second command takes the mean of each row and places the means in vector y. The last command makes a histogram of vector y.
   X <- matrix(data=rnorm(120000), nrow=10000, ncol=12)
   y <- rowMeans(X)
   hist(y)
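
One way to examine the means beyond the histogram is to compare their spread with theory: the mean of 12 independent standard normal values has standard deviation 1/sqrt(12), about 0.289. The sketch below repeats the simulation with a fixed seed (an addition made here for repeatability, not part of the original commands) and prints both numbers.

```r
set.seed(123)  # assumption: fixed seed so the run can be repeated
X <- matrix(data = rnorm(120000), nrow = 10000, ncol = 12)
y <- rowMeans(X)

mean(y)        # should be close to 0
sd(y)          # should be close to the theoretical value below
1 / sqrt(12)   # theoretical standard deviation of a mean of 12 values
```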

Adding a Custom Function

The next example continues from the first, but adds a custom function (the sum plus 7). The function is then applied to all the rows of matrix X and the output vector z is examined in a histogram (not shown).

   dlgbar <- function(r) {sum(r)+7}
   X <- matrix(data=rnorm(120000), nrow=10000, ncol=12)
   z <- apply(X,1,dlgbar)
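
Because dlgbar only sums a row and adds a constant, the same result can also be had without apply. The sketch below is an alternative rather than the method used above; the name z2 is introduced here only for the comparison.

```r
dlgbar <- function(r) {sum(r) + 7}
X <- matrix(data = rnorm(120000), nrow = 10000, ncol = 12)

z  <- apply(X, 1, dlgbar)   # custom function applied to each row (MARGIN = 1)
z2 <- rowSums(X) + 7        # vectorized equivalent for this particular function

all.equal(z, z2)            # compares the two vectors within numerical tolerance
```

For functions that have no ready-made vectorized form, apply remains the more general tool.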

Histograms and Quantiles

The following script puts 20,000 standard normal deviates into x and makes a histogram of x. The histogram options suggest about 100 bins and label the vertical axis with density rather than frequency. The breaks argument can instead be a vector, to put the breaks exactly where you want them. Next we set up some tail probabilities and get the corresponding quantiles (percentiles) of x. We then store the histogram structure in hist1 and plot the cumulative density against the bin centers. See help(hist) for more information on the options and the data structure.
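
The script itself is not reproduced here, so the following is a minimal sketch of what the paragraph describes rather than the original code. The tail probabilities in p and the names p, q, and cum are assumptions chosen for illustration; the cumulative curve is formed by summing density times bin width.

```r
x <- rnorm(20000)                          # 20,000 standard normal deviates
hist(x, breaks = 100, freq = FALSE)        # about 100 bins, density on the y axis

p <- c(0.005, 0.025, 0.05, 0.95, 0.975, 0.995)    # assumed tail probabilities
q <- quantile(x, probs = p)                # sample quantiles (percentiles) of x
print(q)

hist1 <- hist(x, breaks = 100, plot = FALSE)      # store the histogram structure
cum <- cumsum(hist1$density * diff(hist1$breaks)) # cumulative proportion per bin
plot(hist1$mids, cum, type = "l")          # cumulative curve against bin centers
```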

An Example Bootstrap

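Here, we first set the number of bootstrap samples (nr) and set up a data vector x with 5 arbitrary values. We get its mean and standard deviation. Next we fill a matrix by bootstrapping the 5 data values with replacement, one bootstrap sample per row. If the number of rows is under 20, we print out the matrix. We then get the means of all the rows and also the mean and standard deviation of these values. It ends with a histogram of the row means (not shown).
  nr <- 1000000                 # number of bootstrap samples (rows)
  x <- c(3, 5, 8, 13, 21)       # placeholder: any 5 arbitrary values will do
  c(mean(x), sd(x))             # mean and standard deviation of the data
  d <- matrix(data=sample(x,nr*5,replace=TRUE),nrow=nr,ncol=5,byrow=TRUE)
  if(nr<20){print(d)}           # look at d matrix only for small nr
  m <- rowMeans(d); c(mean(m), sd(m))   # row means, then their mean and sd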
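
A quick sanity check on the bootstrap, sketched under assumptions: the data values in x are placeholders, nr is reduced to keep the run short, and the names m and pop_sd are introduced here. For resampling with replacement, the row means should center on mean(x), and their standard deviation should be near sqrt(mean((x - mean(x))^2) / 5).

```r
set.seed(1)                      # assumption: fixed seed for repeatability
nr <- 100000                     # fewer rows than above, for speed
x <- c(3, 5, 8, 13, 21)          # placeholder data: any 5 values would do
d <- matrix(data = sample(x, nr * 5, replace = TRUE),
            nrow = nr, ncol = 5, byrow = TRUE)
m <- rowMeans(d)                 # one mean per bootstrap sample

pop_sd <- sqrt(mean((x - mean(x))^2))   # sd of x with divisor n, not n - 1
c(mean(x), mean(m))              # the two means should nearly agree
c(pop_sd / sqrt(5), sd(m))       # theoretical and observed sd of the means
```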


R seems to recommend working with whole objects rather than loops, but simple loops are easy to write in R. For example, here is a function to calculate a factorial. Of course, the built-in function "factorial" is probably better and faster.
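dlgfac <- function(k) {
  value <- 1                 # start the running product at 1
  for(i in seq_len(k)){      # i runs over 1, 2, ..., k (no iterations when k is 0)
     value <- value*i}
  value}                     # return the result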
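
In the spirit of working with whole objects rather than loops, the same product can be written without an explicit loop. This is only a sketch; the function name facprod is hypothetical and is compared here with the built-in factorial.

```r
facprod <- function(k) {
  prod(seq_len(k))    # product of 1, 2, ..., k; the empty product gives 1 for k = 0
}

facprod(5)                                   # 120
facprod(0)                                   # 1
all.equal(facprod(10), factorial(10))        # the two should agree
```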

Saving Plots into Files

Most plots will probably be saved using the Windows clipboard. Just right-click on the plot and copy it as a metafile or bitmap. You can then paste it into any other Windows application. You can also make image files directly in the script, for example:
   png(file="test.png")
   plot(x,y,type="b")
   dev.off()   # close the file so the image is completely written
This makes a png file with the output of the plot command. The png (or bmp or jpeg or tiff) command has several options to control the file.

Some working files

  1. rsamp.pdf
  2. rsimul.pdf
  3. revent.pdf
  4. rsim.pdf
  5. A simple protocol for simulations in R


R almost always seems to work and give an answer, even when things go very wrong. Error messages and warnings seem rare. For example, data might be silently "recycled" to fill overly large variables. Bad function calls or procedures leave the existing results unchanged. The program normally "coerces" inappropriate data or variables to make things work. Some of your input is taken only as a "suggestion". While this might seem friendly, it does not necessarily give the correct answer. In many ways R is a very large "black box", which is rarely a good thing. Some extra care and double checking should be used for important projects. Testing the code on small problems and looking at all the results might be helpful.
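
A few small cases, given as a sketch, show the silent behavior described above. The variable names are illustrative; the results noted in the comments follow from R's standard recycling and coercion rules.

```r
# Recycling: the shorter vector is reused with no warning when the
# longer length is a multiple of the shorter one.
a <- c(1, 2, 3, 4) + c(10, 20)
print(a)                      # 11 22 13 24

# Coercion: mixing numbers and text silently turns everything into text.
b <- c(1, "2", 3)
class(b)                      # "character"

# Logical values are silently treated as 0 and 1 in arithmetic.
sum(c(TRUE, FALSE, TRUE))     # 2

# Input as a suggestion: breaks = 100 asks for about 100 bins, but the
# number actually used can differ.
h <- hist(rnorm(20000), breaks = 100, plot = FALSE)
length(h$mids)                # number of bins actually used
```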
2010 JAN 28, last updated on 2010 OCT 12