R : Copyright 2001, The R Development Core Team
Version 1.4.0 (2001-12-19)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type `license()' or `licence()' for distribution details.

R is a collaborative project with many contributors.
Type `contributors()' for more information.

Type `demo()' for some demos, `help()' for on-line help, or
`help.start()' for a HTML browser interface to help.
Type `q()' to quit R.

[Previously saved workspace restored]

>

The > is called the prompt. In what follows it is not typed, but is used to indicate where you are to type if you follow the examples. If a command is too long to fit on a line, a + is used for the continuation prompt.
2 3 0 3 1 0 0 1

To enter this into an R session we do so with
> typos = c(2,3,0,3,1,0,0,1)
> typos
[1] 2 3 0 3 1 0 0 1

Notice a few things
> mean(typos)
[1] 1.25

We could also call median or var to find the median or the sample variance. The syntax is the same -- the function name followed by parentheses to contain the argument(s):
> median(typos)
[1] 1
> var(typos)
[1] 1.642857
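As a small sketch of the same calling syntax with more than one argument, the base function round() is used here purely for illustration (it is not part of the original example). Arguments are separated by commas and may also be given by name:

```r
typos = c(2,3,0,3,1,0,0,1)

# One argument: the data vector
var(typos)

# Two arguments, separated by a comma: round the variance to 2 decimal places
round(var(typos), 2)

# The same call with the second argument given by name
round(var(typos), digits = 2)
```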
> typos.draft1 = c(2,3,0,3,1,0,0,1)
> typos.draft2 = c(0,3,0,3,1,0,0,1)

That is, the two typos on the first page were fixed. Notice the two different variable names. Unlike in many other languages, the period is used only as punctuation within a name. In this version of R you can't use an underscore (_) to punctuate names as you might in other programming languages (later releases of R do allow it), so the period is quite useful.

> typos.draft1 = c(2,3,0,3,1,0,0,1)
> typos.draft2 = typos.draft1 # make a copy
> typos.draft2[1] = 0 # assign the first page 0 typos

Now notice a few things. First, the comment character, #, is used to make comments. Basically anything after the comment character is ignored (by R, hopefully not the reader). More importantly, the assignment to the first entry in the vector typos.draft2 is done by referencing the first entry in the vector. This is done with square brackets []. It is important to keep this in mind: parentheses () are for functions, and square brackets [] are for vectors (and later arrays and lists). In particular, we have the following values currently in typos.draft2
> typos.draft2 # print out the value
[1] 0 3 0 3 1 0 0 1
> typos.draft2[2] # print 2nd page's value
[1] 3
> typos.draft2[4] # 4th page
[1] 3
> typos.draft2[-4] # all but the 4th page
[1] 0 3 0 1 0 0 1
> typos.draft2[c(1,2,3)] # fancy, print 1st, 2nd and 3rd.
[1] 0 3 0

Notice that negative indices give everything except the indices named. The last example is very important. You can take more than one value at a time by using another vector of index numbers. This is called slicing.
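A few more slices along the same lines, as a minimal sketch using only base R indexing on the same vector (the particular index choices are just illustrations):

```r
typos.draft2 = c(0,3,0,3,1,0,0,1)

typos.draft2[2:4]       # pages 2 through 4: 3 0 3
typos.draft2[-c(1,2)]   # everything except pages 1 and 2: 0 3 1 0 0 1
typos.draft2[c(8,1)]    # values come back in the order of the index vector: 1 0
```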
> max(typos.draft2) # what are worst pages?
[1] 3 # 3 typos per page
> typos.draft2 == 3 # Where are they?
[1] FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE

Notice the use of the double equals sign (==). This tests each value of typos.draft2 to see whether it is equal to 3. The 2nd and 4th answer yes (TRUE); the others answer no.
> which(typos.draft2 == 3)
[1] 2 4

Now, what if you didn't think of the command which? You are not out of luck -- but you will need to work harder. The basic idea is to create a new vector 1 2 3 ... keeping track of the page numbers, and then slicing off just the ones for which typos.draft2==3:
> n = length(typos.draft2) # how many pages
> pages = 1:n # how we get the page numbers
> pages # pages is simply 1 to number of pages
[1] 1 2 3 4 5 6 7 8
> pages[typos.draft2 == 3] # logical extraction. Very useful
[1] 2 4

To create the vector 1 2 3 ... we used the simple : colon operator. We could have typed this in, but this is a useful thing to know. The command a:b is simply a, a+1, a+2, ..., b if a,b are integers and intuitively defined if not. A more general R function is seq(), which is a bit more typing. Try ?seq to see its options. To produce the above try seq(a,b,1).
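To make the behavior of the colon operator and seq() concrete, here is a short sketch with a few illustrative values, including a non-integer start and a decreasing range:

```r
1:5                 # 1 2 3 4 5
seq(1, 5, 1)        # the same values, spelled out with seq()
seq(1, 9, by = 2)   # a step other than 1: 1 3 5 7 9
1.5:4               # non-integer start, steps of 1: 1.5 2.5 3.5
5:1                 # counts down when a > b: 5 4 3 2 1
```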
> (1:length(typos.draft2))[typos.draft2 == max(typos.draft2)]
[1] 2 4

This looks awful and is prone to typos and confusion, but does illustrate how things can be combined into short powerful statements. This is an important point. To appreciate the use of R you need to understand how one composes the output of one function or operation with the input of another. In mathematics we call this composition.
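One way to read a composed statement like this is to unwind it into named steps. The sketch below does that; the intermediate names worst and is.worst are made up here for illustration:

```r
typos.draft2 = c(0,3,0,3,1,0,0,1)

worst = max(typos.draft2)              # the largest count, 3
is.worst = typos.draft2 == worst       # TRUE on the pages with that count
pages = 1:length(typos.draft2)         # page numbers 1 to 8
pages[is.worst]                        # 2 4

# The composed one-liner gives the same answer
(1:length(typos.draft2))[typos.draft2 == max(typos.draft2)]
```

Each intermediate value is the input to the next step, which is all that composition means here.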
> sum(typos.draft2) # How many typos?
[1] 8
> sum(typos.draft2>0) # How many pages with typos?
[1] 4
> typos.draft1 - typos.draft2 # difference between the two
[1] 2 0 0 0 0 0 0 0
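The count sum(typos.draft2>0) works because R treats TRUE as 1 and FALSE as 0 in arithmetic. As a sketch of a closely related idiom (the name has.typos is made up for illustration), taking the mean of a logical vector gives a proportion:

```r
typos.draft2 = c(0,3,0,3,1,0,0,1)

has.typos = typos.draft2 > 0   # logical vector, one entry per page
sum(has.typos)                 # number of pages with typos: 4
mean(has.typos)                # proportion of pages with typos: 0.5
100 * mean(has.typos)          # the same as a percentage: 50
```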
45,43,46,48,51,46,50,47,46,45

We can again keep track of this with R using a vector:
> x = c(45,43,46,48,51,46,50,47,46,45)
> mean(x) # the mean
[1] 46.7
> median(x) # the median
[1] 46
> max(x) # the maximum or largest value
[1] 51
> min(x) # the minimum value
[1] 43

This illustrates that many interesting functions can be found easily. Let's see how we can do some others. First, let's add the next two weeks' worth of data to x. This was
48,49,51,50,49,41,40,38,35,40

We can add this several ways.
> x = c(x,48,49,51,50,49) # append values to x
> length(x) # how long is x now (it was 10)
[1] 15
> x[16] = 41 # add to a specified index
> x[17:20] = c(40,38,35,40) # add to many specified indices

Notice, we did three different things to add to a vector. All are useful, so let's explain. First we used the c (combine) function to combine the previous value of x with the next week's numbers. Then we assigned directly to the 16th index. At the time of the assignment x had only 15 entries, so this automatically created another one. Finally, we assigned to a slice of indices. This last method makes some things very simple to do.
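One thing worth knowing about assigning past the end of a vector: if the index skips over positions, base R fills the gap with the missing value NA. A minimal sketch on a short made-up vector:

```r
x = c(45,43,46)   # a vector of length 3
x[6] = 48         # assign to index 6, skipping 4 and 5
x                 # 45 43 46 NA NA 48
length(x)         # 6
```

Assigning to index length(x) + 1, as in the text, avoids creating any NA entries.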
> data.entry(x) # Pops up spreadsheet to edit data
> x = de(x) # same only, doesn't save changes
> x = edit(x) # uses editor to edit x.

All are easy to use. The main confusion is that the variable x needs to be defined previously. For example
> data.entry(x) # fails. x not defined
Error in de(..., Modes = Modes, Names = Names) : Object "x" not found
> data.entry(x=c(NA)) # works, x is defined as we go.
> day = 5;
> mean(x[day:(day+4)])
[1] 48

The trick is the slice takes out days 5,6,7,8,9
> day:(day+4)
[1] 5 6 7 8 9

and the mean takes just those values of x.
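Because day is an ordinary variable, the same expression gives the five-day mean starting on any other day. A short sketch with two illustrative starting days:

```r
x = c(45,43,46,48,51,46,50,47,46,45,48,49,51,50,49,41,40,38,35,40)

day = 1
mean(x[day:(day+4)])   # days 1 to 5: (45+43+46+48+51)/5 = 46.6

day = 16
mean(x[day:(day+4)])   # days 16 to 20: (41+40+38+35+40)/5 = 38.8
```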
> cummax(x) # running maximum
[1] 45 45 46 48 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51
> cummin(x) # running minimum
[1] 45 43 43 43 43 43 43 43 43 43 43 43 43 43 43 41 40 38 35 35
74 122 235 111 292 111 211 133 156 79

What is the mean, the variance, the standard deviation? Again, R makes these easy to answer:
> whale = c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)
> mean(whale)
[1] 152.4
> var(whale)
[1] 5113.378
> std(whale)
Error: couldn't find function "std"
> sqrt(var(whale))
[1] 71.50789
> sqrt( sum( (whale - mean(whale))^2 /(length(whale)-1)))
[1] 71.50789

Well, almost! First, one needs to remember the names of the functions. In this case mean is easy to guess, var is kind of obvious but less so, std is also kind of obvious, but guess what? It isn't there! So some other things were tried. First, we remember that the standard deviation is the square root of the variance. Finally, the last line illustrates that R can almost exactly mimic the mathematical formula for the standard deviation:
$$ SD(X) = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 }. $$
> std = function(x) sqrt(var(x))
> std(whale)
[1] 71.50789

The ease of defining your own functions is a very appealing feature of R we will return to.
> sd(whale)
[1] 71.50789
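As a sketch of how a user-defined function can follow the standard deviation formula term by term, the version below spells out each piece. The local names n and deviations are made up for illustration, and all.equal() is used only to compare the result with the built-in sd:

```r
whale = c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)

# Standard deviation written out term by term (illustrative version)
std = function(x) {
  n = length(x)                       # number of observations
  deviations = x - mean(x)            # X_i minus the sample mean
  sqrt(sum(deviations^2) / (n - 1))   # square root of the sample variance
}

std(whale)                            # 71.50789, as above
all.equal(std(whale), sd(whale))      # agrees with the built-in sd
```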
how many elements?                      length(x)
ith element                             x[2]  (i=2)
all but ith element                     x[-2]  (i=2)
first k elements                        x[1:5]  (k=5)
last k elements                         x[(length(x)-4):length(x)]  (k=5)
specific elements                       x[c(1,3,5)]  (1st, 3rd and 5th)
all greater than some value             x[x>3]  (the value is 3)
bigger than or less than some values    x[ x< -2 | x > 2]
which indices are largest               which(x == max(x))
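For the first and last k elements, the index arithmetic can be checked on a small vector. The sketch below also uses head() and tail(), which are assumed to be available (they are part of current R distributions, but not necessarily of the older version shown in this document):

```r
x = c(45,43,46,48,51,46,50,47,46,45)
k = 3

x[1:k]                          # first k elements: 45 43 46
x[(length(x)-k+1):length(x)]    # last k elements: 47 46 45

head(x, k)                      # same as x[1:k]
tail(x, k)                      # same as the last-k slice
```

Note that the last k elements start at index length(x)-k+1, so that exactly k values are returned.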
65311 65624 65908 66219 66499 66821 67145 67447

Enter these numbers into R. Use the function diff on the data. What does it give?
> miles = c(65311, 65624, 65908, 66219, 66499, 66821, 67145, 67447)
> x = diff(miles)

You should see the number of miles between fill-ups. Use the max function to find the maximum number of miles between fill-ups, the mean function to find the average number of miles and the min function to get the minimum number of miles.
17 16 20 24 22 15 21 15 17 22

Enter this into R. Use the function max to find the longest commute time, the function mean to find the average and the function min to find the minimum.
> sum( commutes >= 20)

What do you get? What percent of your commutes are less than 17 minutes? How can you answer this with R?
46 33 39 37 46 30 48 32 49 35 30 48

Enter this data into a variable called bill. Use the sum command to find the amount you spent this year on the cell phone. What is the smallest amount you spent in a month? What is the largest? In how many months was the amount greater than $40? What percentage was this?
9000 9500 9400 9400 10000 9500 10300 10200

Use R to find the average value and compare it to Edmund's estimate of $9500. Use R to find the minimum value and the maximum value. Which price would you like to pay?
> x = c(1,3,5,7,9)
> y = c(2,3,5,7,11,13)
> x = c(1, 8, 2, 6, 3, 8, 5, 5, 5, 5)

Use R to compute the following functions. Note, we use X1 to denote the first element of x (which is 1) etc.