21 Entering Data into R
Built-in data sets are very convenient, but at some point one wants
to bring data into the session from outside of R. Data can come from
many places: the web, a spreadsheet, a database, a text file, a
printed paper. Accordingly, there are nearly as many ways to enter it.
For the authoritative account on how to do this, consult the
"R Data Import/Export" guide from http://cran.r-project.org.
What follows is a much-shortened summary that quickly illustrates
several different methods. Which method is best depends upon the
context, so we show a variety of them and explain when each makes
sense to use.
21.1 Using c
The c function combines values. One of its simplest uses is
to combine a sequence of values into a vector. For example
> x = c(1,2,3,4)
stores the values 1, 2, 3, 4 in x. This is the easiest way to enter
data quickly, but it becomes tedious and error-prone if the data set
is long.
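As a small illustration of what c can combine, here is a sketch. The variable names y and heights are made up for the example and do not come from the text.

```r
# Combine individual values into a numeric vector
x = c(1, 2, 3, 4)

# c also joins existing vectors and new values end to end
y = c(x, 5, 6)

# Values can be given names as they are entered
heights = c(ann = 65, bob = 68)

length(y)        # how many values y holds
names(heights)   # the names attached to the values
```

Because c accepts vectors as well as single values, a long data set can be typed in several short pieces and joined afterwards.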
21.2 Using scan
At its simplest, the function scan can do the same as c.
It saves you having to type the commas, though:
> x=scan()
1 2 3
4
Notice that we simply start typing the numbers in. If we hit the
return key once, we continue on a new row; if we hit it twice in a
row, scan stops. This can be fairly convenient when entering a few
data points (10-40, say), but you might want to use a file if you
have more.
The scan function has other options. One particularly useful
one is the choice of separator, set with the sep argument.
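The following sketch tries out two of these options without any typing at the prompt. It assumes the text argument of scan, which supplies the input as a string; the sample values are invented for the illustration.

```r
# sep names the character that separates the values
x = scan(text = "1;2;3;4", sep = ";")

# what = "" asks scan to read character values instead of numbers
g = scan(text = "F M M M", what = "")

x   # a numeric vector
g   # a character vector
```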
21.3 Using scan with a file
If we have our numbers stored in a text file, then scan can
be used to read them in. You just need to tell scan which file to
open. Here are two examples.
Suppose the file ReadWithScan.txt has contents
1 2 3
4
Then the command
> x = scan(file = "ReadWithScan.txt")
will read the contents into your R session.
Now suppose there is some formatting between the numbers that you
want to get rid of. For example, suppose this is now your file
ReadWithScan.txt:
1,2,3,
4
then
> x=scan(file = "ReadWithScan.txt",sep=",")
works.
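To try both forms without preparing a file by hand, one can write a temporary file from within R first. This is only a sketch: the file name comes from tempfile rather than being ReadWithScan.txt, and the comma-separated lines here carry no trailing comma.

```r
# Create a throwaway file so the example is self-contained
f = tempfile(fileext = ".txt")

# Whitespace-separated numbers spread over two lines
writeLines(c("1 2 3", "4"), f)
x = scan(file = f)

# Comma-separated numbers need the sep argument
writeLines(c("1,2,3", "4"), f)
y = scan(file = f, sep = ",")

unlink(f)   # remove the temporary file
```

In both cases the line break also ends a value, so the numbers may be split across lines however is convenient.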
21.4 Editing your data
The data.entry command lets you edit existing variables
and data frames with a spreadsheet-like interface. The only gotcha is
that the variable you want to edit must already be defined. A simple
usage is
> data.entry(x) # x already defined
> data.entry(x=c(NA)) # if x is not defined already
When the window is closed, the values are saved.
The R command edit will also open a simple window to
edit data. It makes editing functions easy. It can be used
for data, but if you try, you'll see why it isn't recommended.
An important caveat: you must remember to store the results of the
edit, or they vanish when you are done. For example
> x = edit(x) ### NOT edit(x) alone!
The command fix will do the same thing but will
automatically store the results.
21.5 Reading in tables of data
If you want to enter multivariate data, you can do any of
the above for each variable. However, it may be more convenient to read
in a whole table of data at once.
Suppose your data is in tabular form, such as this file
ReadWithReadTable.txt:
Age Weight Height Gender
18 150 65 F
21 160 68 M
45 180 65 M
54 205 69 M
Notice that the first row supplies column names and the second and
following rows the data. The command read.table will read this in and
store the results in a data frame. A data frame is a table-like
structure in which each variable is stored as a column and all columns
have the same length; unlike a matrix, the columns may hold different
types of data. (Notice we need to specify that the headers are
there in this case.)
> x = read.table(file="ReadWithReadTable.txt",header=TRUE,stringsAsFactors=TRUE)
> x[['Gender']] # a factor, it prints the levels
[1] F M M M
Levels: F M
> x[['Age']] # a numeric vector
[1] 18 21 45 54
> x # default print out for a data.frame
  Age Weight Height Gender
1  18    150     65      F
2  21    160     68      M
3  45    180     65      M
4  54    205     69      M
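read.table treats the variables as numeric or as factors. (Since
R 4.0.0, character columns are kept as character vectors unless
stringsAsFactors=TRUE is given.) A factor is a special class in
R and has a special print method: the "levels" of the factor are
displayed after the values are printed.
As well, the internal representation can be a bit surprising: the
values are stored as integer codes that index the levels.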
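A sketch of how one might inspect the factor and its internal representation follows. It assumes the text argument of read.table, used here so that the table can be given inline instead of in a file, and it passes stringsAsFactors = TRUE explicitly so that the Gender column is a factor in any version of R.

```r
# The table from the text, supplied inline through the text argument
tab = "Age Weight Height Gender
18 150 65 F
21 160 68 M
45 180 65 M
54 205 69 M"

# stringsAsFactors = TRUE turns character columns into factors
x = read.table(text = tab, header = TRUE, stringsAsFactors = TRUE)

str(x)                       # one line per column, with its type
levels(x[['Gender']])        # the distinct values of the factor
as.integer(x[['Gender']])    # the integer codes stored internally
```

The last line is the surprising part: the factor holds integer codes, and the labels F and M live only in the levels.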
21.6 Fixed-width fields
Sometimes data comes without breaks between fields, especially if you
interface with old databases. Such data may be in fixed-width format
(fwf). An example data set for student information at the College of
Staten Island is of this form (say student.txt):
123456789MTH 2149872 A 0220002
314159319MTH 2149872 B+ 0220002
271828232MTH 2149872 A- 0220002
The first 9 characters are a student id, then 7 characters for the
class, 4 for the section, 4 for the grade, 2 for the semester and 4
for the year. To read such a file in, we can use the read.fwf
command. You need to tell it how wide the fields are and, optionally,
provide names. Here is how the example above could be read in from
the file student.txt:
> x=read.fwf(file="student.txt",widths=c(9,7,4,4,2,4),
+ col.names=c("id","class","section","grade","sem","year"))
> x
         id   class section grade sem year
1 123456789 MTH 214    9872     A   2 2000
2 314159319 MTH 214    9872    B+   2 2000
3 271828232 MTH 214    9872    A-   2 2000
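The call above can be tried on a temporary file, as in the following sketch. The rows are re-typed so that every field has exactly the stated width; the exact padding is an assumption made for the illustration. The strip.white argument is passed on to read.table and trims the padding from the character fields.

```r
# Write fixed-width rows: 9 + 7 + 4 + 4 + 2 + 4 = 30 characters each
f = tempfile(fileext = ".txt")
writeLines(c("123456789MTH 2149872A   022000",
             "314159319MTH 2149872B+  022000",
             "271828232MTH 2149872A-  022000"), f)

# widths gives the size of each field, in order
x = read.fwf(file = f, widths = c(9, 7, 4, 4, 2, 4),
             col.names = c("id", "class", "section", "grade", "sem", "year"),
             strip.white = TRUE)

unlink(f)   # remove the temporary file
x
```

Since the fields are found by position alone, a single missing or extra space in the file shifts every later field on that line.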
21.7 Spreadsheet data
Alternatively, you may have data from a spreadsheet. The simplest
way to get this into R is through a file format that both
applications can read and write. Typically, this is CSV
(comma-separated values) format. First, save the data from the
spreadsheet as a CSV file, say data.csv. Then the R command
read.csv will read it in as follows
> x=read.csv(file="data.csv")
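To see what read.csv expects, the sketch below writes a tiny CSV file from within R and reads it back. The contents are invented for the illustration and the file name comes from tempfile rather than being data.csv.

```r
# A small CSV file: a header row, then one row per case
f = tempfile(fileext = ".csv")
writeLines(c("Age,Weight,Height,Gender",
             "18,150,65,F",
             "21,160,68,M"), f)

# read.csv assumes a header row and comma separators by default
x = read.csv(file = f)

unlink(f)   # remove the temporary file
names(x)    # column names taken from the header row
```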
If you use Windows, there is a developing package RExcel
which allows you to do much, much more with R and Excel.
If you use Linux, there is a package for interfacing
with the spreadsheet gnumeric.
21.8 XML, URLs
XML, or Extensible Markup Language, is a file storage format of the
future. R has support for it, but you may need to add the XML package
to your R installation. Many external applications can write in XML
format. On UNIX, the gnumeric spreadsheet does so. The Microsoft .NET
initiative does too.
R has a function url which allows you to read in properly formatted
web pages as though you were reading a local file with
read.table. The syntax is identical, except that the filename is
replaced with a call to url. For example, the command might look like
> address="http://www.math.csi.cuny.edu/Statistics/R/R-Notes/sample.txt"
> read.table(file=url(address))
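The same pattern can be tried without a network connection by pointing url at a local file through a file:// address. This is only a sketch: it assumes a Unix-like absolute path, and the table contents are invented for the illustration.

```r
# Write a small table to a temporary file
f = tempfile(fileext = ".txt")
writeLines(c("Age Weight", "18 150", "21 160"), f)

# Build a file:// address; assumes a Unix-like path such as /tmp/...
address = paste0("file://", normalizePath(f, winslash = "/"))

# url() stands in for the file name, exactly as with a web address
x = read.table(file = url(address), header = TRUE)

unlink(f)   # remove the temporary file
x
```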
21.9 "Foreign" formats
The oddly titled package foreign allows you to read in
file formats from other popular statistics packages such as SAS,
SPSS, and MINITAB. For example, to read MINITAB portable files the
R command is read.mtp.
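A minimal sketch of how the package might be used follows. The file name survey.mtp is hypothetical, so the read is guarded and only happens if such a file exists.

```r
# Load the package that provides the readers
library(foreign)

# Hypothetical file name, used only for illustration
f = "survey.mtp"

# Read the MINITAB portable file only if it is actually present
if (file.exists(f)) {
  x = read.mtp(f)
}

# Readers for other formats live in the same package, for example
# read.spss for SPSS files and read.xport for SAS transport files
```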
Copyright © John Verzani, 2001-2. All rights reserved.