In MATLAB we store data in a vector or a matrix. To use MATLAB effectively, we need to learn how to store data and how to access the data once we have stored it.
[sec:storing-data]
Suppose we run an experiment and we get a bunch of data. How can we store this in MATLAB so that we can then manipulate it with the computer rather than with pencil and paper?
As an example, suppose we had data on the age of students in a classroom:
$$21,19,20,26,20,21,21,28,19,19,35,20,21,23$$
To put this into MATLAB we will store it as a list or vector. The command to do this is
ages = [21,19,20,26,20,21,21,28,19,19,35,20,21,23];
What did we do here?
The data has been stored in a variable named ages.
The pair of square brackets, “[ ]”, indicates to MATLAB that you are storing a list, or vector, of numbers.
The data is stored as a row vector because of the commas. If we had used semicolons, “;”, instead, we would have a column vector.
The trailing semicolon suppresses the printing. Without it MATLAB would print out something like:
ages =
21 19 20 26 20 21 21 28 19 19 35 20 21 23
The values are stored in an order so that we can access them later. More on this in [sec:accessing-data].
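As a small illustration of the comma versus semicolon distinction, the sketch below stores the same three ages both ways and asks for the dimensions with size. The variable names ages_row and ages_col are made up for this example.

```matlab
ages_row = [21, 19, 20];    % commas: one row, three columns
ages_col = [21; 19; 20];    % semicolons: three rows, one column
size_row = size(ages_row);  % [1 3]
size_col = size(ages_col);  % [3 1]
```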
Here are some more examples of storing data:
Suppose we flip a coin 10 times and get a sequence of heads and tails as follows:
H, T, T, H, T, H, H, H, T, H
We could store this in a data vector by arbitrarily saying that heads is a 1 and tails is a 0, giving
>> cointoss = [1,0,0,1,0,1,1,1,0,1]; % 1 for heads, 0 for tails
If we recorded the amount of gasoline we buy at each fill-up we might get a sequence of data such as:
10.9, 10.8, 11.3, 13.2, 9.8, 9.7, 6.5, 11.4
This goes into a data vector in the same way:
>> gas = [10.9, 10.8, 11.3, 13.2, 9.8, 9.7, 6.5, 11.4];
We see that storing a given list of data is not too hard. It just takes some typing.
It may be that our data is not random in nature but rather something we can describe mathematically. For example, we might wish to store the integers from 1 to 10. This could be done as
>> one_to_ten = [1,2,3,4,5,6,7,8,9,10]; % too much typing
Or this could be achieved with the colon operator:
>> one_to_ten=1:10;
In general the colon operator has a step size. So the command
>> x = a:h:b;
will assign to the variable x the list of values
a, a+h, a+2*h, ..., a+k*h
where k is the largest integer with $a + kh \leq b$ (assuming h is positive). The difference between successive numbers is h, hence the name step size.
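A few concrete cases may make the stopping rule easier to see. In this sketch the endpoints and step sizes are arbitrary choices; the negative-step and empty cases are not discussed in the text above and are included only to show what the colon operator does there.

```matlab
x_up = 1:3:11;      % 1 4 7 10  (the next value, 13, would exceed 11)
x_down = 10:-2:1;   % 10 8 6 4 2  (a negative step counts down)
x_empty = 5:1:1;    % empty: a positive step cannot go from 5 down to 1
```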
Another way to do a similar thing is to specify the number of values you want between a and b. This is done with the linspace command:
>> x = linspace(a,b,n); % n numbers between a and b
For those unafraid of algebra, you can show that this is the same (up to rounding) as:
>> x = a:(b-a)/(n-1):b; % same as linspace(a,b,n)
Or as:
>> x = (0:(n-1))*(b-a)/(n-1)+a; % again the same
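The algebra can also be checked numerically. The sketch below picks example values for a, b, and n (arbitrary choices, not from the text), builds n evenly spaced points three ways with step (b-a)/(n-1), and measures the largest disagreement between them.

```matlab
a = 2; b = 4; n = 5;               % example endpoints and count (arbitrary choices)
x1 = linspace(a, b, n);            % 2.0 2.5 3.0 3.5 4.0
x2 = a:(b-a)/(n-1):b;              % colon operator with step (b-a)/(n-1)
x3 = (0:(n-1))*(b-a)/(n-1) + a;    % scale and shift the integers 0..n-1
max_diff = max(abs([x1 - x2, x1 - x3]));   % largest disagreement, 0 up to rounding
```

For step sizes that are not exactly representable in floating point, the three results may differ in the last digits, which is why the comparison uses a difference rather than exact equality.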
These two help you generate evenly spaced data. If you want to generate exponentially spaced data it is also easy:
>> n= 1:5;
>> x = 10.^n; % 10^1, 10^2,... 10^5
The only part to remember is that you need the “.^” operator, as otherwise MATLAB would not know how to raise 10 to a list of numbers.
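The element-by-element power works in both directions: a number raised to each entry of a list, or each entry of a list raised to a number. A minimal sketch, with variable names chosen only for this example:

```matlab
n = 0:4;
powers_of_two = 2.^n;   % 1 2 4 8 16   (2 raised to each entry of n)
squares = n.^2;         % 0 1 4 9 16   (each entry of n squared)
```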
Here is a list of a bunch of different ways to generate non-random data:
>> x = 1:3; % easy as 1,2,3
>> x = 10 - (1:10); % counts down from 9 to 0. Need the
 % parentheses
>> x = 1:2:99; % just the odd numbers
>> x = (-1).^(0:9); % 1,-1,1,-1,...
>> x = 10.^(-(1:5)); % 10^(-1), 10^(-2), ..., 10^(-5)
>> x = sin((1:n)*(2*pi/n)); % sample the sine wave n times (n a positive integer)
To store more complex data, such as a joint distribution, or correlated data, we need to use matrices in place of vectors.
For example, suppose we have data given in a table for height and weight of a basketball team:
Height | 72 | 76 | 80 | 81 | 85 |
---|---|---|---|---|---|
Weight | 180 | 205 | 245 | 250 | 300 |
We could store this with two different variables:
>> height = [72,76,80,81,85];
>> weight = [180,205,245,250,300];
Or we could combine the data into an array:
>> combined_data = [72,180; 76,205; 80,245; 81,250;85,300]
combined_data =
72 180
76 205
80 245
81 250
85 300
The semicolon tells MATLAB to start a new row. Be careful: you need to have the same number of elements in each row for MATLAB to understand what you are doing.
This presents the data in a columnar form. To change this around use the transpose operator, ':
>> combined_data'
ans =
72 76 80 81 85
180 205 245 250 300
To summarize:
A comma, “,”, separates row elements
A semicolon, “;”, distinguishes rows
The transpose, “'”, switches rows with columns and vice versa
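The warning about equal row lengths can be seen directly. The sketch below, with made-up variables row1 and row2, tries to stack two rows of different lengths inside a try/catch block so that the resulting error does not stop the script, and then repairs the short row.

```matlab
row1 = [1, 2, 3];
row2 = [4, 5];              % one element short (deliberately)
mismatch_caught = false;
try
    M = [row1; row2];       % rows of different lengths cannot be stacked
catch
    mismatch_caught = true; % MATLAB raises an error instead of building M
end
M = [row1; [row2, 6]];      % padding the short row makes it work: 2-by-3
Mt = M';                    % transpose: 3-by-2
```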
With these basics, we should be able to use the already defined variables height and weight to define the array containing both:
>> [height',weight']
ans =
72 180
76 205
80 245
81 250
85 300
>> [height,weight]
ans =
72 76 80 81 85 180 205 245 250 300
>> [height;weight]
ans =
72 76 80 81 85
180 205 245 250 300
>> [height;weight]'
ans =
72 180
76 205
80 245
81 250
85 300
Try to figure out exactly why each statement produced the given output.
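One way to check your reasoning is to ask for the dimensions of each result with size. This sketch repeats the four statements above; the names s1 to s4 are introduced only for this example.

```matlab
height = [72, 76, 80, 81, 85];
weight = [180, 205, 245, 250, 300];
s1 = size([height', weight']);   % [5 2]: two columns side by side
s2 = size([height, weight]);     % [1 10]: one long row
s3 = size([height; weight]);     % [2 5]: two rows stacked
s4 = size([height; weight]');    % [5 2]: the stacked rows, transposed
```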
You may want to read data from, or write data to, a file on a diskette or the hard drive. This is easy to do, but you need to know the commands.
The save command saves data to a file. The load command loads data from a file.
For example, suppose you have data in a vector x:
>> x= [1,2,1,4,2,5,1,3,2,1,4,2]; % some precious data
With save you can put this data into a file. You can give the file any name you wish, and also specify the format in which the data is stored. Your options are: MATLAB binary (the default), ascii (plain text), tabs (tab separated), and double (double precision). The last two are used together with the ascii option.
Here are some examples:
>> save file.mat x; % stores x in binary mode into
 % file.mat in current directory
>> save file.txt x -ascii % stores x as text file
>> save a:\file.txt x -ascii -tabs % store as tab separated on a floppy
 % disk
>> save c:\matlab\file.txt x y z -ascii % save 3 variables to hard
 % drive
You can load these back in with the load command:
>> load file.mat; % loads x back in
>> load a:\file.txt; % stores x in a new variable called
% file
>> temp='file.mat'; load(temp); % you can use a variable
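To see that nothing is lost, a round trip can be sketched as follows. It uses the function forms of save and load, and writes to temporary file names produced by tempname so that no existing file is overwritten; the file and variable names here are assumptions made for illustration.

```matlab
x = [1,2,1,4,2,5,1,3,2,1,4,2];
mat_name = [tempname, '.mat'];      % temporary file names (hypothetical, for illustration)
txt_name = [tempname, '.txt'];
save(mat_name, 'x');                % function form of: save file.mat x
save(txt_name, 'x', '-ascii');      % function form of: save file.txt x -ascii
s = load(mat_name);                 % returns a struct with one field per saved variable
x_from_mat = s.x;
x_from_txt = load(txt_name);        % a text file loads as a plain numeric array
delete(mat_name); delete(txt_name); % clean up
```

Asking load for an output, as done here, avoids silently overwriting a variable of the same name that is already in the workspace.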
[sec:accessing-data]
If we are going to store data in a vector or list, then we need to know how to access the data for it to be of any use.
Accessing the entire list is easy. Suppose our list contains the grades on an exam:
>> grades =[78,98,76,85,45,89,84,68,95];
Then to access the entire list we simply have to type the name without the semicolon:
>> grades % no semicolon!
grades =
78 98 76 85 45 89 84 68 95
This is how we can give this data to a function. For example, we might wish to sort the data:
>> sort_grades = sort(grades) % sort in increasing order
sort_grades =
45 68 76 78 84 85 89 95 98
We might want to know what the largest grade is, or the smallest grade. For the sorted data these are just the last and first grades. It would be nice to know how to access these. MATLAB keeps a list as an ordered set of numbers: the first element in the list is indexed with a 1, the $n$th element with an n. So
>> sort_grades(1); % first element is 45
>> sort_grades(2); % returns second element or 68
>> sort_grades(length(sort_grades)); % returns the last element 98.
The last line used the length command to get the last element. If the list has length $n$, then there are $n$ entries, and so the construct sort_grades(length(sort_grades)) returns the last one.
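MATLAB also offers shortcuts for these lookups that are not covered in the text above: inside an index, the keyword end stands for the last position, and the functions max and min find the extremes without sorting. A brief sketch using the same grades:

```matlab
grades = [78,98,76,85,45,89,84,68,95];
sort_grades = sort(grades);
last_by_length = sort_grades(length(sort_grades));  % 98
last_by_end = sort_grades(end);                     % 98, "end" stands for the last index
largest = max(grades);                              % 98, no sorting needed
smallest = min(grades);                             % 45
```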
We can change an individual grade. Suppose we mistyped a grade: the 76 should have been an 86. To change the 3rd entry we reference it and then assign it a new value:
>> grades(3) = 86; % change the third element.
We would still have to reevaluate the sort line to update the variable sort_grades.
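The point about reevaluating can be made concrete. In this sketch the variable stale is introduced only to hold the sorted list as it stood before the correction; it shows that sort_grades is a separate copy that does not follow changes to grades.

```matlab
grades = [78,98,76,85,45,89,84,68,95];
sort_grades = sort(grades);    % 45 68 76 78 84 85 89 95 98
grades(3) = 86;                % fix the mistyped grade
stale = sort_grades;           % still contains the old 76
sort_grades = sort(grades);    % 45 68 78 84 85 86 89 95 98
```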
[sec:tricks-data]