In MATLAB we store data in a vector or a matrix. In order to use MATLAB effectively, we need to learn how to store data, and how to access the data once we have stored it.

storing data

storing a data set

Suppose we run an experiment and we get a bunch of data. How can we store this in MATLAB so that we can then manipulate it with the computer rather than with pencil and paper?

As an example, suppose we had data on the age of students in a classroom:

$$21,19,20,26,20,21,21,28,19,19,35,20,21,23$$

To put this into MATLAB we will store it as a list or vector. The command to do this is

>> ages = [21,19,20,26,20,21,21,28,19,19,35,20,21,23];

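What did we do here? We chose a name for the data, ages, and used the equals sign to assign a value to it. The square brackets mark the start and end of the list, the commas separate the entries, and the trailing semicolon tells MATLAB not to print the result.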

Here are some more examples of storing data:

Suppose we flip a coin 10 times and get a sequence of heads and tails as follows:

H, T, T, H, T, H, H, H, T, H

We could store this in MATLAB in a data vector by arbitrarily saying a heads is a 1 and a tails is a 0, giving

>> cointoss = [1,0,0,1,0,1,1,1,0,1]; % 1 for heads, 0 for tails

If we recorded the amount of gasoline we buy at each fill-up we might get a sequence of data such as:

10.9, 10.8, 11.3, 13.2, 9.8, 9.7, 6.5, 11.4

This goes into a data vector in the same way:

>> gas = [10.9, 10.8, 11.3, 13.2, 9.8, 9.7, 6.5, 11.4];

We see that storing a given list of data is not too hard. It just takes some typing.
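After typing in a list, it can be worth asking MATLAB how many entries it actually holds. The following is a minimal sketch that reuses the ages data from above; numel and size are standard MATLAB functions, while the names n_ages and dims are only illustrative.

```matlab
ages = [21,19,20,26,20,21,21,28,19,19,35,20,21,23];
n_ages = numel(ages);   % number of entries stored in the list
dims = size(ages);      % [rows, columns]; a list typed this way is a single row
disp(n_ages)            % compare with the number of values you meant to type
disp(dims)
```

If the count does not match the number of data points on paper, a value was probably skipped or typed twice.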

automatically generated lists of numbers

It may be that our data is not random in nature but rather something we can describe mathematically. For example, we might wish to store the integers between 1 and 10. This could be done as

>> one_to_ten = [1,2,3,4,5,6,7,8,9,10]; % too much typing

Or this could be achieved with the colon operator:

>> one_to_ten=1:10;

In general the colon operator has a step size. So the command

>> x = a:h:b;

will assign to the variable x the list of values

a, a+h, a+2*h, ..., a+k*h

where k is the largest integer with $a + kh \leq b$ (taking the step h to be positive). The difference between successive numbers is h, hence the name step size.
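A few concrete cases may make the stopping rule easier to see. This is a sketch with values chosen only for illustration; the names x1 through x4 are not part of the text above.

```matlab
x1 = 0:0.25:1;    % 0, 0.25, 0.5, 0.75, 1
x2 = 1:3:11;      % 1, 4, 7, 10; the next value, 13, would pass 11
x3 = 10:-2:1;     % a negative step counts down: 10, 8, 6, 4, 2
x4 = 5:1;         % empty list, since the start is already past the end
disp(x2)
disp(x3)
disp(isempty(x4))
```

Note that the end value b is included only when the steps happen to land on it exactly, as in x1.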

Another way to do a similar thing is to specify the number of values you want between a and b. This is done with the linspace command:

>> x = linspace(a,b,n);         % n evenly spaced numbers from a to b

For those unafraid of algebra, you can show that this is the same (up to rounding) as:

>> x = a:(b-a)/(n-1):b;         % same as linspace(a,b,n)

Or as:

>> x = (0:(n-1))*(b-a)/(n-1)+a; % again the same
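The algebra can also be checked numerically. The sketch below assumes the illustrative values a = 0, b = 1 and n = 5, and compares a step of (b-a)/(n-1) against linspace. It uses a small tolerance rather than exact equality, because floating point rounding can make the results differ in the last digit.

```matlab
a = 0; b = 1; n = 5;                  % illustrative values only
x_lin   = linspace(a, b, n);          % 0, 0.25, 0.5, 0.75, 1
x_colon = a:(b-a)/(n-1):b;            % colon operator with step (b-a)/(n-1)
x_scale = (0:(n-1))*(b-a)/(n-1) + a;  % rescale 0,1,...,n-1 onto [a,b]
max_diff = max(abs([x_lin - x_colon, x_lin - x_scale]));
disp(max_diff < 1e-12)                % the three lists agree up to rounding
```

For step sizes that are not exactly representable in binary, the colon form can in principle stop one step short of b, which is one reason to prefer linspace when the number of points matters.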

These two commands help you generate evenly spaced data. If you want to generate exponentially spaced data it is also easy:

>> n = 1:5;
>> x = 10.^n;                   % 10^1, 10^2,... 10^5

The only part to remember is you need the “.^” as otherwise MATLAB would not know how to raise 10 to a list of numbers.
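The dotted power works entry by entry whichever side the list is on. A small sketch, with variable names chosen only for illustration:

```matlab
n = 0:4;
powers_of_two = 2.^n;     % list in the exponent: 1, 2, 4, 8, 16
squares = (1:5).^2;       % list in the base: 1, 4, 9, 16, 25
disp(powers_of_two)
disp(squares)
```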

Here is a list of a bunch of different ways to generate non-random data:

>> x = 1:3;                     % easy as 1,2,3
>> x = 10 - (1:10);             % counts down from 9 to 0. Need the
                                % parentheses
>> x = 1:2:99;                  % just the odd numbers
>> x = (-1).^(0:9);             % 1,-1,1,-1,...
>> x = 10.^(-(1:5));            % 10^(-1), 10^(-2), ..., 10^(-5)
>> x = sin((1:n)*(2*pi/n));     % sample the sine wave n times (n a single number)

two-dimensional data

To store more complex data, such as a joint distribution, or correlated data, we need to use matrices in place of vectors.

For example, suppose we have data given in a table for height and weight of a basketball team:

Height 72 76 80 81 85
Weight 180 205 245 250 300

We could store this with two different variables:

>> height = [72,76,80,81,85];
>> weight = [180,205,245,250,300];

Or we could combine the data into an array:

>> combined_data = [72,180; 76,205; 80,245; 81,250;85,300]
combined_data =

   72  180
   76  205
   80  245
   81  250
   85  300

The semicolon tells MATLAB to start a new row. Be careful: you need to have the same number of elements in each row for MATLAB to understand what you are doing.

This presents the data in a columnar form. To swap the rows and columns, use the transpose operator, written as a single quote ('):

>> combined_data'
ans =

   72   76   80   81   85
  180  205  245  250  300

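To summarize: commas separate the entries within a row, semicolons start a new row, and the single quote (') transposes, swapping rows and columns.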

With these basics, we should be able to use the already defined variables height and weight to define the array containing both:

>> [height',weight']
ans =

   72  180
   76  205
   80  245
   81  250
   85  300

>> [height,weight]
ans =

   72   76   80   81   85  180  205  245  250  300

>> [height;weight]
ans =

   72   76   80   81   85
  180  205  245  250  300

>> [height;weight]'
ans =

   72  180
   76  205
   80  245
   81  250
   85  300

Try to figure out exactly why each statement produced the given output.
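One way to check your reasoning is to ask for the size of each result, which reports the number of rows and then the number of columns. The sketch below repeats the four statements; the names s1 through s4 are only illustrative.

```matlab
height = [72,76,80,81,85];
weight = [180,205,245,250,300];
s1 = size([height', weight']);   % two columns placed side by side
s2 = size([height, weight]);     % one long row
s3 = size([height; weight]);     % weight stacked under height
s4 = size([height; weight]');    % the stacked rows, then transposed
disp([s1; s2; s3; s4])           % one [rows, columns] pair per line
```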

Reading data from a file

You may want to read data from, or write data to, a file on a diskette or the hard drive. This is easy once you know the commands.

The save command saves data to a file. The load command loads data from a file.

For example suppose you have data in a vector x:

>> x= [1,2,1,4,2,5,1,3,2,1,4,2]; % some precious data

With save you can put this data into a file. You can give the file any name you wish, and also specify the format in which the data is stored. Your options are: MATLAB binary (the default), -ascii (plain text), and, used together with -ascii, -tabs (tab separated) and -double (double precision).

Here are some examples:

>> save file.mat x;             % stores x in binary mode into
                                % file.mat in the current directory
>> save file.txt x -ascii       % stores x as a text file
>> save a:\file.txt x -ascii -tabs % store as tab separated on a
                                % floppy disk
>> save c:\matlab\file.txt x y z -ascii % save 3 variables to the
                                         % hard drive

You can load these back in with the load command:

>> load file.mat;               % loads x back in
>> load a:\file.txt;            % stores x in a new variable called
                                % file
>> temp='file.mat'; load(temp); % you can use a variable
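To convince yourself that nothing is lost on the way to disk and back, you can save a list and load it again straight away. The following is a sketch only: it uses the function form of save and load, writes to a scratch file whose name comes from tempname (an assumption made here so that no real file is overwritten), and introduces the illustrative names fname, y and same.

```matlab
x = [1,2,1,4,2,5,1,3,2,1,4,2];   % the data from above
fname = [tempname, '.txt'];      % scratch file name, for illustration only
save(fname, 'x', '-ascii');      % function form of: save <file> x -ascii
y = load(fname);                 % a text file comes back as a plain array
same = isequal(x, y);            % compare what was loaded with the original
delete(fname);                   % remove the scratch file again
disp(same)
```

For whole numbers like these the comparison should succeed. Plain -ascii keeps only about 8 significant digits, so data with many decimal places is better saved with -ascii -double or in the default binary format.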

accessing data

If we are going to store data in a vector or list, then we need to know how to access the data for it to be of any use.

Accessing the entire list is easy. Suppose our list contains the grades on an exam:

>> grades =[78,98,76,85,45,89,84,68,95];

Then to access the entire list we simply have to type the name without the semicolon:

>> grades                       % no semicolon!
grades =

  78  98  76  85  45  89  84  68  95

This is how we can give this data to a function. For example, we might wish to sort the data:

>> sort_grades = sort(grades)   % sort in increasing order
sort_grades =

  45  68  76  78  84  85  89  95  98

We might want to know what the smallest grade is, or the largest grade. For the sorted data these are just the first and last grades. It would be nice to know how to access these. MATLAB keeps a list as an ordered set of numbers: the first element in the list is indexed with a 1, the $n$th element with an $n$. So

>> sort_grades(1);              % first element is 45
>> sort_grades(2);              % returns second element or 68
>> sort_grades(length(sort_grades)); % returns the last element 98.

The last line used the length command to get the last element. If the list has length $n$, then there are $n$ entries, and so the construct sort_grades(length(sort_grades)) returns the last one.
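MATLAB also provides the keyword end as shorthand for the last index, and the functions min and max, which find the extreme values without sorting first. A short sketch using the grades from above; the names lowest, highest, also_lowest and also_highest are only illustrative.

```matlab
grades = [78,98,76,85,45,89,84,68,95];
sort_grades = sort(grades);
lowest  = sort_grades(1);      % first entry of the sorted list: 45
highest = sort_grades(end);    % end stands for the last index: 98
also_lowest  = min(grades);    % same values, taken from the unsorted list
also_highest = max(grades);
disp([lowest, highest, also_lowest, also_highest])
```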

We can change an individual grade. Suppose we mistyped a grade: the 76 should have been an 86. To change the 3rd entry we reference it and then define it:

>> grades(3) = 86;              % change the third element.

We would still have to re-evaluate the sort line to update the variable sort_grades.
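The reason is that sort_grades holds a copy made at the time sort was called, so it does not follow later changes to grades. A minimal sketch of the effect; the name stale is introduced here only to keep the outdated copy around for comparison.

```matlab
grades = [78,98,76,85,45,89,84,68,95];
sort_grades = sort(grades);    % sorted copy made before the correction
grades(3) = 86;                % fix the mistyped third grade
stale = sort_grades;           % still contains the old 76
sort_grades = sort(grades);    % re-evaluate to pick up the change
disp(stale)
disp(sort_grades)
```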

Tricky MATLAB operations
