CAVEAT: This is just a sketch of what it should be. If you would like see something appearing here, please let me know verzani@math.csi.cuny.edu.
This section simply records some examples. Eventually, they will be merged into the main body of the text.
Example: coin tossing
Often we will want to simulate coin tossing. Numerically we need to
generate one of two numbers with equal probability. This does it for
0 and 1. How would you modify it so that you generated -1 and
1?
>> x = rand(1) % 1 random number in [0,1]. Uniform >> x = floor(x+.5) % [0,.5) -> 0 and [.5,1] -> 1
That's it. To do lots at a time, the rand function can be given an argument:
>> x = rand(1,100) % 100 random numbers in a row >> x = floor(x+.5) % the same
Example: average coin tossing
The law of large numbers asserts that as the number of trials
tends to infinity the average will tend to the true probability. More
formally if X1, X2,... is a sequence of i.i.d. events then the
average number of times Xi is in an event A will tend towards the
probability of A. In symbols
|
If you assign heads to be a 1 tails to be a 0, then a sequence of coin tosses looks like this
|
|
cumsum([0,1,1,0,0,0,1,0,1,1]) = 0,1,2,2,2,2,3,3,4,5
Here we go:
>> n = 100; % number of random coin tosses >> x = rand(1,n); % x is in [0,1] >> x = floor(x+.5); % x is in {0,1} >> y = cumsum(x); % the numerator >> z = y ./ (1:n); % the denominator >> plot(z) % Should be converging to 0.5
The graphs you generate will be different each time, but they should all be converging to 0.5 by the end. The amount of difference is controlled by the central limit theorem
Example: Histograms and Empirical Distributions
In real life, we are presented with discrete data. For example the averaged daily temperature for a month may look like this:
|
Basically a histogram is a bar plot where the heights of the bars have a special meaning. First off, we need to set up bins. These are simply intervals that lie on the x axis.
The area of a bar is proportional to the frequency of times the observation is in the bin corresponding to the bar.
The proportion is fixed by demanding that the total area be 1. (Well not quite as we'll see). So it is easy to find the height. If there are n observations, and the bin is the interval (a,b] then the height of the bar above b would be given by the formula
|
For the example above if we wanted a bin at (65,70] then the height would be
|
How can we get MATLAB to do this for us? Well MATLAB has a command hist that will do this automatically. Its output is given in this picture
As an example though we will write our own. These commands could easily be turned into a m-file.
>> x = [67,71,65,70,74,69,68,74,74,66,72,70,70,69,74,73,73,67,71,72,68,74,72,67,74,72,74,69,68,70] % our data vector >> a = [60,65,70,75] % our bins. >> n = length(x);k=length(a); % lengths of vectors >> hold on % for plotting line by line >> for i=1:(k-1) % a loop >> hist(i) = (1/(n*(a(i+1)-a(i)))*(sum(x > a(i)) - sum(x>a(i+1))); % This last line uses fancy stuff >> plot([a(i),a(i),a(i+1),a(i+1)],[0,hist(i),hist(i),0]); % plot bar >> end % end for loopThe whole point of writing this is to show the use of comparing a vector to a value to count the number of times something happens.
Notice what is returned with
>> x > 70 Columns 1 through 26: 0 1 0 0 1 0 0 1 1 0 1 0 0 0 1 1 1 0 1 1 0 1 1 0 1 1 Columns 27 through 30: 1 0 0 0
That is a 0 when that value of x is not greater than 70 and a 1 when it is. If we sum this we count the number of times x is bigger than 70. Neat!
Notice we said that the area adds to 1. Well this isn't quite the case
if the bins aren't chosen big enough to count all the data.
Exercise: This data is the daily close of the stock market for two weeks
|
Empirical Distributions. The histogram is a way of representing graphically something called an empirical distribution. This is a fancy name for a frequency of times the observation is in some set. Here is a definition
|
Exercise: For the stock market data, Find the following:
In MATLAB We use store data in a vector or a matrix . In order to use MATLAB effectively, we need to learn how to store data, and access the data once we have stored it.
Suppose we run an experiment and we get a bunch of data. HOw can we store thisin MATLAB so that we can then manipulate it with the computer rather than our pencil and paper?
As an example, suppose we had data on the age of students in a classroom:
|
ages = [21,19,20,26,20,21,21,28,19,19,35,20,21,23];
What did we do here?
ages = 21 19 20 26 20 21 21 28 19 19 35 20 21 23
Here are some more examples of storing data:
Suppose we flip a coin 10 times and get sequence of heads and tails as follows:
H, T, T, H, T, H, H, H, T, H
We could store this in in a data vector by arbitrarily saying a Heads is a 1 and a Tails is a 0 giving
>> cointoss = [1,0,0,1,0,1,1,1,0,1]; % 1 for heads, 0 for tails
If we recorded the amount of gasoling we buy at each fillup we might get a sequence of data such as:
10.9, 10.8, 11.3, 13.2, 9.8, 9.7, 6.5 11.4
This goes into a data vector in the same way:
>> gas = [ 10.9, 10.8, 11.3, 13.2, 9.8, 9.7, 6.5 11.4];
We see that storing a given list of data is not too hard. It just takes some typing.
It may be that our data is not random in nature but rather something we can describe mathematically. For example we might wish to store the integers between 1 and 10. This could be done as
>>one_to_ten=[1,2,3,4,5,6,7,8,9,10]; % too much typing
Or this could be achiebed using MATLAB with the colon-operator :
>> one_to_ten=1:10;
In general the colon operator has a step size. So the command
>> x = a:h:b;
will assign to the variable x the list of values
a, a+h,a+2*h ... a+k*h
where k is the largest integer with a+k(h) £ b. The difference between successive numbers is h hence the name step-size.
Another way to do a similar thing is to specify the number of values you want between a and b. This is done with the linspace command:
>> x = linspace(a,b,n); % n numbers between a and b
For those unafraid of algebra, you can show that this is the same as:
>> x = a:(a-b)/(n-1):b; % same a linspace(a,b,n)
Or as:
>> x = (0:(n-1))*(b-a)/n+a; % again the same
These two help you generate evenly space data. If you want to generate exponentially spaced data it is also easy:
>> n= 1:5; >> x = 10.^n; % 10^1, 10^2,... 10^5
The only part to remember is you need the ``.^'' as MATLAB otherwise would not know how to raise 10 to a list of numbers.
Here is a list of a bunch of different ways to generate non-random data:
>> x = 1:3; % easy as 1,2,3 >> x = 10 - (1:10); % counts down to 0. Need the % parantheses >> x = 1:2:99; % just the odd numbers >> x = (-1).^(0:9); % 1,-1,1,-1,... >> x = 10.^(-(1:5)); % 10^(-1), 10^(-2), ..., 10^(-5) >> x = sin((1:n)*(2*pi/n)); % sample the sine wave n times
To store more complex data, such as a joint distribution, or correlated data, we need to use matrices in place of vectors.
For example, suppose we have data given in a table for height and weight of a basketball team:
Height | 72 | 76 | 80 | 81 | 85 |
Weight | 180 | 205 | 245 | 250 | 300 |
We could store this with two different variables:
>> height = [72,76,80,81,85]; >> weight = [180,205,245,250,300];
Or we could combine the data into an array:
>> combined_data = [72,180; 76,205; 80,245; 81,250;85,300] combined_data = 72 180 76 205 80 245 81 250 85 300
The semicolon tells MATLAB to start a new row. Be careful, you need to have the same number of elements in each row for MATLAB to understand what you are doing.
This presents the data in a columnar form. To change this around use the transpose operator ':
>> combined_data' ans = 72 76 80 81 85 180 205 245 250 300
To summarize:
With these basics, we should be able to use the already defined variables height and weight to define the array containing both:
>> [height',weight'] ans = 72 180 76 205 80 245 81 250 85 300 >> [height,weight] ans = 72 76 80 81 85 180 205 245 250 300 >> [height;weight] ans = 72 76 80 81 85 180 205 245 250 300 >> [height;weight]' ans = 72 180 76 205 80 245 81 250 85 300
Try to figure out exactly why each statement produced the given output.
You may want to read or write data to a file on a diskette, or the hard drive. To do this is easy. But you need to know the commands.
The save command saves data to a file. The load command loads data from a file.
For example suppose you have data in a vector x:
>> x= [1,2,1,4,2,5,1,3,2,1,4,2]; % some precious data
With save you can put this data into a file. You can give the file the name you wish, and also specify how the format is stored. Your options are: matlab binary (the default), ascii (plain text), tabs (tab seperated), and double (double precision).
Here are some examples:
>> save file.mat x; % stores x in binary mode into % file.mat incurrent directory >> save file.txt x -ascii % stores x as text file >> save a:\file.txt x -tab % store as tab seperated on a floppy % disk >> save c:\matlab\file.txt x y z - ascii % save 3 variable to hard % drive
You can load these back in with the load command:
>> load file.mat; % loads x back in >> load a:\file.txt; % stores x in a new variable called % file >> temp='file.mat'; load(temp); % you can use a variable
If we are going to store data in a vector or list, then we need to know how to access the data for it to be of any use.
Accessing the entire list is easy. Suppose our list contains the grades on an exam:
>> grades =[78,98,76,85,45,89,84,68,95];
Then to access the entire list we simply have to type the name without the semicolon:
>> grades % no semicolon! grades = 78 98 76 85 45 89 84 68 95
This is how we can give this data to a function. For example, we might wish to sort the data:
>> sort_grades = sort(grades); % sort in increasing order sort_grades = 45 68 76 78 84 85 89 95 98
We might want to know what the largest grade is,or the smallest grade. Forthe sorted data these are just the first and last grades. It would be nice to know how to access these. MATLAB keeps a list as an ordered set of numbers, the first element in the list is ordered with a 1, the nth element with an n. So
>> sort_grades(1); % first element is 45 >> sort_grades(2); % returns second element or 68 >> sort_grades(length(sort_grades)); % returns the last element 98.
The last line used the length command to get the last element. If the list has lenght n, then there are n entries, and so the construct returns the last one.
We can change an individual grade. Suppose we mistyped a grade: the 76 shlud have beenan 86. To change the 3rd entry we reference it and then define it:
>> grades(3) = 86; % change the third element.
We would still have to reevalute the sort line to update the variable sort_grades.