Introduction.
Most of the material that we have covered until now has dealt with the use of statistical and econometric models applied to real data and real economic issues. We have also analyzed, theoretically, some of the desirable properties of these econometric models. However, econometric models with good theoretical properties may not behave well in real applications. For this reason, researchers usually study the robustness of econometric models using data created artificially for that purpose, and it is very important for the researcher to be able to create data that satisfy any desired set of properties. Using such data, the researcher can find out whether the theoretical properties of a certain model are empirically relevant.
Very generally, an experiment of the type described above proceeds as follows:
1.- Specify a "true" econometric model. For example, a standard linear regression model.
2.- Generate a data set that satisfies the restrictions specified by the "true" model.
3.- Use the data generated in 2 to evaluate empirically the "true" econometric model.
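The three steps can be illustrated with a small simulation. The notes use SAS. The sketch below is written in Python with only the standard library so that it is self-contained. The true model, the parameter values (alpha = 1, beta = 2, sigma = 0.5), the sample size and the number of replications are all hypothetical choices made for illustration. Ordinary least squares stands in for "evaluating" the model.

```python
import random

# Step 1: a hypothetical "true" model  y = alpha + beta * x + e,  e ~ Normal(0, sigma^2)
TRUE_ALPHA, TRUE_BETA, SIGMA = 1.0, 2.0, 0.5


def generate_data(n, rng):
    """Step 2: draw a data set that satisfies the true model."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [TRUE_ALPHA + TRUE_BETA * x + rng.gauss(0.0, SIGMA) for x in xs]
    return xs, ys


def ols(xs, ys):
    """Step 3: estimate intercept and slope by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(111111)  # fixed seed so the experiment can be reproduced
estimates = [ols(*generate_data(200, rng)) for _ in range(500)]
avg_alpha = sum(a for a, _ in estimates) / len(estimates)
avg_beta = sum(b for _, b in estimates) / len(estimates)
print(f"average intercept {avg_alpha:.3f} (true {TRUE_ALPHA}), "
      f"average slope {avg_beta:.3f} (true {TRUE_BETA})")
```

If the estimator behaves as the theory predicts, the average of the 500 slope estimates should be close to 2. Changing the error distribution inside `generate_data` is one way to probe robustness.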
To carry out step 2 we need to learn how to generate data using SAS. We proceed by first creating a randomly generated data set and then imposing the restrictions of the model on these data. A random data set can be created with special SAS functions called RANDOM NUMBER GENERATORS.
In the next section we learn how to generate random numbers using SAS. Next we use this information in two simple applications. The first uses random numbers to construct a "lottery". The second finds an approximation to the number PI.
In the next set of notes we analyze the properties of the regression model using the techniques that we learn here.

The area between the bell-shaped standard normal density curve and the x-axis is equal to one. The area shaded in yellow represents the probability that a value generated from a standard normal random variable falls within the interval [0, 1/2]. In particular, because the bell is symmetric around zero, we can infer that the probability of obtaining a value greater than or equal to zero is 1/2 (equal to the probability of obtaining heads when we toss a coin). If we consider a standard normal random variable X and apply a transformation of the form
$$Y = A + BX$$
with $B > 0$, we obtain a new normal random variable with mean $A$ and variance $B^2$, denoted $\mathrm{Normal}(A, B^2)$. Compared with the standard normal, the density curve of this new random variable is centered at $A$. It is more spread out, with more probability far from $A$, if $B > 1$, and more concentrated around $A$ if $B < 1$. Random number generators are available for a wide variety of random variables. Here are some of the most useful random number generators available in SAS:
x = ranuni(seed);              /* uniform between 0 & 1 */
x = a+(b-a)*ranuni(seed);      /* uniform between a & b */
x = ranbin(seed,n,p);          /* binomial size n prob p */
x = rancau(seed);              /* cauchy with loc 0 & scale 1 */
x = a+b*rancau(seed);          /* cauchy with loc a & scale b */
x = ranexp(seed);              /* exponential with scale 1 */
x = ranexp(seed) / a;          /* exponential with rate a (scale 1/a) */
x = a-b*log(ranexp(seed));     /* extreme value loc a & scale b */
x = rangam(seed,a);            /* gamma with shape a */
x = b*rangam(seed,a);          /* gamma with shape a & scale b */
x = 2*rangam(seed,a);          /* chi-square with d.f. = 2*a */
x = rannor(seed);              /* normal with mean 0 & SD 1 */
x = a+b*rannor(seed);          /* normal with mean a & SD b */
x = ranpoi(seed,a);            /* poisson with mean a */
x = rantri(seed,a);            /* triangular on [0,1] with peak at a */
x = rantbl(seed,p1,p2,p3);     /* random from (1,2,3) with probs p1,p2,p3 */

The Normal and the Uniform random number generators are the ones we will use most often.

To understand how random number generators operate, imagine a data set containing a very long list of random numbers. A random number generator extracts numbers from that list. The SEED, which must be an integer, can be thought of as the position in the list of the first random number generated.

If the seed is zero or negative, the computer clock is used to determine the position of the first random number in the sequence. If the seed is positive (it must be less than 2**31-1), it determines that position, so running the program again with the same seed reproduces exactly the same numbers. The seed is examined only the first time a random number function is called in a DATA step, so changing the seed afterwards does not change the sequence.

Remember that computer-generated random numbers are never truly random. They are pseudo-random numbers produced by a deterministic algorithm.
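As a cross-check on the list above, here is a rough Python counterpart of several of these generators. It is built on Python's `random` module, which is a Mersenne Twister rather than SAS's generator, so the same seed will not reproduce SAS's numbers.

The sketch also checks three facts from the text numerically:

- the standard normal areas over [0, 1/2] and [0, infinity);
- that Y = A + BX has mean A and variance B^2;
- that dividing an Exponential(1) draw by a gives mean 1/a.

The helper names and the values A = 5, B = 2 and a = 4 are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance

rng = random.Random(111111)  # one seeded stream shared by all helpers below


def uniform_ab(a, b):
    """Uniform between a and b, like a+(b-a)*ranuni(seed)."""
    return a + (b - a) * rng.random()


def cauchy(a=0.0, b=1.0):
    """Cauchy with location a and scale b, by inverting the Cauchy CDF."""
    return a + b * math.tan(math.pi * (rng.random() - 0.5))


def exponential_rate(a):
    """Exponential with rate a (mean 1/a), like ranexp(seed)/a."""
    return rng.expovariate(1.0) / a


def extreme_value(a, b):
    """Extreme value (Gumbel) with location a and scale b, like a-b*log(ranexp(seed))."""
    return a - b * math.log(rng.expovariate(1.0))


def chi_square(df):
    """Chi-square with df degrees of freedom, like 2*rangam(seed, df/2)."""
    return 2.0 * rng.gammavariate(df / 2.0, 1.0)


def normal(a=0.0, b=1.0):
    """Normal(a, b^2) as Y = a + b*X with X standard normal, like a+b*rannor(seed)."""
    return a + b * rng.gauss(0.0, 1.0)


def same_stream(seed, k=5):
    """Two generators started from the same seed produce identical numbers."""
    first, second = random.Random(seed), random.Random(seed)
    return [first.random() for _ in range(k)] == [second.random() for _ in range(k)]


# Areas under the standard normal density discussed in the text
z = NormalDist()
p_half = z.cdf(0.5) - z.cdf(0.0)   # P(0 <= Z <= 1/2)
p_nonneg = 1.0 - z.cdf(0.0)        # P(Z >= 0), equal to 1/2 by symmetry

std_draws = [normal() for _ in range(100_000)]
sim_half = sum(0.0 <= d <= 0.5 for d in std_draws) / len(std_draws)

# Y = A + B*X with A = 5, B = 2 should have mean 5 and variance 4
y_draws = [normal(5.0, 2.0) for _ in range(100_000)]

# ranexp(seed)/a has mean 1/a: with a = 4 the sample mean should be near 0.25
exp_mean = fmean(exponential_rate(4.0) for _ in range(100_000))

print(f"P(0 <= Z <= 1/2): exact {p_half:.4f}, simulated {sim_half:.4f}")
print(f"P(Z >= 0): {p_nonneg:.2f}")
print(f"Y = 5 + 2X: mean {fmean(y_draws):.3f}, variance {pvariance(y_draws):.3f}")
print(f"Exponential(rate 4) sample mean: {exp_mean:.4f}")
print("same seed, same numbers:", same_stream(111111))
```

The exponential line shows why the comment in the list reads "rate a". Dividing by a shrinks the mean to 1/a, so a acts as a rate, not a scale.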
The following simple program generates three 10-dimensional vectors of standard normal random numbers. The result is 10 observations on the variables srn1, srn2 and srn3.
PROGRAM 1 ===============================================
data a;
array srn(3);
/* The fixed positive seed 111111 reproduces the same numbers on every run; a seed of 0 would use the clock instead */
do j=1 to 10;
do i=1 to 3;
srn(i)=rannor(111111); /* generate normal random numbers */
end;
output;
end;
proc print;
/* print the result */
var srn1-srn3;
run;
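A Python analogue of Program 1, offered as a sketch. `random.Random(111111)` plays the role of the fixed seed, and `generate_srn` is a hypothetical helper name. The values differ from what SAS prints, but rerunning with the same seed gives the same table, just as in SAS.

```python
import random


def generate_srn(seed, n_obs=10, n_vars=3):
    """Return n_obs rows of n_vars independent standard normal draws."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_obs)]


rows = generate_srn(111111)  # fixed seed, in the spirit of rannor(111111)
print(f"{'obs':>3} {'srn1':>9} {'srn2':>9} {'srn3':>9}")
for j, row in enumerate(rows, start=1):
    print(f"{j:>3} " + " ".join(f"{value:9.4f}" for value in row))

# Each column is one 10-dimensional standard normal vector.
srn1, srn2, srn3 = (list(column) for column in zip(*rows))
```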
The goal of this assignment is to use randomly generated numbers to design a computer-based game that replicates the tossing of a coin.
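Here is a minimal Python sketch of the coin-toss game. It assumes the simplest rule: a Uniform(0,1) draw below 1/2 counts as heads. In Python a seed of 0 is an ordinary fixed seed, unlike a SAS seed of 0, which reads the clock.

```python
import random


def toss_coin(rng):
    """Heads if a Uniform(0,1) draw falls below 1/2, tails otherwise."""
    return "H" if rng.random() < 0.5 else "T"


rng = random.Random(0)  # in Python, 0 is an ordinary fixed seed (SAS would read the clock)
tosses = [toss_coin(rng) for _ in range(10_000)]
share_heads = tosses.count("H") / len(tosses)
print(f"share of heads in {len(tosses):,} tosses: {share_heads:.3f}")
```

With 10,000 tosses, the share of heads should typically be within about one percentage point of 1/2.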
The Colorado Lotto is an on-line "jackpot" game offering a larger prize than any other lottery game in Colorado. The size of the jackpot is determined by Lotto ticket sales. Lotto involves selecting six numbers from a field of 42 numbers.
How Lotto Works
Players select 6 numbers from a field of 42 possible numbers. Then the Lottery chooses 6 winning numbers at random in a live drawing. If a player matches 3, 4, 5, or 6 winning numbers, they win a prize. Players may choose their own numbers or use the Quick Pick method, in which the numbers are chosen randomly by a computer. The following table indicates the odds of winning:
[Table: Odds of Winning]
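Odds like these follow from counting. A ticket matches exactly k of the 6 winning numbers in C(6,k)·C(36,6-k) of the C(42,6) equally likely draws, which is a hypergeometric count. The Python sketch below computes these odds. It is a reconstruction from the game rules, not the original table, and it ignores prize amounts.

```python
from math import comb

NUMBERS, PICKS = 42, 6
total_draws = comb(NUMBERS, PICKS)  # equally likely sets of 6 winning numbers


def ways_to_match(k):
    """Number of draws in which a fixed ticket matches exactly k winning numbers."""
    return comb(PICKS, k) * comb(NUMBERS - PICKS, PICKS - k)


for k in (6, 5, 4, 3):
    print(f"match {k}: 1 in {total_draws / ways_to_match(k):,.1f}")
```

For the jackpot (all six numbers) this gives odds of 1 in 5,245,786.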
If you are interested in the most and least frequently drawn numbers in Lotto since the game began on January 24, 1989, follow this link. Lotto has been played approximately 13*52 = 676 times since it began (about 13 years of weekly drawings).
The goal of this application is to use randomly generated numbers to design a computer-based lottery game that replicates the current Colorado Lotto. After this, generate 676 draws of this lottery, obtain the number of times each number appears, and compare these results with those of the actual Lotto (see the link above). Finally, repeat this exercise many times and show that the empirical probability of drawing a certain number in a given position converges to the theoretical probability 1/42 = 0.0238095.
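Here is a Python sketch of this exercise. It assumes that `random.sample`, which picks 6 distinct numbers with every combination equally likely, is an acceptable stand-in for the SAS programs below. `draw_lotto` and the seed 1989 are illustrative choices. Counting appearances over all six positions, each number is expected about 676·6/42 = 96.6 times.

```python
import random
from collections import Counter


def draw_lotto(rng):
    """One draw: 6 distinct integers from 1 to 42, every combination equally likely."""
    return rng.sample(range(1, 43), 6)


N_DRAWS = 676
rng = random.Random(1989)  # arbitrary seed chosen for illustration
counts = Counter(number for _ in range(N_DRAWS) for number in draw_lotto(rng))

expected = N_DRAWS * 6 / 42  # each number appears in a given draw with probability 6/42
print(f"expected appearances per number: {expected:.1f}")
print("three most frequent:", counts.most_common(3))
print("three least frequent:", counts.most_common()[-3:])
print(f"share of all drawn numbers equal to 7: {counts[7] / (6 * N_DRAWS):.4f} (theory {1 / 42:.4f})")
```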
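A computer program designed to replicate the Colorado Lotto should draw six integers between 1 and 42 at random, with each number equally likely and no number appearing twice in the same draw. A first attempt, which does not yet rule out repeated numbers, is the following: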
PROGRAM ===============================================
data a;
   array lotto(6);
   /* initialize the random generator randomly: an argument of 0 */
   /* uses the clock to generate a seed                          */
   seed = int(1111111*ranuni(0) + 1);
   do i = 1 to 6;                          /* generate 6 lotto numbers */
      lotto(i) = int(ranuni(seed)*42 + 1); /* random integer between 1 and 42 */
   end;
run;

proc print;                                /* print the result */
   title 'results of the lotto';
   id lotto1;
   var lotto2-lotto6;
run;
===========================================================
After running this program twice we obtained:

| First run | 13 | 3 | 9 | 12 | 11 | 4 |
| Second run | 4 | 38 | 41 | 19 | 21 | 37 |

The seed is selected by the statement "seed = int(1111111*ranuni(0) + 1);". Because ranuni(0) takes its starting point from the computer clock, this statement makes the seed, and therefore the numbers drawn, differ from one run to the next.
In the previous program we cannot rule out the possibility of obtaining repeated numbers, and the probability of that event is far from negligible. The six numbers are all different with probability $\frac{42\cdot 41\cdot 40\cdot 39\cdot 38\cdot 37}{42^6} \approx 0.688$, so at least one number is repeated with probability about 0.31.

The next program adds a few lines of code to avoid repeated numbers. The added code requires the experiment of drawing 6 numbers between 1 and 42 to be repeated whenever two of the numbers are the same. This is easily accomplished using the "do while" statement.
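How likely is a repeated number in the first program? The exact probability that six independent draws from 1 to 42 are all different is 42·41·40·39·38·37 / 42^6. A simulation of the first program, with six draws made with replacement, should agree. The Python sketch below computes both; the seed and the number of trials are arbitrary.

```python
import random
from math import perm

# Exact probability that 6 independent draws from 1..42 are all different
p_all_distinct = perm(42, 6) / 42**6
p_repeat = 1.0 - p_all_distinct

# Simulation of the first program: 6 independent draws, with replacement
rng = random.Random(7)
TRIALS = 100_000
repeats = sum(
    len({rng.randint(1, 42) for _ in range(6)}) < 6  # fewer than 6 distinct values
    for _ in range(TRIALS)
)

print(f"P(at least one repeated number): exact {p_repeat:.4f}, simulated {repeats / TRIALS:.4f}")
```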
PROGRAM ===============================================
data a;
   array lotto(6);
   /* initialize the random generator randomly: an argument of 0 */
   /* uses the clock as a seed                                   */
   seed = int(1111111*ranuni(0) + 1);
   c = 0;
   do while (c = 0);                          /* repeat until no number is repeated */
      do i = 1 to 6;                          /* generate 6 lotto numbers */
         lotto(i) = int(ranuni(seed)*42 + 1); /* random integer between 1 and 42 */
      end;
      c = 1;                                  /* check for duplicate lotto numbers */
      do i = 1 to 5;
         do j = (i+1) to 6;
            if (lotto(i) = lotto(j)) then c = 0;  /* duplicate found: draw again */
         end;
      end;
   end;
run;

proc print;                                   /* print the result */
   title 'results of the lotto';
   id lotto1;
   var lotto2-lotto6;
run;
===========================================================
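The do-while loop is an instance of rejection sampling: draw all six numbers, and start over if any two coincide. The Python sketch below follows the same logic; `draw_lotto_rejection` is a hypothetical name. It also counts attempts. Each attempt succeeds with probability about 0.688, so the expected number of attempts is 42^6 / (42·41·40·39·38·37), about 1.45.

```python
import random
from math import perm


def draw_lotto_rejection(rng):
    """Redraw all 6 numbers until they are distinct; return (draw, attempts)."""
    attempts = 0
    while True:  # plays the role of: do while (c = 0)
        attempts += 1
        lotto = [int(rng.random() * 42 + 1) for _ in range(6)]  # like int(ranuni(seed)*42 + 1)
        if len(set(lotto)) == 6:  # same check as the nested i/j loops
            return lotto, attempts


rng = random.Random(42)
results = [draw_lotto_rejection(rng) for _ in range(20_000)]
avg_attempts = sum(attempts for _, attempts in results) / len(results)
expected_attempts = 42**6 / perm(42, 6)  # 1 / P(all six distinct)
print(f"average attempts per valid draw: {avg_attempts:.3f} (theory {expected_attempts:.3f})")
```

In Python, `random.sample(range(1, 43), 6)` produces draws with the same distribution directly, without retries.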
Finally, we modify the previous program slightly to obtain 1000 repetitions of the computer LOTTO.
PROGRAM ===============================================
data a;
   array lotto(6);
   /* initialize the random generator randomly: an argument of 0 */
   /* uses the clock as a seed                                   */
   seed = int(1111111*ranuni(0) + 1);
   do k = 1 to 1000;                             /* repeat LOTTO 1000 times */
      c = 0;
      do while (c = 0);
         do i = 1 to 6;                          /* generate 6 lotto numbers */
            lotto(i) = int(ranuni(seed)*42 + 1); /* random integer between 1 and 42 */
         end;
         c = 1;                                  /* check for duplicate lotto numbers */
         do i = 1 to 5;
            do j = (i+1) to 6;
               if (lotto(i) = lotto(j)) then c = 0;
            end;
         end;
      end;
      output;                                    /* one observation per draw */
   end;
run;

proc freq;
   title 'lotto frequencies';
   tables lotto1-lotto6 / nocum;
run;
===========================================================
As we know, the theoretical probability of drawing a certain number in a given position (lotto1, ..., lotto6) is 1/42 = 0.0238095. After obtaining the frequencies for each lotto number, we observe that the empirical probability of obtaining a number (its relative frequency in the PROC FREQ output) is close to this theoretical probability.
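To see the convergence, the Python sketch below tracks the relative frequency with which one position (here `lotto1`) equals one number (here 7, an arbitrary choice) as the number of draws grows. The distance from 1/42 should shrink roughly like 1/sqrt(N). `position_frequency` is an illustrative helper, not part of the notes.

```python
import random


def position_frequency(n_draws, rng, number=7, position=0):
    """Relative frequency of `number` in one position (0 -> lotto1) over n_draws draws."""
    hits = sum(rng.sample(range(1, 43), 6)[position] == number for _ in range(n_draws))
    return hits / n_draws


rng = random.Random(31)
freqs = {n: position_frequency(n, rng) for n in (100, 1_000, 10_000, 100_000)}
for n, f in freqs.items():
    print(f"N = {n:>7,}: frequency {f:.5f}, distance from 1/42 = {abs(f - 1 / 42):.5f}")
```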
PI=3.141592653589793238462643383279502884197
In the following I will describe how to use random number generators to approximate pi. It is a simple method and easy to implement on a computer, as you will see.
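Suppose we pick a random point (x, y) in the unit square, with x and y drawn independently from the Uniform(0,1) distribution. If we pick such a point N times and M of those times the point lies inside the unit circle, then the empirical probability that a random point lies inside the unit circle is

$$\frac{M}{N}$$

Consider now the unit circle in the figure. Its points satisfy

$$x^2 + y^2 = 1$$

and we deduce that any point inside the circle should satisfy

$$x^2 + y^2 < 1$$

Thus the probability that a random point (x, y) of the unit square lies inside the unit circle can be represented as $P(x^2 + y^2 < 1)$. Since the points are uniformly distributed over a square of area 1, this probability equals the area of the quarter of the circle that lies inside the square:

$$P(x^2 + y^2 < 1) = \frac{\pi \cdot 1^2}{4} = \frac{\pi}{4}$$

If N becomes very large (theoretically infinite), the two probabilities become equal and we can write

$$\frac{\pi}{4} = \lim_{N \to \infty} \frac{M}{N}, \qquad \text{so} \qquad \pi \approx 4\,\frac{M}{N} \text{ for large } N$$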
PROGRAM ===============================================
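data xycoor;
   do i = 1 to 1000000;
      x = ranuni(111111);    /* x coordinate, uniform on (0,1) */
      y = ranuni(1111111);   /* y coordinate; per the note on seeds above, only */
                             /* the first seed in this DATA step takes effect   */
      output;
   end;
run;

data random;
   set xycoor;
   if (x*x + y*y) lt 1 then z = 1;   /* point inside the unit circle */
   else z = 0;
   z = 4*z;                          /* the mean of z is 4*M/N, which approximates pi */
run;

proc means n mean;
   title ' Approximation to pi ';
   var z;
run;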
===========================================================
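For comparison, here is a Python sketch of the same Monte Carlo estimator. It also reports the standard error of the mean of z (the notes' `z = 4*z` variable). `estimate_pi` is an illustrative name. Python's generator differs from SAS's, so the digits will not match the SAS output.

```python
import math
import random


def estimate_pi(n, rng):
    """Return (estimate, standard error): 4 times the share of points with x^2 + y^2 < 1."""
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()  # a point in the unit square
        if x * x + y * y < 1.0:
            inside += 1
    p_hat = inside / n  # estimates pi/4
    std_error = 4.0 * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return 4.0 * p_hat, std_error


pi_hat, se = estimate_pi(1_000_000, random.Random(111111))
print(f"estimate {pi_hat:.5f}, standard error {se:.5f}, true value {math.pi:.5f}")
```

With a million points the standard error is about 0.0016, so only the first two or three decimals of pi are reliable. In SAS, adding the STDERR keyword to the PROC MEANS statement reports essentially the same quantity.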