### Course Topics

The “Data analysis in SAS” short course introduces basic data management and statistical analysis procedures in SAS. The course is held in a computer lab and includes a practical session in which attendees write and execute SAS code and analyze real-world data alongside the instructor. The main goal is to enable attendees to use SAS and to choose appropriate statistical methods for analyzing their own data. The intended audience is researchers who want to learn the basics of SAS and use it for statistical analysis in their research. Previous exposure to the SAS environment is preferable but not necessary.

This course covers:

1. Brief Introduction to SAS
2. Data Manipulation in SAS
3. Basic Summary Statistics
4. Linear Regression and ANOVA
5. Logistic Regression

Datasets:

Three real-world datasets will be used to demonstrate SAS’s data analysis capabilities:

The first dataset contains 8 variables describing the relationship between public school expenditure and SAT performance. It was extracted from the 1997 Digest of Education Statistics, an annual publication of the U.S. Department of Education (www.stat.ucla.edu/labs/datasets/sat.dat).

The second dataset contains petrol consumption in 48 states in 2011. The variables collected for predicting petrol consumption are the petrol tax, the average income per capita, the number of miles of paved highway, and the proportion of the population with driver's licenses in each state. The dataset was obtained from http://people.sc.fsu.edu/~jburkardt/datasets/regression/regression.html. It has 48 rows, one per state, and 7 columns.

We will explore data manipulation methods and compute basic descriptive statistics using these two datasets. We will then use the second dataset as an example for fitting regression models.
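As a rough illustration of this workflow, the sketch below runs a small data step, descriptive statistics, correlations, and a linear regression on the petrol data. It assumes the data have already been read into a SAS dataset named `petrol`; the dataset name and the variable names (`tax`, `income`, `highway`, `licenses`, `consumption`) are hypothetical and will need to be matched to the actual file.

```sas
/* Assumption: the petrol data are already loaded as WORK.PETROL.
   All variable names below are hypothetical placeholders. */

/* Data manipulation: create a derived variable in a new dataset */
data petrol2;
    set petrol;
    log_income = log(income);   /* natural log of per-capita income */
run;

/* Basic descriptive statistics for the analysis variables */
proc means data=petrol2 n mean std min max;
    var tax income highway licenses consumption;
run;

/* Pairwise Pearson correlations with a scatter plot matrix */
ods graphics on;
proc corr data=petrol2 plots=matrix(histogram);
    var tax income highway licenses consumption;
run;

/* Multiple linear regression of consumption on the four predictors */
proc reg data=petrol2;
    model consumption = tax income highway licenses;
run;
quit;
```

The `quit;` statement ends `PROC REG`, which otherwise stays active and waits for further statements.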

The third dataset records the presence or absence of growth of Alicyclobacillus acidoterrestris CRA7152 in apple juice as a function of pH (3.5-5.5), Brix (i.e., the sugar content of an aqueous solution, 11-19), temperature (25-50 °C), and nisin concentration (0-70). It serves as the example for logistic regression. It was obtained from: W.E.L. Pena, P.R. De Massaguer, A.D.G. Zuniga, and S.H. Saraiva (2011). "Modeling the Growth Limit of Alicyclobacillus Acidoterrestris CRA7152 in Apple Juice: Effect of pH, Brix, Temperature, and Nisin Concentration," Journal of Food Processing and Preservation, Vol. 35, pp. 509-517. (www.stat.ufl.edu/~winner/datasets.html)
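A logistic regression on data of this kind might look like the following minimal sketch. It assumes the data are stored in a SAS dataset named `juice` with a binary response `growth` coded 1 for growth and 0 for no growth; the dataset name, the variable names, and the coding are all assumptions made for illustration.

```sas
/* Assumption: WORK.JUICE holds the apple juice data, with hypothetical
   variables growth (1 = growth, 0 = no growth), ph, brix, temp, nisin. */
proc logistic data=juice;
    /* event='1' tells SAS to model the probability that growth = 1 */
    model growth(event='1') = ph brix temp nisin;
run;
```

Without the `event=` option, `PROC LOGISTIC` models the probability of the lower ordered response value, here 0, which reverses the signs of the coefficients.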

Here is an example of the correlation plots that SAS produces from the petrol consumption data: