Course Topics

Generally speaking, statistical analyses deal with two types of outcomes (i.e., responses): continuous and categorical. Linear Models (LM) are among the most commonly used statistical methods for analyzing continuous outcomes. However, many studies in engineering, medicine, education, and other fields involve categorical outcomes. In these cases, Generalized Linear Models (GLM) are a more appropriate choice for analysis.

This short course introduces the concepts, theory, and application of GLM. We will also discuss techniques commonly used in categorical data analysis, such as contingency table analysis, measures of association, tests of independence, and tests of symmetry. Class demonstrations will use the three real-world data sets listed below. All analyses will be carried out in R (free statistical software, http://cran.rstudio.com) via the RStudio interface (http://www.rstudio.com/products/rstudio/download).

Example 1:
Researcher A is interested in how variables such as GRE score, GPA, and the prestige of the undergraduate institution affect admission into graduate school. (Binary response)
Data set link: www.ats.ucla.edu/stat/data/binary.csv
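
A binary response such as admission status is commonly modeled with logistic regression, a GLM that uses the logit link. The sketch below is only an illustration of that link in plain Python; it assumes a logistic model is what the demonstration fits, and the coefficient values are hypothetical placeholders, not estimates from binary.csv. In R, a model of this kind would typically be fit with glm(..., family = binomial).

```python
import math


def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Map a linear predictor eta (any real number) back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def admission_probability(intercept, coefs, x):
    """P(admitted | x) under a logistic model with the given coefficients."""
    eta = intercept + sum(b * v for b, v in zip(coefs, x))
    return inv_logit(eta)


# Hypothetical coefficients for (GRE, GPA); chosen for illustration only,
# NOT fitted to the course data set.
example_intercept = -4.0
example_coefs = (0.002, 0.8)

# A hypothetical applicant with GRE 600 and GPA 3.5.
p = admission_probability(example_intercept, example_coefs, (600, 3.5))
print(f"Predicted admission probability: {p:.3f}")
```

The linear predictor can take any real value, while inv_logit always returns a number strictly between 0 and 1, which is why the logit link suits a binary outcome.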

Example 2:
Researcher B wants to predict the number of awards that a newly admitted student will earn, based on the type of program in which the student is enrolled (vocational, general, or academic) and the student's score on the final math exam. (Count response)
Data set link: www.ats.ucla.edu/stat/data/poisson_sim.csv

Example 3:
A Physicians’ Health Study Research Group at Harvard Medical School wants to study the relationship between aspirin use (Placebo/Aspirin) and heart attacks (Fatal Attack/Nonfatal Attack/No Attack).

Data are summarized in the table below:

                  Myocardial Infarction
Treatment   Fatal Attack   Nonfatal Attack   No Attack
Placebo               18               171      10,845
Aspirin                5                99      10,933
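
As a rough illustration of a test of independence and a measure of association on this table, the sketch below computes Pearson's chi-square statistic and an odds ratio in plain Python, using only the counts shown above. The function name and the choice to collapse fatal and nonfatal attacks into a single "any attack" category for the odds ratio are assumptions made for this example, not necessarily the analysis carried out in the course.

```python
import math

# Counts from the table above: rows are treatments,
# columns are (fatal attack, nonfatal attack, no attack).
table = {
    "Placebo": [18, 171, 10845],
    "Aspirin": [5, 99, 10933],
}


def pearson_chi_square(rows):
    """Return (statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count if treatment and outcome were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return stat, df


stat, df = pearson_chi_square(list(table.values()))

# With 2 degrees of freedom the chi-square upper tail probability
# has the closed form exp(-x / 2), so no extra library is needed here.
p_value = math.exp(-stat / 2) if df == 2 else None

# Odds ratio of any attack (fatal or nonfatal) for placebo versus aspirin.
placebo_attack, placebo_none = 18 + 171, 10845
aspirin_attack, aspirin_none = 5 + 99, 10933
odds_ratio = (placebo_attack * aspirin_none) / (placebo_none * aspirin_attack)

print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.2e}")
print(f"odds ratio (placebo vs aspirin) = {odds_ratio:.2f}")
```

An odds ratio above 1 indicates higher odds of an attack in the placebo group than in the aspirin group. In R, chisq.test applied to the same matrix of counts performs the corresponding test of independence.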

 

LISA Short Course: Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) from LISA on Vimeo.