Data analytics (DA) is a science that combines data mining, machine learning, and statistics. DA examines raw data in order to discover useful information, suggest conclusions, and support decision-making (source: https://en.wikipedia.org/wiki/Data_analysis). DA has grown in popularity as big data problems have emerged in the biological sciences, engineering, business, and other fields. Many techniques have been developed for data analytics; in this short course, we will focus on classification, or supervised learning, techniques. These include linear regression (the least squares method), the Bayes classifier, classification trees, logistic regression, and LASSO logistic regression. We will first get a taste of the basic theory behind these techniques, and we will also discuss criteria used to evaluate classifiers, such as false positives, false negatives, precision, and recall. Then we will use both simulated normal mixture data and the email spam data (https://archive.ics.uci.edu/ml/datasets/Spambase) to demonstrate how to apply these classification techniques (see, e.g., Figure 1: LS classifier for the normal mixture data). Note: all class demonstrations will be carried out in R.
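The class demonstrations are in R. As a language-neutral illustration only, here is a minimal Python/NumPy sketch of a least squares (LS) classifier on a simulated two-class normal mixture, followed by the false positive, false negative, precision, and recall counts described above. The class means, sample size, seed, and all function names (`simulate_normal_mixture`, `fit_ls_classifier`, `predict_ls`, `confusion_metrics`) are hypothetical choices, not the course's own data or code. The rule shown codes the classes as 0/1, fits them by least squares, and predicts class 1 when the fitted value exceeds 0.5.

```python
import numpy as np

def simulate_normal_mixture(n_per_class, seed=0):
    """Hypothetical toy data: two 2-D Gaussian classes with different means."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(loc=[-1.0, -1.0], scale=1.0, size=(n_per_class, 2))  # class 0
    x1 = rng.normal(loc=[1.0, 1.0], scale=1.0, size=(n_per_class, 2))    # class 1
    X = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
    return X, y

def fit_ls_classifier(X, y):
    """Least squares fit of the 0/1 labels on an intercept plus the features."""
    Xa = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return beta

def predict_ls(beta, X):
    """Predict class 1 when the fitted value exceeds 0.5, else class 0."""
    Xa = np.column_stack([np.ones(len(X)), X])
    return (Xa @ beta > 0.5).astype(int)

def confusion_metrics(y_true, y_pred):
    """Counts of TP/FP/FN/TN plus precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}

X, y = simulate_normal_mixture(200)
beta = fit_ls_classifier(X, y)
pred = predict_ls(beta, X)
print(confusion_metrics(y, pred))
```

The same pattern (fit, threshold, then tabulate errors) carries over to the other classifiers; only the fitting and prediction steps change.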
Course files: DA-Classificaion-2015-11-17.zip