Course Topics

Structured Query Language (SQL) is the prevailing language for working with datasets today. In the era of big data, SQL plays a vital role in many fields. However, SQL by itself provides very limited statistical functionality. Fortunately, many popular statistical packages support SQL, either as a built-in feature or through an add-on package.

The topics covered in this short course include basic SQL concepts, commands for querying a single dataset (select, where), aggregate functions (group by, having, sum, avg, max, min), and commands for querying multiple datasets (join). Attendees can practice these commands with an online SQL learning resource, sqlzoo.net. We will also introduce the sqldf package in R and demonstrate its effectiveness with the individual household electric power consumption dataset from the UCI data repository, http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption.
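To give a flavor of the commands listed above, here is a minimal sketch that runs them against an in-memory SQLite database through Python's standard sqlite3 module. It is not taken from the course materials: the tables household and reading, their columns, and all values are made up for illustration, and the course itself uses sqlzoo.net and sqldf in R rather than Python.

```python
import sqlite3

# Hypothetical toy data: table names, columns, and values are invented
# for illustration and are not the UCI dataset used in the course.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE household (id INTEGER PRIMARY KEY, town TEXT);
CREATE TABLE reading (household_id INTEGER, day TEXT, kwh REAL);
INSERT INTO household VALUES (1, 'Town A'), (2, 'Town B');
INSERT INTO reading VALUES
  (1, '2010-01-01', 10.0),
  (1, '2010-01-02', 14.0),
  (2, '2010-01-01', 30.0),
  (2, '2010-01-02', 34.0);
""")

# Single dataset: SELECT picks columns, WHERE filters rows.
high_days = conn.execute(
    "SELECT household_id, day FROM reading "
    "WHERE kwh > 12 ORDER BY household_id, day"
).fetchall()

# Aggregation: GROUP BY forms one group per household,
# HAVING filters groups by an aggregate (here the total usage).
heavy_users = conn.execute(
    "SELECT household_id, AVG(kwh), MAX(kwh) FROM reading "
    "GROUP BY household_id HAVING SUM(kwh) > 50 ORDER BY household_id"
).fetchall()

# Multiple datasets: JOIN matches each reading to its household row.
usage_by_town = conn.execute(
    "SELECT h.town, SUM(r.kwh) FROM household AS h "
    "JOIN reading AS r ON r.household_id = h.id "
    "GROUP BY h.town ORDER BY h.town"
).fetchall()

print(high_days)
print(heavy_users)
print(usage_by_town)
```

The same query strings could in principle be passed to sqldf in R against data frames of the same shape, though small dialect differences may apply.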

This short course is an introductory course for researchers who want to learn basic SQL commands. Prior experience with SQL or programming is NOT required. Experience with R is preferred but NOT required. More LISA short courses about R can be found here.