• Specialization: Statistical Modeling for Data Science Applications
  • Instructor: Brian Zaharatos, Director, Professional Master’s Degree in Applied Mathematics
  • Prior knowledge needed: Differentiation, integration, linear algebra, probability theory, statistical estimation, hypothesis testing, Statistical Inference for Data Applications specialization

View on Coursera

Learning Outcomes

Successful completion of this course demonstrates your achievement of the following learning outcomes for the MS-DS on Coursera:

  • Acquire, clean, wrangle, and manage data.
  • Correctly perform exploratory data analyses in order to assist with the generation of scientific hypotheses.
  • Apply principles and methods of probability theory and statistics to draw rational conclusions from data.
  • Construct an appropriate statistical model in order to answer important scientific or business-related questions.
  • Assess the validity of a statistical model when applied to a particular dataset.
  • Be sensitive to ethical issues that are involved in dealing with data science applications arising in real-world situations.
  • Clearly communicate the results of a data science analysis to a non-technical audience.
  • Use peer feedback, self-reflection and video analysis to improve collaboration skills.
  • Create reproducible statistical workflows.
  • Act ethically in the role of professional data scientist.

Course Content

Duration: 12h

In this module, we will introduce generalized linear models (GLMs) through the study of binomial data. In particular, we will motivate the need for GLMs; introduce the binomial regression model, including the most common binomial link functions; correctly interpret the binomial regression model; and consider various methods for assessing the fit and predictive power of the binomial regression model.
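As a rough illustration of what a binomial link function does, the sketch below evaluates the inverses of three links that are standard choices for binomial data: logit, probit, and complementary log-log. The module description does not name the links it covers, so this selection is an assumption, and the function names are illustrative only. It is written in plain Python.

```python
import math

def inv_logit(eta):
    """Inverse logit link: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Inverse probit link: p = Phi(eta), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

# Each inverse link maps a linear predictor eta (any real number)
# to a success probability strictly between 0 and 1.
for eta in (-2.0, 0.0, 2.0):
    print(eta,
          round(inv_logit(eta), 4),
          round(inv_probit(eta), 4),
          round(inv_cloglog(eta), 4))
```

Under the logit link, a one-unit increase in a predictor with coefficient beta multiplies the odds of success by exp(beta), which is the usual starting point for interpreting a fitted binomial regression.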

Duration: 9h

In this module, we will consider how to model count data. When the response variable is a count of some phenomenon, and when that count is thought to depend on a set of predictors, we can use Poisson regression as a model. We will describe the Poisson regression in some detail and use Poisson regression on real data. Then, we will describe situations in which Poisson regression is not appropriate, and briefly present solutions to those situations.
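To make the idea concrete, here is a minimal sketch of Poisson regression with a log link and a single predictor, fitted by Newton's method in plain Python. The data are made up for illustration, the helper names `fit_poisson` and `pearson_dispersion` are hypothetical, and the Pearson dispersion check is one common diagnostic rather than something the module description specifies.

```python
import math

def fit_poisson(x, y, iters=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton's method (one predictor only)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2), inverted by hand below.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def pearson_dispersion(x, y, b0, b1):
    """Pearson chi-square statistic divided by residual degrees of freedom."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - 2)

# Made-up counts that grow roughly exponentially in x.
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 7, 12]
b0, b1 = fit_poisson(x, y)
print("intercept:", b0, "slope:", b1)
print("rate ratio per unit of x:", math.exp(b1))
print("Pearson dispersion:", pearson_dispersion(x, y, b0, b1))
```

A dispersion value well above 1 would suggest overdispersion, one situation in which a plain Poisson model may not be appropriate; with only five observations the figure here is purely illustrative.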

Duration: 9h

In this module, we will introduce the concept of a nonparametric regression model. We will contrast this notion with the parametric models that we have studied so far. Then, we’ll study particular nonparametric regression models: kernel estimators and splines. Finally, we will introduce additive models as a blending of parametric and nonparametric methods.
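A kernel estimator can be sketched in a few lines. The example below uses the Nadaraya-Watson estimator with a Gaussian kernel, which is one standard kernel estimator; the module description does not say which estimator it presents, so that choice, the bandwidth value, and the toy data are all assumptions made for illustration.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nadaraya_watson(x0, xs, ys, bandwidth):
    """Estimate E[y | x = x0] as a kernel-weighted average of the observed ys."""
    weights = [gaussian_kernel((x0 - xi) / bandwidth) for xi in xs]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)

# Toy data: one period of a sine wave sampled at five points.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.0, 1.0, 0.0, -1.0, 0.0]

for x0 in (0.1, 0.25, 0.5, 0.9):
    print(x0, nadaraya_watson(x0, xs, ys, bandwidth=0.2))
```

No functional form is assumed for the regression curve; the bandwidth alone controls how smooth the estimate is, with larger values pulling the fit toward the overall mean of the responses.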

Duration: 12h

Some models, such as linear regression, are easily interpretable, but inflexible, in that they don't capture many real-world relationships accurately. Other models, such as neural networks, are quite flexible, but very difficult to interpret. Generalized additive models (GAMs) are a nice balance between flexibility and interpretability. In this module, we will further motivate GAMs, learn the basic mathematics of fitting GAMs, and implement them on simulated and real data in R.
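One classical way to fit an additive model is backfitting, which repeatedly smooths partial residuals against each predictor in turn. The sketch below applies it to simulated data using a simple Gaussian kernel smoother. The module description does not state which fitting method it teaches, and the course works in R rather than Python, so everything here (the algorithm choice, the `smooth` and `backfit` helpers, the bandwidth, and the simulated data) is an assumption for illustration.

```python
import math

def smooth(xs, rs, bandwidth):
    """Gaussian kernel smoother of rs on xs, evaluated at each observed x."""
    fitted = []
    for x0 in xs:
        w = [math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in xs]
        fitted.append(sum(wi * ri for wi, ri in zip(w, rs)) / sum(w))
    return fitted

def backfit(x1, x2, y, bandwidth=0.1, iters=20):
    """Fit y = alpha + f1(x1) + f2(x2) by backfitting."""
    n = len(y)
    alpha = sum(y) / n
    f1 = [0.0] * n
    f2 = [0.0] * n
    for _ in range(iters):
        # Smooth the residuals that f2 leaves unexplained against x1.
        r1 = [yi - alpha - b for yi, b in zip(y, f2)]
        f1 = smooth(x1, r1, bandwidth)
        m1 = sum(f1) / n
        f1 = [v - m1 for v in f1]  # center so alpha stays identifiable
        # Then do the same for x2, holding the updated f1 fixed.
        r2 = [yi - alpha - a for yi, a in zip(y, f1)]
        f2 = smooth(x2, r2, bandwidth)
        m2 = sum(f2) / n
        f2 = [v - m2 for v in f2]
    return alpha, f1, f2

# Simulated, noise-free data with an additive structure.
n = 20
x1 = [i / (n - 1) for i in range(n)]
x2 = [((7 * i) % n) / (n - 1) for i in range(n)]
y = [math.sin(2.0 * math.pi * a) + b ** 2 for a, b in zip(x1, x2)]

alpha, f1, f2 = backfit(x1, x2, y)
fitted = [alpha + a + b for a, b in zip(f1, f2)]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
tss = sum((yi - alpha) ** 2 for yi in y)
print("residual sum of squares:", rss, "total sum of squares:", tss)
```

Each estimated component (`f1`, `f2`) can be plotted against its own predictor, which is where the interpretability of an additive model comes from.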

Duration: 4h

You will complete a proctored exam, worth 20% of your grade, made up of multiple-choice questions. You must attempt the final in order to earn a grade in the course. If you've upgraded to the for-credit version of this course, please make sure you review the additional for-credit materials in the Introductory module and anywhere else they may be found.

Note: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page. Click the View on Coursera button above for the most up-to-date information.