• Specialization: Statistical Modeling for Data Science Applications
  • Instructor: Brian Zaharatos, Director, Professional Master’s Degree in Applied Mathematics
  • Prior knowledge needed: Basic calculus (differentiation and integration), linear algebra, probability theory

View on Coursera

Learning Outcomes

Successful completion of this course demonstrates your achievement of the following learning outcomes for the MS-DS on Coursera:

  • Acquire, clean, wrangle, and manage data.
  • Correctly perform exploratory data analyses in order to assist with the generation of scientific hypotheses.
  • Apply principles and methods of probability theory and statistics to draw rational conclusions from data.
  • Construct an appropriate statistical model in order to answer important scientific or business-related questions.
  • Assess the validity of a statistical model when applied to a particular dataset.
  • Be sensitive to ethical issues that are involved in dealing with data science applications arising in real-world situations.
  • Clearly communicate the results of a data science analysis to a non-technical audience.
  • Use peer feedback, self-reflection, and video analysis to improve collaboration skills.
  • Create reproducible statistical workflows.
  • Act ethically in the role of professional data scientist.

Course Content

Duration: 8h

In this module, we will introduce the basic conceptual framework for statistical modeling in general, and linear statistical models in particular.

Duration: 8h

In this module, we will learn how to fit linear regression models with least squares. We will also study the properties of least squares, and describe some goodness of fit metrics for linear regression models.
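As a rough illustration of the ideas in this module (not course material), the sketch below fits a one-predictor linear model by least squares and reports R², one common goodness-of-fit metric. The function name `fit_simple_ols` and the data are made up for the example.

```python
def fit_simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares; return (b0, b1, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    # Closed-form least squares estimates for slope and intercept.
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return b0, b1, r_squared


# Hypothetical data, invented for this illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, r2 = fit_simple_ols(x, y)
print(f"intercept={b0:.3f}, slope={b1:.3f}, R^2={r2:.4f}")
```

With several predictors, the same idea is usually expressed in matrix form and solved with a linear algebra library rather than by hand.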

Duration: 9h

In this module, we will study the uses of linear regression modeling for justifying inferences from samples to populations.

Duration: 6h

In this module, we will identify how models can predict future values, as well as construct confidence intervals for those values. We will also explore the relationship between statistical modeling and causal explanations.

Duration: 7h

In this module, we will learn how to diagnose issues with the fit of a linear regression model. In particular, we will use formal tests and visualizations to decide whether a linear model is appropriate for the data at hand.

Duration: 8h

In this module, we will study methods for model selection and model improvement. In particular, we will learn when and how to apply model selection techniques such as forward selection, backward selection, and criterion-based methods. We will also learn about the problem of multicollinearity (also called collinearity).
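To make multicollinearity concrete, here is a minimal sketch (not course material) of the variance inflation factor (VIF) for the special case of exactly two predictors, where VIF = 1 / (1 - r²) and r is the correlation between the predictors. The helper names and data are hypothetical; a common rule of thumb treats values above roughly 10 as a warning sign.

```python
import math


def pearson_r(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    if n != len(v) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    u_mean = sum(u) / n
    v_mean = sum(v) / n
    suv = sum((a - u_mean) * (b - v_mean) for a, b in zip(u, v))
    suu = sum((a - u_mean) ** 2 for a in u)
    svv = sum((b - v_mean) ** 2 for b in v)
    if suu == 0 or svv == 0:
        raise ValueError("inputs must not be constant")
    return suv / math.sqrt(suu * svv)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r_sq = pearson_r(x1, x2) ** 2
    # Perfectly collinear predictors give an infinite VIF.
    return math.inf if r_sq >= 1.0 else 1.0 / (1.0 - r_sq)


# Hypothetical predictors, invented for this illustration.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]    # nearly a copy of x1
x3 = [2.0, -1.0, 0.0, -1.0, 2.0]  # uncorrelated with x1

vif_collinear = vif_two_predictors(x1, x2)
vif_uncorrelated = vif_two_predictors(x1, x3)
print(f"VIF (x1, x2): {vif_collinear:.1f}")
print(f"VIF (x1, x3): {vif_uncorrelated:.1f}")
```

With more than two predictors, each VIF comes from regressing one predictor on all the others and using that regression's R² in place of r².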

Duration: 5h

You will complete a proctored exam worth 20% of your grade, made up of multiple-choice questions. You must attempt the final exam in order to earn a grade in the course. If you've upgraded to the for-credit version of this course, please make sure you review the additional for-credit materials in the Introductory module and anywhere else they may be found.

Note: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page. Click the View on Coursera button above for the most up-to-date information.