  • Specialization: Statistical Modeling for Data Science Applications
  • Instructor: Brian Zaharatos, Director, Professional Master’s Degree in Applied Mathematics
  • Prior knowledge needed: Basic calculus (differentiation and integration), linear algebra, probability theory

View on Coursera

Learning Outcomes

Successful completion of this course demonstrates your achievement of the following learning outcomes for the MS-DS on Coursera:

  • Acquire, clean, wrangle, and manage data.
  • Correctly perform exploratory data analyses in order to assist with the generation of scientific hypotheses.
  • Apply principles and methods of probability theory and statistics to draw rational conclusions from data.
  • Construct an appropriate statistical model in order to answer important scientific or business-related questions.
  • Assess the validity of a statistical model when applied to a particular dataset.
  • Be sensitive to ethical issues involved in data science applications arising in real-world situations.
  • Clearly communicate the results of a data science analysis to a non-technical audience.
  • Use peer feedback, self-reflection, and video analysis to improve collaboration skills.
  • Create reproducible statistical workflows.
  • Act ethically in the role of professional data scientist.

Course Content

Duration: 7h 32m

In this module, we will introduce the basic conceptual framework for statistical modeling in general, and linear statistical models in particular.

Duration: 8h 14m

In this module, we will learn how to fit linear regression models with least squares. We will also study the properties of least squares, and describe some goodness of fit metrics for linear regression models.
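As a rough illustration of the topics named above, here is a minimal sketch of a least squares fit and one goodness-of-fit metric (R²) in Python with NumPy. The data are simulated and the variable names are hypothetical; neither comes from the course materials, and the course itself may use different tools.

```python
import numpy as np

# Hypothetical toy data (an assumption for illustration):
# y is roughly 2 + 3x plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=x.size)

# Design matrix with an intercept column and one predictor.
X = np.column_stack([np.ones_like(x), x])

# Least squares estimate: the coefficients that minimize ||y - X b||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Goodness of fit: R^2 = 1 - RSS / TSS.
residuals = y - X @ beta_hat
rss = float(residuals @ residuals)
tss = float(((y - y.mean()) ** 2).sum())
r_squared = 1.0 - rss / tss

print("intercept, slope:", beta_hat)
print("R^2:", r_squared)
```

One property of least squares visible in this sketch is that the residuals are orthogonal to every column of the design matrix, so `X.T @ residuals` is zero up to floating-point error.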

Duration: 8h 32m

In this module, we will study the uses of linear regression modeling for justifying inferences from samples to populations.

Duration: 5h 52m

In this module, we will identify how models can predict future values and learn to construct confidence intervals for those values. We will also explore the relationship between statistical modeling and causal explanations.

Duration: 7h 12m

In this module, we will learn how to diagnose issues with the fit of a linear regression model. In particular, we will use formal tests and visualizations to decide whether a linear model is appropriate for the data at hand.

Duration: 7h 36m

In this module, we will identify measures to improve our models after they have been fit to the data. In particular, we will learn when and how to apply model selection techniques, such as forward selection, backward selection, and criterion-based methods, as well as multicollinearity diagnostics.
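To give a sense of what forward selection with a criterion looks like, here is a minimal sketch in Python with NumPy that greedily adds predictors while the AIC keeps decreasing. The helper names `aic_ols` and `forward_select`, the simulated data, and the choice of AIC as the criterion are all assumptions made for illustration; they are not taken from the course.

```python
import numpy as np

def aic_ols(X, y):
    """AIC of a Gaussian linear model, up to an additive constant."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1]  # number of fitted coefficients
    return n * np.log(rss / n) + 2 * k

def forward_select(X, y):
    """Greedy forward selection: add the predictor that lowers AIC the most."""
    n, p = X.shape
    intercept = np.ones((n, 1))
    selected = []
    best_aic = aic_ols(intercept, y)  # start from the intercept-only model
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = [
            (aic_ols(np.hstack([intercept, X[:, selected + [j]]]), y), j)
            for j in candidates
        ]
        aic, j = min(scores)
        if aic >= best_aic:
            break  # no candidate improves the criterion, so stop
        best_aic = aic
        selected.append(j)
    return selected, best_aic

# Hypothetical simulated data: only columns 0 and 2 affect the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

selected, best_aic = forward_select(X, y)
print("selected columns, in order of entry:", selected)
```

Because the procedure is greedy, it is not guaranteed to find the subset with the lowest AIC overall, and it can occasionally admit a pure-noise predictor. Backward selection follows the same pattern in reverse, starting from the full model and removing one predictor at a time.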

Note: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page. Click the View on Coursera button above for the most up-to-date information.