DTSA 5509 Introduction to Machine Learning: Supervised Learning
Same as CSCA 5622 & DTSA 5900-11
- Specialization: Machine Learning: Theory and Hands-on Practice with Python Specialization
- Instructor: Daniel Acuna
- Prior knowledge needed: Basic probability and statistics (such as in CSCI 3022 – combinatorics, probability distribution functions, joint/conditional probability, Bayes' rule, the normal distribution, p-values, z- and t-statistics); basic programming skills, especially Python programming; basic math knowledge (basic calculus and linear algebra)
Learning Outcomes
- Use modern machine learning tools and Python libraries.
- Explain how to deal with linearly inseparable data.
- Compare logistic regression's strengths and weaknesses.
- Explain what a decision tree is and how it splits nodes.
Course Content
Duration: 6h
Welcome to Introduction to Machine Learning: Supervised Learning. In this first module, you will begin your journey into supervised learning by exploring how machines learn from labeled data to make predictions. You will learn to distinguish between supervised and unsupervised learning, and understand the key differences between regression and classification tasks. You will also gain insight into the broader machine learning workflow, including the roles of predictors, response variables, and the importance of training versus testing data. By the end of this module, you will have a solid foundation in the goals and mechanics of supervised learning.
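The workflow described above — predictors, a response variable, and a training/testing split — can be sketched in a few lines. The course description does not name a specific library, so the example below assumes scikit-learn; the Iris dataset and the 25% test fraction are illustrative choices, not part of the course.

```python
# Sketch of the supervised learning workflow: fit on labeled
# training data, then evaluate on held-out test data.
# Assumes scikit-learn; dataset and split size are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)          # X: predictors, y: response labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)  # hold out 25% for testing

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                # learn from labeled data

acc = model.score(X_test, y_test)          # accuracy on unseen data
print("test accuracy:", acc)
```

The key idea is that performance is always measured on data the model has never seen, which is why the split precedes fitting.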
Duration: 3h
In this module, you will expand your understanding of linear models by incorporating multiple predictors, including categorical variables and interaction terms. You will learn how to interpret partial regression coefficients and assess the fit of your models using metrics like R² and RMSE. As you build more complex models, you will also explore the risks of overfitting and the importance of model validation. By the end of this module, you will be equipped to build and evaluate multiple linear regression models with confidence.
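A minimal sketch of a multiple regression with a categorical predictor, evaluated with R² and RMSE as the module describes. The synthetic housing-style data, variable names, and scikit-learn/pandas usage are all assumptions for illustration.

```python
# Sketch: multiple linear regression with a numeric predictor (sqft)
# and a one-hot-encoded categorical predictor (city).
# Data is synthetic and illustrative; assumes scikit-learn and pandas.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "sqft": rng.uniform(500, 3000, n),
    "city": rng.choice(["A", "B"], n),
})
# Generate a response with a known linear structure plus noise.
df["price"] = (100 * df["sqft"]
               + 20000 * (df["city"] == "B")
               + rng.normal(0, 5000, n))

# One-hot encode the categorical predictor (drop one level as baseline).
X = pd.get_dummies(df[["sqft", "city"]], drop_first=True)
model = LinearRegression().fit(X, df["price"])

pred = model.predict(X)
r2 = r2_score(df["price"], pred)
rmse = float(np.sqrt(mean_squared_error(df["price"], pred)))
print("R^2:", r2, "RMSE:", rmse)
```

Note that the fitted coefficient on the dummy column is interpreted as the shift in the response for city B relative to the baseline city A, holding square footage fixed — the partial-coefficient interpretation the module covers.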
Duration: 4h
In this module, you will transition from predicting continuous outcomes to modeling categorical ones. You will learn how logistic regression models binary outcomes, like whether a customer will default on a loan, using probabilities and odds, and how to interpret the results. You will also explore k-Nearest Neighbors, a flexible, non-parametric method that classifies observations based on their proximity to others in the dataset. To evaluate your models, you will use tools like confusion matrices, accuracy, and precision/recall, gaining insight into how well your classifiers perform. This module lays the groundwork for tackling real-world classification problems with confidence and clarity.
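The two classifiers and the evaluation tools mentioned above can be compared side by side. This is a sketch assuming scikit-learn; the synthetic dataset and the choice of k=5 neighbors are illustrative, not prescribed by the course.

```python
# Sketch: logistic regression vs. k-Nearest Neighbors on a binary
# classification task, evaluated with a confusion matrix and
# precision/recall. Assumes scikit-learn; k=5 is an arbitrary choice.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

classifiers = {
    "logistic": LogisticRegression(max_iter=1000).fit(X_tr, y_tr),
    "5-NN": KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr),
}

results = {}
for name, clf in classifiers.items():
    pred = clf.predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    results[name] = (precision_score(y_te, pred), recall_score(y_te, pred))
    print(name, "TN/FP/FN/TP:", tn, fp, fn, tp,
          "precision:", results[name][0], "recall:", results[name][1])
```

Logistic regression is parametric (it fits a fixed set of coefficients and yields interpretable odds ratios), while k-NN is non-parametric and adapts to local structure — the trade-off the module explores.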
Duration: 4h
In this module, you will learn how to evaluate your models more reliably and improve their generalization to new data. You will explore resampling methods like k-fold cross-validation and the bootstrap, which help estimate test performance without needing a separate test set. You will also be introduced to the regularization techniques Ridge and Lasso that prevent overfitting by constraining model complexity. Using cross-validation, you will learn how to select the optimal regularization strength, balancing predictive accuracy with model simplicity. These tools are essential for building models that perform well not just in theory, but in practice.
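Selecting the regularization strength by cross-validation, as described above, can be sketched with scikit-learn's built-in `RidgeCV` and `LassoCV` helpers (an assumed library choice; the alpha grid and 5-fold setting are illustrative).

```python
# Sketch: choosing the Ridge/Lasso regularization strength (alpha)
# by k-fold cross-validation. Assumes scikit-learn; the alpha grid
# and cv=5 are illustrative choices.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV

X, y = make_regression(n_samples=200, n_features=20, noise=10,
                       random_state=0)

# Each estimator tries every candidate alpha, scores it with
# 5-fold CV, and keeps the best-performing value.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
lasso = LassoCV(cv=5, random_state=0).fit(X, y)

print("selected ridge alpha:", ridge.alpha_)
print("selected lasso alpha:", lasso.alpha_)
```

Because the alpha is chosen on validation folds rather than the training fit, the selected model balances accuracy against complexity without ever touching a separate test set.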
Duration: 3h
This module introduces you to one of the most intuitive and interpretable machine learning models: decision trees. You will explore how trees split the feature space into regions, how to read their structure, and why they are prone to overfitting if left unchecked. Trees are just the beginning; this module also introduces ensemble techniques that elevate predictive accuracy by combining many models. You will get a first look at methods like bagging, random forests, and boosting, and see how they compare to the models you have already studied. By the end, you will understand when and why tree-based models can outperform simpler approaches, especially in capturing complex, non-linear relationships.
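The contrast between a single (easily overfit) tree and an ensemble can be seen in a few lines. The sketch below assumes scikit-learn; the breast-cancer dataset and the 200-tree forest size are illustrative choices.

```python
# Sketch: a single decision tree vs. a random-forest ensemble.
# Assumes scikit-learn; dataset and n_estimators are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unpruned tree will fit the training data very closely.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# A forest averages many trees built on bootstrap samples (bagging)
# with random feature subsets, reducing variance.
forest = RandomForestClassifier(n_estimators=200,
                                random_state=0).fit(X_tr, y_tr)

tree_acc = tree.score(X_te, y_te)
forest_acc = forest.score(X_te, y_te)
print("single tree:", tree_acc, "random forest:", forest_acc)
```

On most datasets the forest's test accuracy is at least as good as the single tree's, illustrating why ensembles of high-variance learners tend to generalize better.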
Duration: 1h
You will complete a final exam worth 20% of your grade. You must attempt the final in order to earn a grade in the course. If you've upgraded to the for-credit version of this course, please make sure you review the additional for-credit materials in the Introductory module and anywhere else they may be found.
Note: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page.