CSCA 5512: Data Mining Methods

Get a head start on program admission

 Preview this course in the non-credit experience today! 
Start working toward program admission and requirements right away. Work you complete in the non-credit experience will transfer to the for-credit experience when you upgrade and pay tuition. See How It Works for details.

Cross-listed with DTSA 5505

Course Type: Computer Science Elective

Specialization: Data Mining Foundations and Practice

Instructor: Dr. Qin (Christine) Lv, Associate Professor of Computer Science

Prior knowledge needed:

  • Programming languages: Basic to intermediate experience with Python and Jupyter Notebook
  • Math: Basic experience with Probability and Statistics, Linear Algebra
  • Technical requirements: Windows, Mac, or Linux; Jupyter Notebook

  View on Coursera

Learning Outcomes

  • Identify the core functionalities of data modeling in the data mining pipeline.

  • Apply techniques that can be used to accomplish the core functionalities of data modeling and explain how they work.

  • Evaluate data modeling techniques, determine which is most suitable for a particular task, and identify potential improvements.

Course Grading Policy

Assignment                                           Percentage of Grade
Programming Assignment: Frequent Pattern Analysis    20%
Programming Assignment: Classification               20%
Programming Assignment: Clustering                   20%
Peer Review: Outlier Analysis, Research Frontiers    20%
CSCA 5512 Data Mining Methods Final Exam             20%

Course Content

Duration: 8 hours

This week starts with an overview of this course, Data Mining Methods, then focuses on frequent pattern analysis, including the Apriori algorithm and FP-growth algorithm for frequent itemset mining, as well as association rules and correlation analysis. 
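
As a rough illustration of the frequent itemset mining named above, the following is a minimal Apriori-style sketch in plain Python. It is not taken from the course materials: the function names `frequent_itemsets` and `confidence`, the toy transactions, and the choice of an absolute support count as the threshold are all assumptions made for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in at least
    min_support transactions (hypothetical helper, absolute count)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, computed from
    support counts; both itemsets involved must be frequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Toy data, invented for this example.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent = frequent_itemsets(transactions, min_support=3)
print(confidence(frequent, {"bread"}, {"milk"}))  # 0.75
```

With a minimum support count of 3, the only frequent pair in this toy data is {bread, milk}, so the search stops at size 2. FP-growth reaches the same itemsets without generating candidates level by level; it is not shown here.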

Duration: 6 hours

This week introduces supervised learning, classification, and prediction, and covers several core classification methods, including decision tree induction, Bayesian classification, support vector machines, neural networks, and ensemble methods. It also discusses how to evaluate and compare classification models.

Duration: 6 hours

This week introduces unsupervised learning and clustering, and covers several core clustering methods, including partitioning, hierarchical, grid-based, density-based, and probabilistic clustering. Advanced topics such as high-dimensional clustering, bi-clustering, graph clustering, and constraint-based clustering are also discussed.

Duration: 5 hours

This week discusses three different types of outliers (global, contextual, and collective) and how different methods may be used to identify and analyze such outliers. It also covers some advanced methods for mining complex data, as well as the research frontiers of the data mining field. 
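
To make the notion of a global outlier concrete, here is a small sketch that flags values lying far from the mean of the whole data set, using a z-score test. The z-score is one simple statistical approach chosen for illustration, not necessarily the method taught in the course; the function name `global_outliers`, the thresholds, and the sample values are assumptions.

```python
from statistics import mean, pstdev


def global_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds threshold.

    Hypothetical helper: treats the data as one population and
    compares each value against the overall mean and spread.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Toy readings, invented for this example.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(global_outliers(readings, threshold=2.0))  # [50]
print(global_outliers(readings, threshold=3.0))  # []
```

The second call returns nothing because the extreme value inflates the standard deviation it is measured against, which is one reason threshold choice matters. Contextual and collective outliers need additional information (the context of each point, or the behavior of a group of points) and are not covered by this sketch.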

Duration: 1.75 hours

This module contains materials for the final exam. The exam is proctored using ProctorU.

  • You will need to arrange for a time to take the proctored exam.
  • It is a one-hour exam.
  • You may submit your answers only once.
  • The exam contains only multiple-choice questions.
  • There are no programming questions in the exam.
  • You are not allowed to use any notes or access other websites when you take your exam.

Notes

  • Cross-listed Courses: Courses that are offered under two or more programs. They are considered equivalent when evaluating progress toward degree requirements. You may not earn credit for more than one version of a cross-listed course.
  • Page Updates: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page. Click the View on Coursera button above for the most up-to-date information.