• Specialization: Data Mining Foundations and Practice
  • Instructor: Dr. Qin (Christine) Lv, Associate Professor of Computer Science
  • Prior knowledge needed: Familiarity with Python, a basic understanding of data structures and algorithms, and concepts of probability

View on Coursera

Learning Outcomes

Successful completion of this course demonstrates your achievement of the following learning outcomes for the MS-DS program:

  • Correctly perform exploratory data analyses in order to assist with the generation of scientific hypotheses.

  • Understand the principles of efficient algorithms for dealing with large-scale data sets and be able to select appropriate algorithms for specific problems.

  • Understand and be able to apply the main computational techniques used to analyze large data sets, including a variety of data mining and machine learning approaches.

  • Correctly apply the data science skills above to a specific domain area (e.g., business, climate science).

  • Clearly communicate the results of a data science analysis to a non-technical audience.

Course Content

Duration 5h 44m

This module starts with an overview of data mining methods, then focuses on frequent pattern analysis, including the Apriori algorithm and FP-growth algorithm for frequent itemset mining, as well as association rules and correlation analysis.

Duration 5h 46m

This module introduces supervised learning, classification, and prediction, and covers several core classification methods, including decision tree induction, Bayesian classification, support vector machines, neural networks, and ensemble methods. It also discusses classification model evaluation and comparison.

Duration 5h 44m

This module introduces unsupervised learning and clustering, and covers several core clustering methods, including partitioning, hierarchical, grid-based, density-based, and probabilistic clustering. Advanced topics for high-dimensional clustering, bi-clustering, graph clustering, and constraint-based clustering are also discussed.

Duration 5h 48m

This module discusses three different types of outliers (global, contextual, and collective) and how different methods may be used to identify and analyze such outliers. It also covers some advanced methods for mining complex data, as well as the research frontiers of the data mining field.

 

Note: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page. Click the View on Coursera button above for the most up-to-date information.