CSCA 5502: Data Mining Pipeline

Get a head start on program admission

 Preview this course in the non-credit experience today! 
Start working toward program admission and requirements right away. Work you complete in the non-credit experience will transfer to the for-credit experience when you upgrade and pay tuition. See How It Works for details.

Cross-listed with DTSA 5504

Course Type: Computer Science Elective

Specialization: Data Mining Foundations and Practice

Instructor: Dr. Qin (Christine) Lv, Associate Professor of Computer Science

Prior knowledge needed:

  • Programming languages: Basic to intermediate experience with Python and Jupyter Notebook
  • Math: Basic experience with Probability and Statistics, Linear Algebra
  • Technical requirements: Windows, Mac, or Linux; Jupyter Notebook

  View on Coursera

Learning Outcomes

  • Identify the key components of the data mining pipeline and describe how they are related.

  • Apply techniques to address challenges in each component of the data mining pipeline.

  • Identify particular challenges presented by each component of the data mining pipeline.

Course Grading Policy

Assignment                                     Percentage of Grade
Peer Review: Data Mining Example               10%
Peer Review: Data Mining Issues                10%
Programming Assignment: Data Understanding     20%
Programming Assignment: Data Preprocessing     20%
Programming Assignment: Data Warehousing       20%
CSCA 5502 Data Mining Pipeline Final Exam      20%

Course Content

Duration: 7 hours

This week introduces the Data Mining Specialization and this course, Data Mining Pipeline. You will learn about the four views of data mining and the key components of the data mining pipeline.

Duration: 5.5 hours

This week covers data understanding: identifying key data properties and applying techniques to characterize different datasets.

Duration: 5.25 hours

This week explains why data preprocessing is needed and what techniques can be used to preprocess data.  

Duration: 5 hours

This week covers the key characteristics of data warehousing and the techniques that support it.

Duration: 1.75 hours

This module contains materials for the final exam. The exam is proctored and administered through ProctorU.

  • You will need to schedule a time to take the proctored exam.
  • It is a one-hour exam.
  • You may submit your answers only once.
  • The exam contains only multiple-choice questions.
  • There are no programming questions in the exam.
  • You may not use notes or access other websites while taking the exam.
  • The exam tests conceptual understanding of the course materials. There is no need to memorize formulas.

Notes

  • Cross-listed Courses: Courses offered under two or more programs. They are considered equivalent when evaluating progress toward degree requirements. You may not earn credit for more than one version of a cross-listed course.
  • Page Updates: This page is periodically updated. Course information on the Coursera platform supersedes the information on this page. Click the View on Coursera button above for the most up-to-date information.