For fall 2018, Data Mining is offered in three sections:

  • CSCI 4502-001: undergraduate
  • CSCI 5502-001: graduate on-campus
  • CSCI 5502-001B: graduate distance learning

Course Overview

This course introduces basic data mining concepts and techniques for discovering interesting patterns hidden in large-scale data sets, focusing on issues relating to effectiveness and efficiency. Topics covered include data preprocessing, data warehouse, association, classification, clustering, outlier detection, and mining specific data types such as time-series, social networks, multimedia, and Web data.

Prerequisites

  • CSCI 4502-001: CSCI 2270 with a minimum grade of C-
  • CSCI 5502-001 and CSCI 5502-001B: graduate status

Textbook

Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, and Jian Pei. 3rd edition, Morgan Kaufmann, 2011. ISBN-13: 978-0123814791

Grading

  • Problem sets (35%): Work alone; extra questions for students taking CSCI 5502
  • Midterm exam (25%): Closed book; in-class exam; extra questions for students taking CSCI 5502
  • Course project (40%): Work in groups; a self-defined project that applies data mining to real-world problems; recommended group size is 3-4; a group can be a mixture of 4502/5502 and on-campus/distance learning students; higher threshold for students taking CSCI 5502; project scope needs to be appropriate for group size and group composition (CSCI 4502 vs. CSCI 5502)
  • Late policy: Submissions are accepted at most 2 days late, with a 20-point penalty for each day of delay
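
The grading weights and late policy above can be sketched as a short calculation. This is only an illustrative sketch, not an official grade calculator: the function names are hypothetical, all scores are assumed to be on a 0-100 scale, and a submission more than 2 days late is assumed to earn no credit, which the policy above does not state explicitly.

```python
# Weights taken from the grading list above.
WEIGHTS = {"problem_sets": 0.35, "midterm": 0.25, "project": 0.40}

# Late policy as stated: at most 2 days late, 20 points off per day.
MAX_LATE_DAYS = 2
PENALTY_PER_DAY = 20


def apply_late_penalty(score, days_late):
    """Return a submission score after the late penalty (0-100 scale assumed)."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > MAX_LATE_DAYS:
        # Assumption: work submitted outside the 2-day window earns no credit.
        return 0.0
    # Assumption: the penalty is subtracted from the score and floored at zero.
    return max(0.0, score - PENALTY_PER_DAY * days_late)


def course_grade(problem_sets, midterm, project):
    """Combine component scores (each 0-100) using the syllabus weights."""
    scores = {
        "problem_sets": problem_sets,
        "midterm": midterm,
        "project": project,
    }
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

For example, under these assumptions a problem set scored at 90 and submitted one day late would count as apply_late_penalty(90, 1), and the overall grade would follow from course_grade with the three component averages.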

Tentative Class Schedule

  • Week 1: Introduction
  • Week 2: Data Preprocessing
  • Week 3: Data Warehouse
  • Week 4: Frequent Patterns
  • Week 5: Classification
  • Week 6: Project Proposal
  • Week 7: Classification
  • Week 8: Clustering
  • Week 9: Clustering
  • Week 10: Midterm Exam
  • Week 11: Outlier Detection
  • Week 12: Project Checkpoint
  • Week 13: Fall Break, Thanksgiving
  • Week 14: Data Streams, Time-Series
  • Week 15: Graphs, Social Networks, Web Data
  • Week 16: Project Final Report