
About this course

This is an introductory course on data mining. It introduces the basic concepts, principles, methods, implementation techniques, and applications of data mining. The technical content is based on the textbook ‘Data Mining: Concepts and Techniques (3rd edition)’. The course covers the main themes of the textbook: (1) data preprocessing and preparation, (2) frequent patterns and association rule mining, (3) classification, and (4) clustering.

 


 

Learning objectives

  • You will learn various techniques for data preprocessing, including data cleaning, data integration, data reduction, data transformation and data discretization.
  • You will learn why pattern discovery is important, what the major tricks are for efficient pattern mining, and how to apply pattern discovery in some interesting applications.
  • You will learn concepts and methods for classification, including decision trees, naïve Bayes, rule-based methods, and lazy methods.
  • You will learn the basic concepts of cluster analysis and a set of typical clustering methods, including partitioning methods, hierarchical methods, density-based methods, and grid-based methods.

 


Course Structure
  • Introduction to Course
  • Part 1: Know Data (3 sessions)
    • Preliminaries of data mining
    • Statistical descriptions of data
    • Measuring data similarity
    • Visualization
  • Part 2: Preprocessing (6 sessions)
    • Data quality and data cleaning
    • Data preprocessing
    • Data preprocessing in practice
  • Part 3: Frequent patterns and association rule mining (4 sessions)
    • Apriori
    • FP-growth
    • ECLAT
    • Frequent pattern mining in practice
  • Part 4: Classification (6 sessions)
    • Decision tree
    • Naïve Bayes
    • Rule-based methods
    • Lazy methods
    • Evaluation metrics
    • Classification methods in practice
  • Part 5: Clustering (3 sessions)
    • Clustering methods
    • Cluster analysis and evaluation
    • Clustering in practice
  • Discussion of a real-world scenario (1 session)

Materials

Link-B


Resources

FDM-BK_Books

 

 

 


 Grading
  • Group Assignments (3 points in total)
    • 3-4 hours per homework
    • ≈ 15 hours in total
  • Projects (7 points)
    • ≈ 50 hours
  • Online quizzes (7 points in total)
  • Final exam (3 points)
  • Active participation in the class (up to 2 extra points)