Statistics 6704 – Spring, 2008
Data Mining Methodology II

Department of Statistics
University of Central Florida
 

Instructor:    Dr. Xiaogang Su
Office:        Room 102, CC II
Phone:         (407) 823-2940
Email:         xiaosu@mail.ucf.edu
Office Hours:  W 3:00-4:00 pm and R 2:00-3:00 pm
Prerequisite:  STA 5703
Website:       http://pegasus.cc.ucf.edu/~xsu/CLASS/STA6704/

·         Announcement: 

·         Description of the Course:     Data mining is the process of exploring and analyzing, by automatic or semiautomatic means, large quantities of observational data in order to discover patterns and models that are meaningful to the data owner. By applying data mining techniques, data miners can exploit patterns and behavior in the data and gain a better understanding of its underlying structure. The goal of data mining in business is to produce new knowledge that decision-makers can act upon. It does this by using techniques such as logistic regression and decision trees to build a model of the real world from data collected from a variety of sources, including corporate transactions, customer histories and demographics, and external sources such as credit bureaus. The resulting model produces knowledge that can be used to support decision-making and to predict new business opportunities. This course covers data mining techniques including association rules, parametric and nonparametric regression, decision trees and their extensions (bagging, boosting, and random forests), GAM, MARS, projection pursuit, neural networks, and SVM, among others.
 

·         Statistical Computing:  
 

o    Both SAS 9.1 (with Enterprise Miner 5.2) and R will be used for the statistical computing in this class. To reinstall the newest version of SAS and SAS Enterprise Miner on your laptop, please contact the data mining lab, located in Room 350, MAP Building. Check the lab's schedule before you go.

o    R can be downloaded free of charge from the CRAN website: http://cran.r-project.org/ . The Windows installer R-2.6.1-win32.exe is available at http://cran.r-project.org/bin/windows/base/R-2.6.1-win32.exe.
 

·         Syllabus and the Grading Policy:

Range    Grade
94+      A
93-90    A-
89-87    B+
86-83    B
82-80    B-
79-77    C+
76-73    C
72-70    C-
69-67    D+
66-63    D
62-60    D-
59-0     F

Chapter  Notes                                                 Data Sets      SAS Programs  R Programs
1        Introduction - Overview of Data Mining Techniques     -              -             -
2        Association Rules                                     BNKSERV.SD2    -             R-2.R
3        An Introduction to R                                  -              -             -
4        Simple Linear Regression Models                       -              SAS-4-1.sas   R-4-1.R
5        Multiple Linear Regression Models and GLM (1, 2, 3)   bb87.dat       SAS-5-1.sas   R-5.R
6        Parametric Nonlinear Regression (1, 2)                -              SAS-6.sas     R-6.R
7        KNN, Kernel Smoothing and Regression                  -              SAS-7.sas     R-7.R
8        Regression / Smoothing Splines                        -              SAS-8.sas     R-8.R
9        Generalized Additive Models (GAM)                     -              SAS-9.sas     R-9.R
10       Tree-Based Methods                                    -              -             R-10.R
11       Boosting                                              -              -             R-11.R
12       Bagging and Random Forests                            -              -             R-12.R
12-2     Bagging and Boosting in SAS EM                        pen.sas7bdat   -             -
13       Multivariate Adaptive Regression Splines (MARS)       -              -             R-13.R
14       Projection Pursuit Regression (PPR)                   -              -             R-14.R
15       Neural Networks I - Introduction                      -              -             -
16       Neural Networks II (1, 2, 3, 4)                       EXPCAR.SD2     -             -
17       Neural Networks III - SAS EM Examples                 hmeq.sas7bdat  -             R-17.R
18       Support Vector Machine (SVM), SVM-R                   -              SAS-18.sas    -

·         Homework assignments and Class Handouts:

To ensure that homework assignments are completed on time, the following grading policy applies. A late assignment submitted within two days after the due date receives 50% credit. An assignment submitted more than two days after the due date, for any reason, will not be graded and receives 0 points.

Homework  Assignment               Other Related Files
1         project1.pdf             Marketing: Info, Data
2         project2.pdf             tampalms.dat
3         project3.pdf             bb92.txt
4         project4.pdf             -
5         Data Mining Competition  -
6         proj6.doc                training-ffire.csv, test-ffire.csv
Final     -                        -