Class Information

Instructor: Hao “Harry” Feng. Email: hxf155 at case dot edu

Teaching Assistant: Chloe Jen. Email: cxj191 at case dot edu

Class/Lab: Tuesdays and Thursdays, 2:30 PM to 3:45 PM

Overview

Vast amounts of data are being collected in medical and social research and in many industries. Such big data create a demand for efficient and practical tools to analyze the data and to identify unknown patterns. We will cover a variety of statistical machine learning techniques (supervised learning) and data mining techniques (unsupervised learning), with data examples from biomedical and social research. Specifically, we will cover prediction model building and model selection (shrinkage, Lasso), classification (logistic regression, discriminant analysis, k-nearest neighbors), tree-based methods (bagging, random forests, boosting), support vector machines, association rules, and clustering, including hierarchical clustering. Basic techniques that apply across many of these areas, such as cross-validation, the bootstrap, dimensionality reduction, and splines, will be explained and used repeatedly. The field is evolving quickly, and new topics and techniques may be included when necessary.

Students should refer to Canvas for lecture notes, Zoom recordings, and other course materials.