Description
An introductory course connecting students to recent developments in data science and machine learning. The goal of the course is to present in detail the fundamental mathematical ideas behind data science concepts, a path that leads naturally to the foundations of machine learning. Students will learn about exploratory data analysis; training, testing, and validation; supervised and unsupervised learning; classification and clustering; and regression analysis.
Prerequisites
Minimum grade of C- in MATH241 or MATH340; and minimum grade of C- in MATH240, MATH461 or MATH341; and minimum grade of C- in STAT400 or STAT410
Level of Rigor
Standard
Sample Textbooks
Instructor notes
Applications
Data Science, Machine Learning, Economics, Bioinformatics
If you like this course, you might also consider the following courses
MATH416, MATH420, MATH464, STAT430, STAT440
Additional Notes
Students interested in graduate school in statistics should consider this course.
Topics
Preprocessing and Data Munging/Wrangling. Normalization. Outliers. Summary Statistics.
Regression analysis. Linear models. Least squares.
Training, test, and validation datasets. Underfitting. Overfitting. Interpolation and Extrapolation of Data.
Support Vector Machines. Linear SVM. Hard and Soft Margin. Kernel trick. Mercer's theorem. Nonlinear SVM.
Supervised Learning. Classification. Binary and multiclass classification. Linear classifiers. Nearest neighbors.
Unsupervised Learning. Clustering. Centroid-based clustering. K-means. Hierarchical clustering.
Decision Trees.
Regularization.
Data Compression. Dimensionality Reduction. Principal Components. Feature extraction.
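As a rough illustration of how two of the listed topics fit together (least squares, and training versus test datasets), the sketch below fits a line to synthetic data and compares the error on the two subsets. It is not taken from the course materials: the data, the 40/10 split, the noise level, and the helper names `fit_line` and `mean_squared_error` are all assumptions made for the example.

```python
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for the model y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_squared_error(xs, ys, intercept, slope):
    """Average squared residual of the fitted line on the given points."""
    residuals = (y - (intercept + slope * x) for x, y in zip(xs, ys))
    return sum(r ** 2 for r in residuals) / len(xs)


# Synthetic data (an assumption for illustration): y = 1 + 2x plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 0.1) for x in xs]

# Shuffle the indices, then hold out 10 of the 50 points as a test set.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, test_idx = indices[:40], indices[40:]
train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]

# Fit on the training set only; evaluate on both subsets.
intercept, slope = fit_line(train_x, train_y)
train_mse = mean_squared_error(train_x, train_y, intercept, slope)
test_mse = mean_squared_error(test_x, test_y, intercept, slope)

print(f"intercept={intercept:.3f} slope={slope:.3f}")
print(f"train MSE={train_mse:.4f} test MSE={test_mse:.4f}")
```

Because the model here matches the process that generated the data, the training and test errors should come out similar. A large gap between them is the usual symptom of overfitting.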