Unsupervised Learning

ist718-week9

Posted by renjie on March 10, 2020


Unsupervised Learning

The dataset contains only the features x1, x2, x3, with no response y.
Unlabeled data is easier to obtain than labeled data.

Principal Components Analysis

PCA is a tool used for data visualization, or for pre-processing data before supervised techniques are applied.

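It finds a sequence of linear combinations of the variables that have maximal variance and are mutually uncorrelated. This is dimensionality reduction: subject to the constraint that the squared loadings sum to 1, PCA finds the linear combination z with the largest variance (standardize the data first). In other words, the variables are projected onto the direction in feature space along which the data varies the most. To decide how many components to keep, look for the elbow point in the plot of variance explained by each component.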
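The description above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the toy data is synthetic, the helper names (`standardize`, `covariance`, `power_iteration`, `pca`) are hypothetical, and power iteration with deflation is used here only as a simple stand-in for a proper eigendecomposition routine.

```python
import math
import random

def standardize(rows):
    """Center each column to mean 0 and scale it to unit sample standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of data that is already centered."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(p)]
            for a in range(p)]

def power_iteration(mat, rng, iters=500):
    """Return the unit-norm vector phi that maximizes phi' * mat * phi, and that maximum."""
    p = len(mat)
    phi = [rng.gauss(0, 1) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in phi))
    phi = [v / norm for v in phi]
    for _ in range(iters):
        nxt = [sum(mat[a][b] * phi[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        phi = [v / norm for v in nxt]  # keeps the squared loadings summing to 1
    variance = sum(phi[a] * mat[a][b] * phi[b] for a in range(p) for b in range(p))
    return phi, variance

def pca(rows):
    """Loadings and variances of all components, found one at a time."""
    cov = covariance(standardize(rows))
    p = len(cov)
    rng = random.Random(0)
    loadings, variances = [], []
    for _ in range(p):
        phi, var = power_iteration(cov, rng)
        loadings.append(phi)
        variances.append(var)
        # Deflate: remove the direction just found so that the next
        # component is uncorrelated with the previous ones.
        cov = [[cov[a][b] - var * phi[a] * phi[b] for b in range(p)]
               for a in range(p)]
    return loadings, variances

# Synthetic data (an assumption for illustration): x2 nearly duplicates x1,
# x3 is independent noise.
data_rng = random.Random(42)
data = []
for _ in range(200):
    x1 = data_rng.gauss(0, 1)
    x2 = 2 * x1 + data_rng.gauss(0, 0.1)
    x3 = data_rng.gauss(0, 1)
    data.append([x1, x2, x3])

loadings, variances = pca(data)
pve = [v / sum(variances) for v in variances]  # proportion of variance explained
for k, (phi, share) in enumerate(zip(loadings, pve), start=1):
    print(f"PC{k}: loadings={[round(v, 3) for v in phi]}, share of variance={share:.3f}")
```

Plotting `pve` against the component number gives the curve in which the elbow is read off. In this toy data the last component should carry almost no variance, because x2 is close to a multiple of x1.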

Clustering

Clustering is a broad class of methods for discovering unknown subgroups in data.

  • K-means clustering (also covered in 707)
  • hierarchical clustering

K-means requires us to specify the value of k in advance, whereas hierarchical clustering does not.
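The contrast between the two methods can be seen in a small sketch. Everything here is an assumption made for illustration: the six toy points, the function names `kmeans` and `agglomerative`, the random choice of initial centroids, and the use of single linkage (other linkages are equally valid). Note that `kmeans` takes `k` as an argument, while `agglomerative` only returns a merge history that can be cut at any height afterwards.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm. The caller has to choose k up front."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k of the points as starting centroids
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def agglomerative(points):
    """Single-linkage clustering. Returns every merge as (id_a, id_b, distance)."""
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = list(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[p], points[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges

# Two well-separated groups of three points each (toy data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

labels, centroids = kmeans(points, k=2)
print("k-means labels:", labels)

merges = agglomerative(points)
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d:.2f}")

# Cutting the merge history at a chosen height gives the number of clusters.
cut_height = 5.0
n_clusters = len(points) - sum(1 for _, _, d in merges if d <= cut_height)
print("clusters when cutting at", cut_height, "->", n_clusters)
```

The last merge happens at a much larger distance than the others, which is the kind of gap one looks for in a dendrogram when deciding where to cut.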

conclusion