【COMP337 LEC2】

Association Pattern Mining

Special case: Frequent Pattern Mining (binary data sets)

Given a binary data matrix, identify all subsets of columns (features) such that at least a specified fraction of the rows (objects), known as the minimum support, have all of those features enabled (i.e., the features take the value 1).
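
The definition above can be illustrated with a brute-force sketch. This is not an efficient mining algorithm, only a direct translation of the definition: it enumerates every non-empty subset of columns and keeps those whose support reaches the threshold. The function name `frequent_itemsets`, the parameter `min_support`, and the toy matrix are assumptions made for this illustration.

```python
from itertools import combinations


def frequent_itemsets(matrix, min_support):
    """Return {column-index tuple: support} for every non-empty column subset
    whose support is at least min_support.

    Support = fraction of rows in which all chosen columns equal 1.
    Brute force: examines all 2^d - 1 subsets, so only suitable for tiny d.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    result = {}
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # Count the rows that have every selected feature enabled.
            count = sum(1 for row in matrix if all(row[c] == 1 for c in cols))
            support = count / n_rows
            if support >= min_support:
                result[cols] = support
    return result


# Hypothetical toy data: 4 rows (objects) x 3 columns (features).
data = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 1, 0],
]
patterns = frequent_itemsets(data, min_support=0.75)
print(patterns)  # {(0,): 0.75, (1,): 1.0, (0, 1): 0.75}
```

With a minimum support of 0.75, column 2 is dropped because it is enabled in only half of the rows, and so is every subset that contains it.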

Classification

1. The goal is to use training data to learn the relationship between a fixed feature (called the class label) and the remaining features in the data.

2. The resulting learned model may then be used to estimate (predict) the value of the class label for records where that value is not known.

3. The objects whose class label is unknown are test objects (test data).

4. Classification is a form of supervised learning.
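
The train-then-predict workflow described in the points above can be sketched with a 1-nearest-neighbour classifier. This is only one simple choice of model, picked here because it fits in a few lines; the notes do not prescribe it. The function `predict_1nn` and the toy data are hypothetical.

```python
def predict_1nn(train_features, train_labels, test_object):
    """Predict the class label of test_object as the label of the closest
    training record, using squared Euclidean distance."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Index of the training record nearest to the test object.
    best = min(
        range(len(train_features)),
        key=lambda i: sq_dist(train_features[i], test_object),
    )
    return train_labels[best]


# Hypothetical training data: (age, yearly spend in thousands) -> class label.
train_X = [(25, 1.0), (30, 1.5), (55, 8.0), (60, 9.5)]
train_y = ["no", "no", "yes", "yes"]

# Test objects: the class label is unknown and must be predicted.
test_X = [(28, 1.2), (58, 9.0)]
predictions = [predict_1nn(train_X, train_y, x) for x in test_X]
print(predictions)  # ['no', 'yes']
```

The training records carry a known class label, while the test objects supply only the remaining features. That split is what makes the task supervised.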

Examples

1. Targeted marketing

2. Text recognition

Clustering

1. Given a data set (data matrix), partition its objects (rows) into sets (clusters) C_1, C_2, …, C_k such that the objects in each cluster are “similar” to one another.

2. The specific definition depends on how the notion of similarity is defined.

3. Can be seen as an unsupervised version of classification: no class labels are given.
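
As a rough illustration of partitioning rows into k clusters, the sketch below uses k-means with squared Euclidean distance as the similarity notion. Both choices are assumptions made for this example, since the notes leave the definition of similarity open. The function `kmeans`, its deterministic initialisation (the first k points), and the toy points are hypothetical.

```python
def kmeans(points, k, iterations=10):
    """Partition points into k clusters with a basic k-means loop.

    Returns (assignment, centroids), where assignment[i] is the cluster
    index of points[i]. Initial centroids are simply the first k points,
    which keeps the example deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Hypothetical toy data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
labels, centers = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No class labels are used anywhere: the grouping comes only from distances between the objects. The resulting centroids can also serve as a compact summary of the data.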

Examples

1. Customer segmentation (identify similar customers for targeted product promotion)

2. Data summarisation (clusters can be used to create a summary of the data)