Correlation Coefficient Filtering

1. Automatic

>>> from sklearn.datasets import fetch_california_housing
>>> from featurebox.selection.corr import Corr
>>> x, y = fetch_california_housing(return_X_y=True)
>>> co = Corr(threshold=0.7,multi_index=[0,8],multi_grade=2)
>>> newx = co.fit_transform(x)
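>>> print(x.shape)
>>> print(newx.shape)
>>> # The shapes below were recorded on a 506 x 13 dataset;
>>> # fetch_california_housing returns x with shape (20640, 8), so your output will differ.
>>> #(506, 13)
>>> #(506, 9)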

2. Step by step

>>> from sklearn.datasets import fetch_california_housing
>>> from featurebox.selection.corr import Corr
>>> x, y = fetch_california_housing(return_X_y=True)
>>> co = Corr(threshold=0.7,multi_index=[0,8],multi_grade=2)

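Eight features (columns 0 to 7) are bound into groups of size 2: [[0,1],[2,3],[4,5],[6,7]]. Relative to the initial 13 features, the indices map as follows: [0,1] -> 0; [2,3] -> 1; [4,5] -> 2; [6,7] -> 3; 8 -> 4; 9 -> 5; 10 -> 6; 11 -> 7; 12 -> 8.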
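The index mapping can be reproduced with a few lines of plain Python. This is only a sketch: `grouped_index_map` is a hypothetical helper, not part of featurebox, and it assumes that `multi_index` is a half-open column range `[start, stop)` and that each run of `multi_grade` consecutive columns in that range becomes one feature.

```python
def grouped_index_map(n_features, multi_index, multi_grade):
    """Map each original column index to its index after grouping.

    Assumption (not taken from featurebox): columns in [start, stop) are
    bound into consecutive groups of `multi_grade` columns, and each group
    is treated as a single feature.
    """
    start, stop = multi_index
    if (stop - start) % multi_grade != 0:
        raise ValueError("grouped range must be divisible by multi_grade")
    n_groups = (stop - start) // multi_grade
    mapping = {}
    for i in range(n_features):
        if i < start:
            mapping[i] = i                                # before the grouped range
        elif i < stop:
            mapping[i] = start + (i - start) // multi_grade  # inside a group
        else:
            mapping[i] = start + n_groups + (i - stop)    # after the grouped range
    return mapping


mapping = grouped_index_map(13, [0, 8], 2)
print(mapping[0], mapping[1])   # 0 0
print(mapping[9])               # 5
print(mapping[12])              # 8
```

With 13 original columns this yields 9 grouped features, which matches the length of the list returned by `count_cof` in the next step.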

>>> co.fit(x)
>>> #Corr(multi_index=[0, 8], threshold=0.7)
>>> group = co.count_cof()
>>> group[1]
>>> #[[0], [1], [2], [3], [4, 5], [4, 5], [6], [7], [8]]
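To make the shape of this result concrete, the following sketch builds a similar list with NumPy alone. `correlated_groups` is a hypothetical helper and is not how featurebox is known to compute it; it assumes that entry `i` lists every column whose absolute Pearson correlation with column `i` reaches the threshold, including `i` itself.

```python
import numpy as np


def correlated_groups(x, threshold=0.7):
    """For each column, list the columns whose |Pearson r| >= threshold.

    Hypothetical helper for illustration; not part of featurebox.
    """
    corr = np.abs(np.corrcoef(x, rowvar=False))  # columns are the variables
    return [np.flatnonzero(row >= threshold).tolist() for row in corr]


# Synthetic data: column 2 is almost a scaled copy of column 1.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 4))
noise = 0.01 * rng.normal(size=200)
x_demo = np.column_stack(
    [base[:, 0], base[:, 1], 2.0 * base[:, 1] + noise, base[:, 2], base[:, 3]]
)

groups_demo = correlated_groups(x_demo, threshold=0.7)
print(groups_demo)
```

Columns 1 and 2 end up listing each other, in the same way that features 4 and 5 both appear as `[4, 5]` in the output above.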

At this step you can either select features manually or let the filter run automatically, as follows:

>>> co.remove_coef(group[1]) # Filter automatically by machine.
>>> #[0, 1, 2, 3, 4, 6, 7, 8]

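Here index 5 is filtered out. Relative to the initial 13 features it corresponds to original feature 9, shown in brackets: [0,1] -> 0; [2,3] -> 1; [4,5] -> 2; [6,7] -> 3; 8 -> 4; [9 -> 5]; 10 -> 6; 11 -> 7; 12 -> 8.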
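One simple rule that reproduces this result is to walk the features in order, keep the first member of each correlated set, and drop the rest. The sketch below shows that rule under stated assumptions: `drop_redundant` is a hypothetical helper, and featurebox's `remove_coef` may use a different criterion to choose which feature to keep.

```python
def drop_redundant(groups):
    """Keep the first feature of each correlated set and drop its partners.

    Hypothetical helper for illustration; not part of featurebox.
    `groups[i]` lists the features correlated with feature i (including i).
    """
    kept = []
    dropped = set()
    for i, group in enumerate(groups):
        if i in dropped:
            continue                      # already covered by an earlier feature
        kept.append(i)
        dropped.update(j for j in group if j != i)
    return kept


groups = [[0], [1], [2], [3], [4, 5], [4, 5], [6], [7], [8]]
print(drop_redundant(groups))  # [0, 1, 2, 3, 4, 6, 7, 8]
```

To select manually instead, skip the automatic step and pick which member of each multi-element group to keep by hand.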