featurebox.featurizers.state package¶

Submodules¶

featurebox.featurizers.state.extrastats module¶

General methods for computing property statistics from a list of values

class featurebox.featurizers.state.extrastats.PropertyStats¶

Bases: object

This class contains statistical operations that are commonly employed when computing features. The primary way to interact with this class is to call the calc_stat function, which takes the name of the statistic you would like to compute and the weights/values of the data to be assessed. For example, computing the mean of a list looks like:

>>> x = [1, 2, 3]
>>> PropertyStats.calc_stat(x, 'mean') # Result is 2
>>> PropertyStats.calc_stat(x, 'mean', weights=[0, 0, 1]) # Result is 3

Some of the statistics functions take options (e.g., Holder means). You can pass them to the statistics functions by adding them after the name, separated by two colons. For example, the 0th Holder mean would be:

>>> PropertyStats.calc_stat(x, 'holder_mean::0')

You can, of course, call the statistical functions directly. All take at least two arguments. The first is the data being assessed and the second, optional, argument is the weights.

static avg_dev(data_lst, weights=None)¶

Mean absolute deviation of a list of element data. This is computed by first calculating the mean of the list, and then computing the average absolute difference between each value and the mean.

Parameters:

data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value

Returns: mean absolute deviation
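As a rough illustration of the definition above, the weighted mean absolute deviation can be written with NumPy. This is a minimal sketch rather than the library's implementation; the function name mean_abs_dev is hypothetical.

```python
import numpy as np

def mean_abs_dev(data_lst, weights=None):
    """Weighted mean absolute deviation (illustrative sketch)."""
    data = np.asarray(data_lst, dtype=float)
    # np.average uses uniform weights when weights is None
    mean = np.average(data, weights=weights)
    # average the absolute distances from that mean, with the same weights
    return float(np.average(np.abs(data - mean), weights=weights))

print(mean_abs_dev([1, 2, 3]))                     # unweighted
print(mean_abs_dev([1, 2, 3], weights=[0, 0, 1]))  # all weight on the last value
```

With all of the weight on a single value, the weighted mean equals that value and the deviation is zero.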

static calc_stat(data_lst, stat, weights=None)¶

Compute a property statistic

Parameters:

data_lst (list of floats) – list of values
stat (str) – name of the statistic to compute. Options for the statistic should be added after the name and separated by two colons. For example, the 2nd Holder mean would be "holder_mean::2"
weights (list of floats) – (Optional) weights for each element in data_lst

Returns:

float - Desired statistic
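The "name::option" convention described here can be sketched as a small dispatcher. This is an assumption about how such parsing might work, not the actual body of calc_stat; the STATS registry and calc_stat_sketch are hypothetical names, and only two statistics are registered.

```python
import numpy as np

def mean(data, weights=None):
    return float(np.average(data, weights=weights))

def holder_mean(data, weights=None, power=1):
    data = np.asarray(data, dtype=float)
    power = float(power)  # options arrive as strings, e.g. "2"
    if power == 0:
        # the p -> 0 limit of the Holder mean is the geometric mean
        return float(np.exp(np.average(np.log(data), weights=weights)))
    return float(np.average(data ** power, weights=weights) ** (1.0 / power))

# Hypothetical registry mapping statistic names to functions
STATS = {"mean": mean, "holder_mean": holder_mean}

def calc_stat_sketch(data_lst, stat, weights=None):
    # "holder_mean::2" -> name "holder_mean", options ["2"]
    name, *options = stat.split("::")
    return STATS[name](data_lst, weights, *options)

x = [1, 2, 3]
print(calc_stat_sketch(x, "mean"))
print(calc_stat_sketch(x, "holder_mean::2"))
```

Splitting on the double colon keeps the statistic name and its options in a single string, which is convenient when statistics are listed in a configuration tuple such as stats=("mean", "holder_mean::2").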

static eigenvalues(data_lst, symm=False, sort=False)¶

Return the eigenvalues of a matrix as a numpy array.

Parameters:

data_lst – (matrix-like) of values
symm – whether to assume the matrix is symmetric
sort – whether to sort the eigenvalues

Returns: eigenvalues
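The effect of the symm and sort flags can be illustrated with NumPy's linear algebra routines. The sketch below is an assumption about reasonable behavior, not the library's code; eigenvalues_sketch is a hypothetical name.

```python
import numpy as np

def eigenvalues_sketch(data_lst, symm=False, sort=False):
    matrix = np.asarray(data_lst, dtype=float)
    # eigvalsh assumes a symmetric matrix and returns real eigenvalues;
    # eigvals handles the general case and may return complex values
    eigs = np.linalg.eigvalsh(matrix) if symm else np.linalg.eigvals(matrix)
    if sort:
        eigs = np.sort(eigs)  # ascending order
    return eigs

m = [[2.0, 0.0], [0.0, 1.0]]
print(eigenvalues_sketch(m, symm=True))
print(eigenvalues_sketch(m, sort=True))
```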

static flatten(data_lst, weights=None)¶: Returns a flattened copy of data_lst as a numpy array

static geom_std_dev(data_lst, weights=None)¶

Geometric standard deviation.

Parameters:

data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value

Returns: geometric standard deviation

static holder_mean(data_lst, weights=None, power=1)¶

Get Holder mean.

Parameters:

data_lst – (list/array) of values
weights – (list/array) of weights
power – (int/float/str) which Holder mean to compute

Returns: Holder mean
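For reference, the standard textbook definition of the weighted Holder (power) mean is given below, with values x_i, weights w_i, and power p. How the library handles edge cases such as zero or negative values is not stated here, so this is the general formula only.

```latex
M_p(x; w) = \left( \frac{\sum_i w_i \, x_i^{\,p}}{\sum_i w_i} \right)^{1/p} \quad (p \neq 0),
\qquad
M_0(x; w) = \exp\!\left( \frac{\sum_i w_i \ln x_i}{\sum_i w_i} \right)
```

Under this definition p = 1 gives the arithmetic mean, p = -1 the harmonic mean, and p = 0 the geometric mean.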

static inverse_mean(data_lst, weights=None)¶

Mean of the inverse of each entry.

Parameters:

data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value

Returns: inverse mean

static kurtosis(data_lst, weights=None)¶

Kurtosis of a list of data.

Parameters:

data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value

Returns: kurtosis

static maximum(data_lst, weights=None)¶

Maximum value in a list.

Parameters:

data_lst (list of floats) – List of values to be assessed
weights – (ignored)

Returns: maximum value

static mean(data_lst, weights=None)¶

Arithmetic mean of a list.

Parameters:

data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value

Returns: mean value

static minimum(data_lst, weights=None)¶

Minimum value in a list.

Parameters:

data_lst (list of floats) – List of values to be assessed
weights – (ignored)

Returns: minimum value

static mode(data_lst, weights=None)¶

Mode of a list of data. If multiple elements occur equally frequently (or have the same weight, if weights are provided), this function will return the minimum of those values.

Parameters:

data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value

Returns: mode
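The tie-breaking rule described above (return the smallest of the equally frequent or equally weighted values) can be sketched as follows. This is an illustrative implementation under that stated rule, not the library's code; mode_sketch is a hypothetical name.

```python
import numpy as np

def mode_sketch(data_lst, weights=None):
    data = np.asarray(data_lst)
    w = np.ones(len(data)) if weights is None else np.asarray(weights, dtype=float)
    # unique values come back sorted; inverse maps each entry to its unique value
    values, inverse = np.unique(data, return_inverse=True)
    # total weight (or count, when unweighted) accumulated by each unique value
    totals = np.bincount(inverse.ravel(), weights=w)
    # argmax returns the first maximum, and values is ascending,
    # so ties resolve to the smallest value
    return values[np.argmax(totals)]

print(mode_sketch([1, 2, 2, 3, 3]))                  # 2 and 3 tie on count
print(mode_sketch([1, 2, 3], weights=[0, 0, 1]))     # weight decides
```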

static quantile(data_lst, weights=None, q=0.5)¶

Return a specific quantile.

Parameters:

data_lst (list or np.ndarray) – 1D data list to be used for computing quantiles
q (float) – The quantile, as a fraction between 0 and 1.

Returns: (float) The computed quantile of the data_lst.

static range(data_lst, weights=None)¶

Range of a list.

Parameters:

data_lst (list of floats) – List of values to be assessed
weights – (ignored)

Returns: range

static skewness(data_lst, weights=None)¶

Skewness of a list of data.

Parameters:

data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value

Returns: skewness

static sorted(data_lst, weights=None)¶: Returns the sorted data_lst

static std_dev(data_lst, weights=None)¶

Standard deviation of a list of element data.

Parameters:

data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value

Returns: standard deviation

featurebox.featurizers.state.state_mapper module¶

class featurebox.featurizers.state.state_mapper.StructurePymatgenPropMap(prop_name=None, func: Optional[Callable] = None, return_type='df', **kwargs)¶

Bases: _StructurePymatgenPropMap

Get properties of a pymatgen structure. The default properties are ["density", "volume", "ntypesp"].

Examples

>>> tmps = StructurePymatgenPropMap()
>>> tmps.fit_transform()

Parameters:

prop_name – (str, list of str) property name or list of property names. Defaults to ["density", "volume", "ntypesp"].
func – (callable or list of callable) must have the same length as prop_name.

featurebox.featurizers.state.statistics module¶

class featurebox.featurizers.state.statistics.BaseCompositionFeature(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶

Bases: BinaryMap

BaseCompositionFeature is the base class for composition features. Subclasses should re-implement mix_function, for example:

def mix_function(self, elems:List, nums:List):
    w_ = np.array(nums)
    return w_.dot(elems)

Base class for composition feature.

convert_dict(atoms: dict) → ndarray¶: Convert atom {symbol: fraction} list to numeric features

convert_number(atoms: List) → ndarray¶: Convert atom {symbol: fraction} list to numeric features

abstract mix_function(elems: List, nums: Union[List, ndarray])¶

Parameters:

elems (list) – Elements in compound.
nums (list) – Number of each element.

Returns:

descriptor

Return type:

numpy.ndarray

class featurebox.featurizers.state.statistics.DepartElementFeature(data_map: BinaryMap, n_composition: int, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶

Bases: BaseCompositionFeature

Get the table of element data.

Examples

>>> from featurebox.featurizers.atom.mapper import AtomJsonMap
>>> from featurebox.featurizers.state.union import UnionFeature
>>> from featurebox.featurizers.state.statistics import DepartElementFeature
>>> data_map = AtomJsonMap(search_tp="name",embedding_dict="ele_megnet.json", n_jobs=1) # keep this n_jobs=1 and return_type="np"
>>> wa = DepartElementFeature(data_map,n_composition=2, n_jobs=1, return_type="pd")
>>> comp = [{"H": 2, "Pd": 1},{"He":1, "Al":4}]
>>> wa.set_feature_labels(["fea_{}".format(_) for _ in range(16)]) # 16 is the number of features in the built-in "ele_megnet.json"
>>> couple_data = wa.fit_transform(comp)
    fea_0_0   fea_0_1   fea_1_0  ...  fea_14_1  fea_15_0  fea_15_1
0  0.352363  0.561478  0.635952  ... -0.236541 -0.270104 -0.212607
1 -0.067220  0.025758  0.141113  ... -0.092577 -0.042185  0.080350

[2 rows x 32 columns]
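The column labels in the output above (fea_0_0, fea_0_1, fea_1_0, ...) suggest that, for each feature, the per-element values are placed side by side, giving 16 x 2 = 32 columns. A minimal sketch of that layout follows. It assumes one (n_composition, n_features) array per compound; the array values are made up, and this is not the library's code.

```python
import numpy as np

# Hypothetical per-element feature vectors: 2 elements x 3 features
elems = np.array([[0.1, 0.2, 0.3],    # element 0
                  [1.1, 1.2, 1.3]])   # element 1

# Transpose, then flatten, so both elements' values for a feature are adjacent
row = elems.T.ravel()

# Label pattern fea_<feature index>_<element index>
labels = ["fea_{}_{}".format(i, j)
          for i in range(elems.shape[1])
          for j in range(elems.shape[0])]

for label, value in zip(labels, row):
    print(label, value)
```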

Base class for composition feature.

convert_dict(atoms: Union[dict, Composition]) → ndarray¶: Convert atom {symbol: fraction} list to numeric features

convert_number(atoms: List) → ndarray¶: Convert atom {symbol: fraction} list to numeric features

mix_function(elems: ndarray, nums=None)¶

Parameters:

elems (list) – Elements in compound.
nums (list) – Number of each element.

Returns:

descriptor

Return type:

numpy.ndarray

set_feature_labels(values)¶

Generate attribute names.

Returns: ([str]) attribute labels.

class featurebox.featurizers.state.statistics.ExtraMix(data_map: BinaryMap, stats: Tuple[str] = ('mean',), n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶

Bases: BaseCompositionFeature

See also

WeightedSum

Base class for composition feature.

mix_function(elems, nums)¶

Parameters:

elems (list) – Elements in compound.
nums (list) – Number of each element.

Returns:

descriptor

Return type:

numpy.ndarray

class featurebox.featurizers.state.statistics.GeometricMean(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶

Bases: BaseCompositionFeature

See also

WeightedSum

Base class for composition feature.

mix_function(elems: ndarray, nums)¶

Parameters:

elems (list) – Elements in compound.
nums (list) – Number of each element.

Returns:

descriptor

Return type:

numpy.ndarray
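A weighted geometric mean over element features can be sketched as below. This assumes that elems holds one row of strictly positive features per element and that nums are used as normalized weights; the library's own mix_function may differ, and geometric_mix is a hypothetical name.

```python
import numpy as np

def geometric_mix(elems, nums):
    elems = np.asarray(elems, dtype=float)  # shape (n_elements, n_features), all > 0
    w = np.asarray(nums, dtype=float)
    w = w / w.sum()                         # normalize counts to fractions
    # exp of the weighted mean of logs is the weighted geometric mean
    return np.exp(w.dot(np.log(elems)))

# Two elements with counts 2 and 1, two features each
print(geometric_mix([[1.0, 4.0], [8.0, 4.0]], [2, 1]))
```

Because of the logarithm, this form is only defined for positive feature values.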

class featurebox.featurizers.state.statistics.HarmonicMean(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶

Bases: BaseCompositionFeature

See also

WeightedSum

Base class for composition feature.

mix_function(elems, nums)¶

Parameters:

elems (list) – Elements in compound.
nums (list) – Number of each element.

Returns:

descriptor

Return type:

numpy.ndarray

class featurebox.featurizers.state.statistics.MaxPooling(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶

Bases: BaseCompositionFeature

See also

WeightedSum

Base class for composition feature.

mix_function(elems, _)¶

Parameters:

elems (list) – Elements in compound.
nums (list) – Number of each element.

Returns:

descriptor

Return type:

numpy.ndarray

class featurebox.featurizers.state.statistics.MinPooling(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶

Bases: BaseCompositionFeature

See also

WeightedSum

Base class for composition feature.

mix_function(elems, _)¶

Parameters:

elems (list) – Elements in compound.
nums (list) – Number of each element.

Returns:

descriptor

Return type:

numpy.ndarray

class featurebox.featurizers.state.statistics.WeightedAverage(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶

Bases: BaseCompositionFeature

Examples

>>> from featurebox.featurizers.atom import AtomTableMap, AtomJsonMap
>>> data_map = AtomJsonMap(search_tp="name", n_jobs=1)
>>> wa = WeightedAverage(data_map, n_jobs=1,return_type="df")
>>> x3 = [{"H": 2, "Pd": 1},{"He":1,"Al":4}]
>>> wa.fit_transform(x3)
         0         1         2   ...        13        14        15
0  0.422068  0.360958  0.201433  ... -0.459164 -0.064783 -0.250939
1  0.007163 -0.471498 -0.072860  ...  0.206306 -0.041006  0.055843

[2 rows x 16 columns]

>>> wa.set_feature_labels(["fea_{}".format(_) for _ in range(16)])
>>> wa.fit_transform(x3)
      fea_0     fea_1     fea_2  ...    fea_13    fea_14    fea_15
0  0.422068  0.360958  0.201433  ... -0.459164 -0.064783 -0.250939
1  0.007163 -0.471498 -0.072860  ...  0.206306 -0.041006  0.055843

[2 rows x 16 columns]
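The first value in the output above can be checked by hand from the per-element values shown in the DepartElementFeature example (fea_0_0 and fea_0_1 for {"H": 2, "Pd": 1}). The sketch below assumes the weighted average is the count-weighted sum divided by the total count, which is consistent with the numbers printed in this document.

```python
import numpy as np

# First feature of H and Pd, copied from the fea_0_0 / fea_0_1 columns
elems = np.array([0.352363, 0.561478])
nums = np.array([2, 1])  # {"H": 2, "Pd": 1}

weighted_sum = nums.dot(elems)            # 2 * 0.352363 + 1 * 0.561478
weighted_avg = weighted_sum / nums.sum()  # divide by the total count, 3

print(round(float(weighted_sum), 6), round(float(weighted_avg), 6))
```

The average reproduces the 0.422068 shown for row 0, column 0, and the un-normalized sum matches the 1.266204 reported in the WeightedSum example.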

Base class for composition feature.

mix_function(elems, nums)¶

Parameters:

elems (list) – Elements in compound.
nums (list) – Number of each element.

Returns:

descriptor

Return type:

numpy.ndarray

class featurebox.featurizers.state.statistics.WeightedSum(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶

Bases: BaseCompositionFeature

Examples

>>> from featurebox.featurizers.atom import AtomTableMap, AtomJsonMap
>>> data_map = AtomJsonMap(search_tp="name", n_jobs=1)
>>> wa = WeightedSum(data_map, n_jobs=1,return_type="df")
>>> x3 = [{"H": 2, "Pd": 1},{"He":1,"Al":4}]
>>> wa.fit_transform(x3)
         0         1         2   ...        13        14        15
0  1.266204  1.082873  0.604300  ... -1.377492 -0.194350 -0.752816
1  0.035813 -2.357490 -0.364302  ...  1.031530 -0.205029  0.279215

[2 rows x 16 columns]

>>> wa.set_feature_labels(["fea_{}".format(_) for _ in range(16)])
>>> wa.fit_transform(x3)
      fea_0     fea_1     fea_2  ...    fea_13    fea_14    fea_15
0  1.266204  1.082873  0.604300  ... -1.377492 -0.194350 -0.752816
1  0.035813 -2.357490 -0.364302  ...  1.031530 -0.205029  0.279215

[2 rows x 16 columns]

Base class for composition feature.

mix_function(elems, nums)¶

Parameters:

elems (list) – Elements in compound.
nums (list) – Number of each element.

Returns:

descriptor

Return type:

numpy.ndarray

class featurebox.featurizers.state.statistics.WeightedVariance(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶

Bases: BaseCompositionFeature

See also

WeightedSum

Base class for composition feature.

mix_function(elems: ndarray, nums)¶

Parameters:

elems (list) – Elements in compound.
nums (list) – Number of each element.

Returns:

descriptor

Return type:

numpy.ndarray
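One plausible reading of a weighted variance mix is the count-weighted mean of squared deviations from the count-weighted mean. The sketch below works under that assumption and is not taken from the library; weighted_variance_mix is a hypothetical name.

```python
import numpy as np

def weighted_variance_mix(elems, nums):
    elems = np.asarray(elems, dtype=float)  # shape (n_elements, n_features)
    w = np.asarray(nums, dtype=float)
    w = w / w.sum()                         # normalize counts to fractions
    mean = w.dot(elems)                     # weighted mean per feature
    return w.dot((elems - mean) ** 2)       # weighted mean squared deviation

# Two elements with counts 2 and 1, two features each
print(weighted_variance_mix([[1.0, 0.0], [4.0, 0.0]], [2, 1]))
```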

featurebox.featurizers.state.union module¶

class featurebox.featurizers.state.union.PolyFeature(*, degree: Union[int, List[int]] = 3, n_jobs=1, on_errors='raise', return_type='df')¶

Bases: BaseFeature, ABC

Polynomial feature expansion.

For example, degree = 2 generates (x1*x2, x1**2, x2**2).

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from featurebox.featurizers.state.union import PolyFeature
>>> n = np.array([[0,1,2,3,4,5],[0.422068,0.360958,0.201433,-0.459164,-0.064783,-0.250939]]).T
>>> ps = pd.DataFrame(n,columns=["f1","f2"],index= ["x0","x1","x2","x3","x4","x5"])
>>> pf = PolyFeature(degree=[1,2])
>>> pf.fit_transform(n)
n  f0^1      f1^1  f0^2  f0^1*f1^1      f1^2
0   0.0  0.422068   0.0   0.000000  0.178141
1   1.0  0.360958   1.0   0.360958  0.130291
2   2.0  0.201433   4.0   0.402866  0.040575
3   3.0 -0.459164   9.0  -1.377492  0.210832
4   4.0 -0.064783  16.0  -0.259132  0.004197
5   5.0 -0.250939  25.0  -1.254695  0.062970
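The column names in this output (f0^1, f1^1, f0^2, f0^1*f1^1, f1^2) correspond to all monomials of degree 1 and 2 in two features. A minimal sketch of such an expansion is shown below; it uses itertools and NumPy and is an illustration only, not the PolyFeature implementation. The function name poly_expand is hypothetical.

```python
from itertools import combinations_with_replacement

import numpy as np

def poly_expand(X, degrees=(1, 2)):
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    columns, labels = [], []
    for d in degrees:
        # each combo picks d feature indices with repetition, e.g. (0, 0), (0, 1), (1, 1)
        for combo in combinations_with_replacement(range(n_features), d):
            columns.append(np.prod(X[:, list(combo)], axis=1))
            # exponent of each feature in this monomial
            powers = np.bincount(list(combo), minlength=n_features)
            labels.append("*".join("f{}^{}".format(i, p) for i, p in enumerate(powers) if p))
    return np.column_stack(columns), labels

X = [[0.0, 0.422068], [1.0, 0.360958]]
values, labels = poly_expand(X, degrees=(1, 2))
print(labels)
print(values)
```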

Parameters:

batch_size (int) – size of batch.
batch_calculate (bool) – whether to calculate in batches.
n_jobs (int) – Number of parallel jobs.
on_errors (str) – How to handle exceptions raised during feature calculation. Can be nan, keep, or raise. When 'nan', a column of np.nan is returned, with length equal to the number of feature labels. The default is 'raise', which re-raises the exception.
return_type (str) – Specifies the return type. Can be any, np, array, or df. 'array' and 'df' force the return type to np.ndarray and pd.DataFrame respectively. If 'any', no type conversion is performed. Default is 'any'.

fit_transform(X: Union[ndarray, DataFrame], y=None, **kwargs)¶

If convert takes multiple inputs, supply inputs as a list of tuples.

Copied from TransformerMixin, the mixin class for all transformers in scikit-learn.

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (list) – list of cases.
y (None) – deprecated.
**kwargs – Additional fit or transform parameters. feature_labels_mark: str, mark for each of the feature labels, used when return_type == 'pd'. x_labels: list, mark for each row, used when return_type == 'pd'.

Returns:

result data.

Return type:

X_new

set_feature_labels(input_features=None)¶

Generate attribute names.

Returns: ([str]) attribute labels.

class featurebox.featurizers.state.union.UnionFeature(comp: List[Dict], couple_data: Union[DataFrame, ndarray], couple=2, stats=('mean',), n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶

Bases: BaseFeature

The transform method should take comp_index as input rather than entries.

Examples

>>> from featurebox.featurizers.atom.mapper import AtomTableMap, AtomJsonMap
>>> data_map = AtomJsonMap(search_tp="name", n_jobs=1)
>>> wa = DepartElementFeature(data_map,n_composition=2, n_jobs=1,return_type="df")
>>> x3 = [{"H": 2, "Pd": 1},{"He":1,"Al":4}]
>>> wa.set_feature_labels(["fea_{}".format(_) for _ in range(16)])
>>> wa.fit_transform(x3)
    fea_0_0   fea_0_1   fea_1_0  ...  fea_14_1  fea_15_0  fea_15_1
0  0.352363  0.561478  0.635952  ... -0.236541 -0.270104 -0.212607
1 -0.067220  0.025758  0.141113  ... -0.092577 -0.042185  0.080350

[2 rows x 32 columns]

>>> couple_data = wa.fit_transform(x3)
>>> uf = UnionFeature(x3,couple_data,couple=2,stats=("mean","maximum"))
>>> uf.fit_transform()
    feamean  feamaximum   feamean  ...  feamaximum   feamean  feamaximum
0  0.422068    0.360958  0.201433  ...   -0.113506  0.021095   -0.212607
1  0.007163   -0.471498 -0.072860  ...    0.312183  0.165278    0.080350

[2 rows x 32 columns]

>>> couple_data = wa.fit_transform(x3)
>>> uf = UnionFeature(x3,couple_data,couple=2,stats=("std_dev",))
>>> uf.fit_transform()
   feastd_dev  feastd_dev  feastd_dev  ...  feastd_dev  feastd_dev  feastd_dev
0    0.147867    0.583352    0.033739  ...    0.366625    0.182177    0.040657
1    0.065745    0.541477    0.209795  ...    0.374331    0.182331    0.086646

[2 rows x 16 columns]

Parameters:

batch_size (int) – size of batch.
batch_calculate (bool) – whether to calculate in batches.
n_jobs (int) – Number of parallel jobs.
on_errors (str) – How to handle exceptions raised during feature calculation. Can be nan, keep, or raise. When 'nan', a column of np.nan is returned, with length equal to the number of feature labels. The default is 'raise', which re-raises the exception.
return_type (str) – Specifies the return type. Can be any, np, array, or df. 'array' and 'df' force the return type to np.ndarray and pd.DataFrame respectively. If 'any', no type conversion is performed. Default is 'any'.

convert(comp_number=0)¶

Get elemental property attributes.

Parameters:

comp – Pymatgen composition object
comp_number – composition number; defaults to 0.

Returns: Specified property statistics of features

Return type: all_attributes

fit_transform(entries: Optional[List] = None) → Any¶

If convert takes multiple inputs, supply inputs as a list of tuples.

Copied from TransformerMixin, the mixin class for all transformers in scikit-learn.

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:

X (list) – list of cases.
y (None) – deprecated.
**kwargs – Additional fit or transform parameters. feature_labels_mark: str, mark for each of the feature labels, used when return_type == 'pd'. x_labels: list, mark for each row, used when return_type == 'pd'.

Returns:

result data.

Return type:

X_new

set_feature_labels(self_elem_data_columns_values: List)¶

Generate attribute names.

Parameters: self_elem_data_columns_values (List) – name
Returns: ([str]) attribute labels.

transform(entries: Optional[List] = None) → Any¶

Transform a list of entries. Each element of entries corresponds to the parameters of convert. If convert takes n inputs, each element of the transform input should be a list or tuple of size n:

[(p1,p2),(p1,p2),(p1,p2),…,(p1,p2),(p1,p2)]

Such a list can be built with zip or by using the built-in transform_with_zip.

Parameters: entries (list) – A list of entries to be featured.
Returns: result – features for each entry.
Return type: any
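The list-of-tuples input format mentioned for transform can be produced with Python's built-in zip. The sketch below uses a made-up two-input convert function purely to show the shape of the data; it does not call the library.

```python
def convert(p1, p2):
    # hypothetical two-input convert: here it simply adds its inputs
    return p1 + p2

first = [1, 2, 3]
second = [10, 20, 30]

# One tuple per entry, each holding the n inputs that convert expects
entries = list(zip(first, second))

# A transform-like loop unpacks each tuple into convert's parameters
features = [convert(*entry) for entry in entries]
print(entries)
print(features)
```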