featurebox.featurizers.state package¶
Submodules¶
featurebox.featurizers.state.extrastats module¶
General methods for computing property statistics from a list of values
- class featurebox.featurizers.state.extrastats.PropertyStats¶
Bases: object
This class contains statistical operations that are commonly employed when computing features. The primary way of interacting with this class is to call the calc_stat function, which takes the data to be assessed, the name of the statistic you would like to compute and, optionally, weights for the data. For example, computing the mean of a list looks like:
>>> x = [1, 2, 3]
>>> PropertyStats.calc_stat(x, 'mean')  # Result is 2
>>> PropertyStats.calc_stat(x, 'mean', weights=[0, 0, 1])  # Result is 3
Some of the statistics functions take options (e.g., Hölder means). You can pass an option to a statistics function by appending it to the statistic name after two colons. For example, the 0th Hölder mean would be:
>>> PropertyStats.calc_stat(x, 'holder_mean::0')
You can, of course, call the statistical functions directly. All of them accept at least two arguments: the first is the data being assessed and the second, optional, argument is the weights.
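The "name::option" convention can be pictured as a small dispatcher. The sketch below is not the featurebox implementation: MiniStats is a hypothetical stand-in that assumes calc_stat splits the string on "::", looks up the static method of that name, and forwards the option as an extra positional argument after the weights.

```python
import math


class MiniStats:
    """Hypothetical stand-in for PropertyStats (not the featurebox class)."""

    @staticmethod
    def mean(data_lst, weights=None):
        if weights is None:
            return sum(data_lst) / len(data_lst)
        return sum(w * x for x, w in zip(data_lst, weights)) / sum(weights)

    @staticmethod
    def holder_mean(data_lst, weights=None, power=1):
        power = float(power)  # an option parsed from 'name::option' arrives as a string
        if weights is None:
            weights = [1.0] * len(data_lst)
        total = sum(weights)
        if power == 0:
            # The p -> 0 limit of the Hölder mean is the (weighted) geometric mean.
            return math.exp(sum(w * math.log(x) for x, w in zip(data_lst, weights)) / total)
        return (sum(w * x ** power for x, w in zip(data_lst, weights)) / total) ** (1.0 / power)

    @classmethod
    def calc_stat(cls, data_lst, stat, weights=None):
        # 'holder_mean::0' -> name='holder_mean', args=['0']
        name, *args = stat.split("::")
        return getattr(cls, name)(data_lst, weights, *args)


x = [1, 2, 3]
print(MiniStats.calc_stat(x, "mean"))                     # 2.0
print(MiniStats.calc_stat(x, "mean", weights=[0, 0, 1]))  # 3.0
print(MiniStats.calc_stat(x, "holder_mean::0"))           # geometric mean, 6 ** (1/3)
print(MiniStats.holder_mean(x, None, 1))                  # direct call, same as the mean
```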
- static avg_dev(data_lst, weights=None)¶
Mean absolute deviation of a list of element data. This is computed by first calculating the mean of the list and then computing the average absolute difference between each value and the mean.
- Parameters:
data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value
- Returns:
mean absolute deviation
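The two-step recipe described for avg_dev can be written out directly. This is a minimal sketch, not the library code; it assumes that, when weights are given, the same weights are used both for the mean and for averaging the absolute differences.

```python
def avg_dev(data_lst, weights=None):
    """Mean absolute deviation; weights (if given) are used for both averages."""
    if weights is None:
        weights = [1.0] * len(data_lst)
    total = sum(weights)
    mean = sum(w * x for x, w in zip(data_lst, weights)) / total
    # Average absolute distance of each value from the mean.
    return sum(w * abs(x - mean) for x, w in zip(data_lst, weights)) / total


print(avg_dev([1, 2, 3]))  # (|1-2| + |2-2| + |3-2|) / 3 = 0.666...
```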
- static calc_stat(data_lst, stat, weights=None)¶
Compute a property statistic
- Parameters:
data_lst (list of floats) – list of values
stat (str) – Name of the statistic to compute. If the statistics function takes arguments, they should be added after the name and separated by two colons. For example, the 2nd Hölder mean would be "holder_mean::2".
weights (list of floats) – (Optional) weights for each element in data_lst
- Returns:
float - Desired statistic
- static eigenvalues(data_lst, symm=False, sort=False)¶
Return the eigenvalues of a matrix as a numpy array.
- Parameters:
data_lst (matrix-like) – values of the matrix
symm (bool) – whether to assume the matrix is symmetric
sort (bool) – whether to sort the eigenvalues
- Returns:
eigenvalues
- static flatten(data_lst, weights=None)¶
Returns a flattened copy of data_lst as a numpy array.
- static geom_std_dev(data_lst, weights=None)¶
Geometric standard deviation.
- Parameters:
data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value
- Returns:
geometric standard deviation
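A common definition of the geometric standard deviation is the exponential of the standard deviation of the logarithms. The sketch below uses that definition with optional weights; whether featurebox uses exactly this normalization (population, weighted) is an assumption. All values must be strictly positive.

```python
import math


def geom_std_dev(data_lst, weights=None):
    """Geometric standard deviation; all values must be strictly positive."""
    if weights is None:
        weights = [1.0] * len(data_lst)
    total = sum(weights)
    logs = [math.log(x) for x in data_lst]
    log_gmean = sum(w * v for v, w in zip(logs, weights)) / total  # log of the geometric mean
    log_var = sum(w * (v - log_gmean) ** 2 for v, w in zip(logs, weights)) / total
    return math.exp(math.sqrt(log_var))


print(geom_std_dev([1.0, math.e ** 2]))  # logs are 0 and 2, log-std is 1, result is e
```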
- static holder_mean(data_lst, weights=None, power=1)¶
Get the Hölder mean.
- Parameters:
data_lst (list/array) – values
weights (list/array) – weights
power (int/float/str) – which Hölder mean to compute
- Returns:
Hölder mean
- static inverse_mean(data_lst, weights=None)¶
Mean of the inverse of each entry.
- Parameters:
data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value
- Returns:
inverse mean
- static kurtosis(data_lst, weights=None)¶
Kurtosis of a list of data.
- Parameters:
data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value
- Returns:
kurtosis
- static maximum(data_lst, weights=None)¶
Maximum value in a list.
- Parameters:
data_lst (list of floats) – List of values to be assessed
weights – ignored
- Returns:
maximum value
- static mean(data_lst, weights=None)¶
Arithmetic mean of a list.
- Parameters:
data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value
- Returns:
mean value
- static minimum(data_lst, weights=None)¶
Minimum value in a list.
- Parameters:
data_lst (list of floats) – List of values to be assessed
weights – ignored
- Returns:
minimum value
- static mode(data_lst, weights=None)¶
Mode of a list of data. If multiple elements occur equally frequently (or have the same total weight, if weights are provided), this function returns the minimum of those values.
- Parameters:
data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value
- Returns:
mode
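The tie-breaking rule for mode (equal count or equal total weight, return the smallest value) can be sketched as follows. This is an illustrative reimplementation, not the library's code; accumulating weights per distinct value is an assumption about how weights are applied.

```python
from collections import defaultdict


def mode(data_lst, weights=None):
    """Most frequent (or heaviest) value; ties resolve to the smallest value."""
    if weights is None:
        weights = [1.0] * len(data_lst)
    totals = defaultdict(float)
    for x, w in zip(data_lst, weights):
        totals[x] += w  # accumulate count (or weight) per distinct value
    best = max(totals.values())
    return min(x for x, t in totals.items() if t == best)


print(mode([3, 1, 3, 1, 2]))  # 1 and 3 both occur twice, so the minimum, 1
```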
- static quantile(data_lst, weights=None, q=0.5)¶
Return a specific quantile.
- Parameters:
data_lst (list or np.ndarray) – 1D data list used to compute the quantile
weights – not used
q (float) – the quantile, as a fraction between 0 and 1
- Returns:
(float) The computed quantile of the data_lst.
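The text does not say how values between data points are interpolated. The sketch below assumes linear interpolation between the closest ranks, which is NumPy's default for np.quantile; treat it as an illustration of that rule rather than as the featurebox implementation. As documented, weights are ignored.

```python
def quantile(data_lst, weights=None, q=0.5):
    """Quantile with linear interpolation between closest ranks; weights unused."""
    xs = sorted(data_lst)
    pos = q * (len(xs) - 1)        # fractional rank of the requested quantile
    lo = int(pos)                  # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


print(quantile([4, 1, 3, 2]))          # median of 1..4 -> 2.5
print(quantile([1, 2, 3, 4], q=0.25))  # 1.75
```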
- static range(data_lst, weights=None)¶
Range of a list.
- Parameters:
data_lst (list of floats) – List of values to be assessed
weights – ignored
- Returns:
range
- static skewness(data_lst, weights=None)¶
Skewness of a list of data.
- Parameters:
data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value
- Returns:
skewness
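Both skewness and kurtosis (documented above) are ratios of central moments. The sketch below assumes weighted population moments and, for kurtosis, the Pearson form m4 / m2**2 (a normal distribution gives 3); whether featurebox reports this or excess kurtosis is not stated here. Constant data gives a zero denominator and is not handled.

```python
def _central_moment(data_lst, weights, k):
    total = sum(weights)
    mean = sum(w * x for x, w in zip(data_lst, weights)) / total
    return sum(w * (x - mean) ** k for x, w in zip(data_lst, weights)) / total


def skewness(data_lst, weights=None):
    if weights is None:
        weights = [1.0] * len(data_lst)
    return _central_moment(data_lst, weights, 3) / _central_moment(data_lst, weights, 2) ** 1.5


def kurtosis(data_lst, weights=None):
    # Pearson kurtosis m4 / m2**2 (assumed convention, not excess kurtosis).
    if weights is None:
        weights = [1.0] * len(data_lst)
    return _central_moment(data_lst, weights, 4) / _central_moment(data_lst, weights, 2) ** 2


print(skewness([1, 1, 4]))  # right-skewed, about 0.7071
print(kurtosis([1, 2, 3]))  # 1.5
```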
- static sorted(data_lst, weights=None)¶
Returns the sorted data_lst
- static std_dev(data_lst, weights=None)¶
Standard deviation of a list of element data.
- Parameters:
data_lst (list of floats) – List of values to be assessed
weights (list of floats) – Weights for each value
- Returns:
standard deviation
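A weighted standard deviation can be sketched as the square root of the weighted mean squared distance from the weighted mean. This sketch assumes the population (ddof=0) normalization; other normalizations (e.g. dividing by n - 1) give different values for small lists, so the exact convention used by featurebox should be checked.

```python
import math


def std_dev(data_lst, weights=None):
    """Population (ddof=0) standard deviation, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(data_lst)
    total = sum(weights)
    mean = sum(w * x for x, w in zip(data_lst, weights)) / total
    variance = sum(w * (x - mean) ** 2 for x, w in zip(data_lst, weights)) / total
    return math.sqrt(variance)


print(std_dev([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```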
featurebox.featurizers.state.state_mapper module¶
- class featurebox.featurizers.state.state_mapper.StructurePymatgenPropMap(prop_name=None, func: Optional[Callable] = None, return_type='df', **kwargs)¶
Bases: _StructurePymatgenPropMap
Get properties of a pymatgen structure for preprocessing. The default properties are ["density", "volume", "ntypesp"].
Examples
>>> tmps = StructurePymatgenPropMap()
>>> tmps.fit_transform(structures)  # structures: placeholder for a list of pymatgen Structure objects
- Parameters:
prop_name (str or list of str) – property name or list of property names. Default is ["density", "volume", "ntypesp"].
func (callable or list of callable) – if a list is given, make sure it has the same length as prop_name.
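The idea of mapping each name in prop_name to the structure property of the same name can be sketched with getattr. pymatgen Structure objects do expose density, volume and ntypesp; to keep the sketch runnable without pymatgen it uses a SimpleNamespace stand-in with made-up values. structure_props is a hypothetical helper, the func argument is omitted, and whether featurebox uses getattr internally is an assumption.

```python
from types import SimpleNamespace

DEFAULT_PROPS = ["density", "volume", "ntypesp"]


def structure_props(structure, prop_name=None):
    """Hypothetical helper: read each named property from a structure-like object."""
    if prop_name is None:
        prop_name = DEFAULT_PROPS
    elif isinstance(prop_name, str):
        prop_name = [prop_name]  # a single name is treated as a one-item list
    return [getattr(structure, name) for name in prop_name]


# Stand-in with made-up values; a real pymatgen Structure has the same attribute names.
fake_structure = SimpleNamespace(density=2.33, volume=40.0, ntypesp=1)
print(structure_props(fake_structure))            # [2.33, 40.0, 1]
print(structure_props(fake_structure, "volume"))  # [40.0]
```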
featurebox.featurizers.state.statistics module¶
- class featurebox.featurizers.state.statistics.BaseCompositionFeature(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df', feature_labels_mark: Optional[str] = None)¶
Bases: BinaryMap
BaseCompositionFeature is the base class for composition features. Subclasses should re-implement mix_function, for example:
    def mix_function(self, elems: List, nums: List):
        w_ = np.array(nums)
        return w_.dot(elems)
Base class for composition feature.
- convert_dict(atoms: dict) → ndarray¶
Convert an atom {symbol: fraction} dict to numeric features.
- convert_number(atoms: List)¶
Convert atom {symbol: fraction} list to numeric features
- fit(*args, x_labels=None, **kwargs)¶
The fit function of BaseFeature is weakened here and only passes the parameters through.
- abstract mix_function(elems: List, nums: Union[List, ndarray])¶
- Parameters:
elems (list) – Elements in compound.
nums (list) – Number of each element.
- Returns:
descriptor
- Return type:
numpy.ndarray
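The subclassing pattern (look up one feature row per element, then let mix_function combine the rows using the amounts) can be sketched without featurebox. CompositionFeatureSketch and WeightedSumSketch are hypothetical stand-ins, not the real classes, and the element table is a plain dict of toy values rather than a BinaryMap.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class CompositionFeatureSketch(ABC):
    """Hypothetical stand-in for BaseCompositionFeature (not the real class)."""

    def __init__(self, element_table: Dict[str, List[float]]):
        # Plays the role of data_map: element symbol -> feature vector.
        self.element_table = element_table

    def convert_dict(self, atoms: Dict[str, float]) -> List[float]:
        elems = [self.element_table[symbol] for symbol in atoms]  # one feature row per element
        nums = list(atoms.values())  # amount of each element
        return self.mix_function(elems, nums)

    @abstractmethod
    def mix_function(self, elems: List[List[float]], nums: List[float]) -> List[float]:
        """Combine per-element feature rows into a single descriptor."""


class WeightedSumSketch(CompositionFeatureSketch):
    def mix_function(self, elems, nums):
        # Same idea as w_.dot(elems): sum of feature rows weighted by the amounts.
        return [sum(w * row[j] for row, w in zip(elems, nums)) for j in range(len(elems[0]))]


# Toy feature table (made-up values, not real element data).
table = {"H": [1.0, 0.5], "Pd": [46.0, 2.0]}
print(WeightedSumSketch(table).convert_dict({"H": 2, "Pd": 1}))  # [48.0, 3.0]
```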
- class featurebox.featurizers.state.statistics.DepartElementFeature(data_map: BinaryMap, n_composition: int, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶
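Bases: BaseCompositionFeature
Get the table of element data: the features of each element in a composition are kept as separate columns instead of being mixed.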
Examples
>>> from featurebox.featurizers.atom.mapper import AtomJsonMap
>>> from featurebox.featurizers.state.union import UnionFeature
>>> from featurebox.featurizers.state.statistics import DepartElementFeature
>>> data_map = AtomJsonMap(search_tp="name", embedding_dict="ele_megnet.json", n_jobs=1)  # keep n_jobs=1 and return_type="np" here
>>> wa = DepartElementFeature(data_map, n_composition=2, n_jobs=1, return_type="pd")
>>> comp = [{"H": 2, "Pd": 1}, {"He": 1, "Al": 4}]
>>> wa.set_feature_labels(["fea_{}".format(_) for _ in range(16)])  # 16 is the number of features in the built-in "ele_megnet.json"
>>> wa.fit_transform(comp)
   depart_fea_0_0  depart_fea_0_1  ...  depart_fea_15_0  depart_fea_15_1
0        0.352363        0.561478  ...        -0.270104        -0.212607
1       -0.067220        0.025758  ...        -0.042185         0.080350
[2 rows x 32 columns]
- convert_dict(atoms: Union[dict, Composition]) → ndarray¶
Convert an atom {symbol: fraction} dict (or Composition) to numeric features.
- convert_number(atoms: List) → ndarray¶
Convert atom {symbol: fraction} list to numeric features
- mix_function(elems: ndarray, nums=None)¶
- Parameters:
elems (list) – Elements in compound.
nums (list) – Number of each element.
- Returns:
descriptor
- Return type:
numpy.ndarray
- set_feature_labels(values)¶
Generate attribute names.
- Returns:
([str]) attribute labels.
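The column layout in the DepartElementFeature example (depart_fea_0_0, depart_fea_0_1, ..., depart_fea_15_1, i.e. 16 features × 2 elements = 32 columns) can be reproduced with a small helper. depart is a hypothetical illustration of that feature-major ordering; it assumes exactly n_composition elements and does not show how featurebox handles compositions with fewer elements.

```python
def depart(elems, labels):
    """Lay out per-element features as separate columns, feature-major."""
    n_elem = len(elems)
    # Column order: label_0 of element 0, label_0 of element 1, ..., then label_1, ...
    names = [f"depart_{label}_{i}" for label in labels for i in range(n_elem)]
    values = [elems[i][j] for j in range(len(labels)) for i in range(n_elem)]
    return names, values


names, values = depart([[1.0, 2.0], [3.0, 4.0]], ["fea_0", "fea_1"])
print(names)   # ['depart_fea_0_0', 'depart_fea_0_1', 'depart_fea_1_0', 'depart_fea_1_1']
print(values)  # [1.0, 3.0, 2.0, 4.0]
```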
- class featurebox.featurizers.state.statistics.ExtraMix(data_map: BinaryMap, stats: Tuple[str] = ('mean',), n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶
Bases: BaseCompositionFeature
- mix_function(elems, nums)¶
- Parameters:
elems (list) – Elements in compound.
nums (list) – Number of each element.
- Returns:
descriptor
- Return type:
numpy.ndarray
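ExtraMix takes a stats tuple whose default, ('mean',), matches a PropertyStats statistic name. The sketch below shows one plausible reading: apply each named statistic to every feature column, using the element amounts as weights. extra_mix and STAT_FUNCS are hypothetical; the output ordering (feature-major, statistics inner) and the use of amounts as weights are assumptions.

```python
def _wmean(column, weights):
    return sum(w * v for v, w in zip(column, weights)) / sum(weights)


STAT_FUNCS = {
    "mean": _wmean,
    "maximum": lambda column, weights: max(column),  # weights ignored, as documented for PropertyStats.maximum
    "minimum": lambda column, weights: min(column),
}


def extra_mix(elems, nums, stats=("mean",)):
    """For every feature column, apply each named statistic (feature-major order)."""
    out = []
    for j in range(len(elems[0])):
        column = [row[j] for row in elems]  # feature j across all elements
        for stat in stats:
            out.append(STAT_FUNCS[stat](column, nums))
    return out


print(extra_mix([[1.0, 10.0], [4.0, 20.0]], [2, 1], stats=("mean", "maximum")))
```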
- class featurebox.featurizers.state.statistics.GeometricMean(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶
Bases: BaseCompositionFeature
- mix_function(elems: ndarray, nums)¶
- Parameters:
elems (list) – Elements in compound.
nums (list) – Number of each element.
- Returns:
descriptor
- Return type:
numpy.ndarray
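A composition-weighted geometric mean of each feature column can be sketched as exp of the weighted mean of logarithms. This is a hypothetical helper, not GeometricMean's code; using the element amounts as weights is an assumption. Because it takes logarithms, it only works for strictly positive features.

```python
import math


def geometric_mean_mix(elems, nums):
    """Weighted geometric mean of each feature column; features must be positive."""
    total = sum(nums)
    n_features = len(elems[0])
    return [
        math.exp(sum(w * math.log(row[j]) for row, w in zip(elems, nums)) / total)
        for j in range(n_features)
    ]


print(geometric_mean_mix([[1.0, 2.0], [4.0, 8.0]], [1, 1]))  # about [2.0, 4.0]
```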
- class featurebox.featurizers.state.statistics.HarmonicMean(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶
Bases: BaseCompositionFeature
- mix_function(elems, nums)¶
- Parameters:
elems (list) – Elements in compound.
nums (list) – Number of each element.
- Returns:
descriptor
- Return type:
numpy.ndarray
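A composition-weighted harmonic mean, sum(w) / sum(w / x) per feature column, is sketched below. The helper is hypothetical and weighting by element amounts is an assumption; zero-valued features would cause a division by zero.

```python
def harmonic_mean_mix(elems, nums):
    """Weighted harmonic mean of each feature column: sum(w) / sum(w / x)."""
    total = sum(nums)
    n_features = len(elems[0])
    return [total / sum(w / row[j] for row, w in zip(elems, nums)) for j in range(n_features)]


print(harmonic_mean_mix([[1.0], [3.0]], [1, 1]))  # 2 / (1 + 1/3), about [1.5]
```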
- class featurebox.featurizers.state.statistics.MaxPooling(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶
Bases: BaseCompositionFeature
- mix_function(elems, _)¶
- Parameters:
elems (list) – Elements in compound.
nums (list) – Number of each element.
- Returns:
descriptor
- Return type:
numpy.ndarray
- class featurebox.featurizers.state.statistics.MinPooling(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶
Bases: BaseCompositionFeature
- mix_function(elems, _)¶
- Parameters:
elems (list) – Elements in compound.
nums (list) – Number of each element.
- Returns:
descriptor
- Return type:
numpy.ndarray
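The mix_function signatures of MaxPooling and MinPooling take `_` in place of nums, which suggests the amounts are ignored and each feature is pooled over the elements. A minimal sketch of that column-wise pooling, with hypothetical helper names:

```python
def max_pooling(elems, _=None):
    """Column-wise maximum over elements; the amounts are ignored."""
    return [max(column) for column in zip(*elems)]


def min_pooling(elems, _=None):
    """Column-wise minimum over elements; the amounts are ignored."""
    return [min(column) for column in zip(*elems)]


rows = [[1, 5], [3, 2]]  # two elements, two features each
print(max_pooling(rows))  # [3, 5]
print(min_pooling(rows))  # [1, 2]
```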
- class featurebox.featurizers.state.statistics.WeightedAverage(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶
Bases: BaseCompositionFeature
Examples
>>> from featurebox.featurizers.atom.mapper import AtomTableMap, AtomJsonMap
>>> from featurebox.featurizers.state.statistics import WeightedAverage
>>> data_map = AtomJsonMap(search_tp="name", n_jobs=1)
>>> wa = WeightedAverage(data_map, n_jobs=1, return_type="df")
>>> x3 = [{"H": 2, "Pd": 1}, {"He": 1, "Al": 4}]
>>> wa.fit_transform(x3)
          0         1         2  ...        13        14        15
0  0.422068  0.360958  0.201433  ... -0.459164 -0.064783 -0.250939
1  0.007163 -0.471498 -0.072860  ...  0.206306 -0.041006  0.055843
[2 rows x 16 columns]
>>> wa.set_feature_labels(["fea_{}".format(_) for _ in range(16)])
>>> wa.fit_transform(x3)
   wt_ave_fea_0  wt_ave_fea_1  ...  wt_ave_fea_14  wt_ave_fea_15
0      0.422068      0.360958  ...      -0.064783      -0.250939
1      0.007163     -0.471498  ...      -0.041006       0.055843
[2 rows x 16 columns]
- mix_function(elems, nums)¶
- Parameters:
elems (list) – Elements in compound.
nums (list) – Number of each element.
- Returns:
descriptor
- Return type:
numpy.ndarray
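A weighted average normalizes the element amounts to fractions before mixing. In the sketch below (a hypothetical helper, not the library code), feeding in the fea_0 values of H and Pd from the DepartElementFeature example (depart_fea_0_0 = 0.352363, depart_fea_0_1 = 0.561478) with H2Pd amounts reproduces the 0.422068 shown for wt_ave_fea_0 in the example above.

```python
def weighted_average_mix(elems, nums):
    """Composition-weighted average of each feature column."""
    total = sum(nums)
    fractions = [n / total for n in nums]  # e.g. H2Pd -> [2/3, 1/3]
    n_features = len(elems[0])
    return [sum(f * row[j] for row, f in zip(elems, fractions)) for j in range(n_features)]


# fea_0 of H and Pd, taken from the DepartElementFeature example
print(round(weighted_average_mix([[0.352363], [0.561478]], [2, 1])[0], 6))  # 0.422068
```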
- class featurebox.featurizers.state.statistics.WeightedSum(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶
Bases: BaseCompositionFeature
Examples
>>> from featurebox.featurizers.atom.mapper import AtomTableMap, AtomJsonMap
>>> from featurebox.featurizers.state.statistics import WeightedSum
>>> data_map = AtomTableMap(search_tp="name", n_jobs=1)
>>> wa = WeightedSum(data_map, n_jobs=1, return_type="df")
>>> x3 = [{"H": 2, "Pd": 1}, {"He": 1, "Al": 4}]
>>> wa.fit_transform(x3)
   wt_sum_1s  wt_sum_2s  wt_sum_2p  ...  wt_sum_6d  wt_sum_6f  wt_sum_7s
0    8320.18   11837.27      11.80  ...        0.0        0.0        0.0
1    2188.73    1513.40     986.16  ...        0.0        0.0        0.0
[2 rows x 19 columns]
- mix_function(elems, nums)¶
- Parameters:
elems (list) – Elements in compound.
nums (list) – Number of each element.
- Returns:
descriptor
- Return type:
numpy.ndarray
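Unlike a weighted average, a weighted sum uses the raw amounts, so the result scales with the formula unit. The hypothetical helper below makes that visible using the atomic numbers of H (1) and Pd (46) as a toy one-feature table: H2Pd and H4Pd2 give different sums, whereas their weighted averages would be identical.

```python
def weighted_sum_mix(elems, nums):
    """Sum of each feature column weighted by raw element amounts (no normalization)."""
    n_features = len(elems[0])
    return [sum(w * row[j] for row, w in zip(elems, nums)) for j in range(n_features)]


h, pd_ = [1.0], [46.0]  # toy feature: atomic number
print(weighted_sum_mix([h, pd_], [2, 1]))  # [48.0]
print(weighted_sum_mix([h, pd_], [4, 2]))  # [96.0], doubles with the formula unit
```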
- class featurebox.featurizers.state.statistics.WeightedVariance(data_map: BinaryMap, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶
Bases: BaseCompositionFeature
- mix_function(elems: ndarray, nums)¶
- Parameters:
elems (list) – Elements in compound.
nums (list) – Number of each element.
- Returns:
descriptor
- Return type:
numpy.ndarray
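A composition-weighted variance measures how spread out a feature is across the elements of a compound. The sketch assumes fraction weights (amounts normalized to sum to 1) and the population form sum(f * (x - mean)**2); WeightedVariance may normalize differently, so treat this as an illustration only.

```python
def weighted_variance_mix(elems, nums):
    """Composition-weighted (population) variance of each feature column."""
    total = sum(nums)
    fractions = [n / total for n in nums]
    out = []
    for column in zip(*elems):  # one column per feature
        mean = sum(f * v for v, f in zip(column, fractions))
        out.append(sum(f * (v - mean) ** 2 for v, f in zip(column, fractions)))
    return out


print(weighted_variance_mix([[1.0], [3.0]], [1, 1]))  # [1.0]
```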
featurebox.featurizers.state.union module¶
- class featurebox.featurizers.state.union.PolyFeature(*, degree: Union[int, List[int]] = 3, n_jobs=1, on_errors='raise', return_type='df')¶
Bases: BaseFeature, ABC
Polynomial feature extension.
For example, with two features x1 and x2, degree = 2 generates (x1*x2, x1**2, x2**2).
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from featurebox.featurizers.state.union import PolyFeature
>>> n = np.array([[0, 1, 2, 3, 4, 5], [0.422068, 0.360958, 0.201433, -0.459164, -0.064783, -0.250939]]).T
>>> ps = pd.DataFrame(n, columns=["f1", "f2"], index=["x0", "x1", "x2", "x3", "x4", "x5"])
>>> pf = PolyFeature(degree=[1, 2])
>>> pf.fit_transform(n)
   f0^1      f1^1  f0^2  f0^1*f1^1      f1^2
0   0.0  0.422068   0.0   0.000000  0.178141
1   1.0  0.360958   1.0   0.360958  0.130291
2   2.0  0.201433   4.0   0.402866  0.040575
3   3.0 -0.459164   9.0  -1.377492  0.210832
4   4.0 -0.064783  16.0  -0.259132  0.004197
5   5.0 -0.250939  25.0  -1.254695  0.062970
- Parameters:
batch_size (int) – size of each batch.
batch_calculate (bool) – whether to calculate in batches.
n_jobs (int) – number of parallel jobs.
on_errors (str) – How to handle exceptions in feature calculations. Can be 'nan', 'keep' or 'raise'. When 'nan', a column of np.nan is returned, with length equal to the number of feature labels. The default, 'raise', raises the exception.
return_type (str) – The return type. Can be 'any', 'np', 'array' or 'df'. 'array' and 'df' force the return type to np.ndarray and pd.DataFrame respectively; 'any' performs no type conversion. Default is 'df'.
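The column names in the PolyFeature example (f0^1, f1^1, f0^2, f0^1*f1^1, f1^2) match the monomials produced by taking every multiset of feature indices for each requested degree. poly_features below is a hypothetical re-creation of that naming and ordering with the standard library; it is not PolyFeature's implementation, but on row 3 of the example it yields the same five values.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import prod


def poly_features(rows, degree=(1, 2)):
    """Monomials of the requested total degree(s), named like 'f0^1*f1^1'."""
    n_features = len(rows[0])
    degrees = [degree] if isinstance(degree, int) else list(degree)
    terms = []
    for d in degrees:
        # Each multiset of d feature indices is one monomial of total degree d.
        terms.extend(combinations_with_replacement(range(n_features), d))
    names = ["*".join(f"f{i}^{p}" for i, p in sorted(Counter(t).items())) for t in terms]
    values = [[prod(row[i] for i in t) for t in terms] for row in rows]
    return names, values


names, values = poly_features([[3.0, -0.459164]], degree=[1, 2])
print(names)      # ['f0^1', 'f1^1', 'f0^2', 'f0^1*f1^1', 'f1^2']
print(values[0])  # row 3 of the example: 3.0, -0.459164, 9.0, about -1.377492, about 0.210832
```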
- fit_transform(X: Union[ndarray, DataFrame], y=None, **kwargs)¶
If convert takes multiple inputs, supply the inputs as a list of tuples.
Adapted from TransformerMixin, the mixin class for all transformers in scikit-learn.
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (list) – list of cases.
y (None) – deprecated.
**kwargs – Additional fit or transform parameters, such as feature_labels_mark (str), a mark for each feature label, and x_labels (list), a mark for each row; both apply when return_type == 'pd'.
- Returns:
result data.
- Return type:
X_new
- set_feature_labels(input_features=None)¶
Generate attribute names.
- Returns:
([str]) attribute labels.
- class featurebox.featurizers.state.union.UnionFeature(comp: List[Dict], couple_data: Union[DataFrame, ndarray], couple=2, stats=('mean',), n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'df')¶
Bases: BaseFeature
The transform method should take comp_index as input rather than entries.
Examples
>>> from featurebox.featurizers.atom.mapper import AtomTableMap, AtomJsonMap
>>> from featurebox.featurizers.state.statistics import DepartElementFeature
>>> from featurebox.featurizers.state.union import UnionFeature
>>> data_map = AtomJsonMap(search_tp="name", n_jobs=1)
>>> wa = DepartElementFeature(data_map, n_composition=2, n_jobs=1, return_type="df")
>>> x3 = [{"H": 2, "Pd": 1}, {"He": 1, "Al": 4}]
>>> wa.set_feature_labels(["fea_{}".format(_) for _ in range(16)])
>>> wa.fit_transform(x3)
   depart_fea_0_0  depart_fea_0_1  ...  depart_fea_15_0  depart_fea_15_1
0        0.352363        0.561478  ...        -0.270104        -0.212607
1       -0.067220        0.025758  ...        -0.042185         0.080350
[2 rows x 32 columns]
>>> couple_data = wa.fit_transform(x3)
>>> uf = UnionFeature(x3, couple_data, couple=2, stats=("mean", "maximum"))
>>> uf.fit_transform()
   mean_fea_0  maximum_fea_0  ...  mean_fea_15  maximum_fea_15
0    0.422068       0.360958  ...     0.021095       -0.212607
1    0.007163      -0.471498  ...     0.165278        0.080350
[2 rows x 32 columns]
>>> couple_data = wa.fit_transform(x3)
>>> uf = UnionFeature(x3, couple_data, couple=2, stats=("std_dev",))
>>> uf.fit_transform()
   std_dev_fea_0  std_dev_fea_1  ...  std_dev_fea_14  std_dev_fea_15
0       0.147867       0.583352  ...        0.182177        0.040657
1       0.065745       0.541477  ...        0.182331        0.086646
[2 rows x 16 columns]
- Parameters:
batch_size (int) – size of each batch.
batch_calculate (bool) – whether to calculate in batches.
n_jobs (int) – number of parallel jobs.
on_errors (str) – How to handle exceptions in feature calculations. Can be 'nan', 'keep' or 'raise'. When 'nan', a column of np.nan is returned, with length equal to the number of feature labels. The default, 'raise', raises the exception.
return_type (str) – The return type. Can be 'any', 'np', 'array' or 'df'. 'array' and 'df' force the return type to np.ndarray and pd.DataFrame respectively; 'any' performs no type conversion. Default is 'df'.
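In the UnionFeature examples, couple_data holds `couple` consecutive columns per feature (depart_fea_j_0, depart_fea_j_1), and each statistic reduces one such group to a single column named like mean_fea_j. The sketch below illustrates that grouping; union_stats is hypothetical, and using the composition amounts as weights is an assumption. With H2Pd amounts, its weighted mean reproduces the mean_fea_0 value of 0.422068 shown above; only the mean is checked against the example.

```python
def union_stats(row, labels, amounts, couple=2, stats=("mean", "maximum")):
    """Group each feature's `couple` columns and reduce them with each statistic."""
    funcs = {
        "mean": lambda v, w: sum(a * b for a, b in zip(v, w)) / sum(w),
        "maximum": lambda v, w: max(v),
    }
    names, values = [], []
    for j, label in enumerate(labels):
        group = row[j * couple:(j + 1) * couple]  # depart_<label>_0 ... depart_<label>_{couple-1}
        for stat in stats:
            names.append(f"{stat}_{label}")
            values.append(funcs[stat](group, amounts))
    return names, values


# Row 0 (H2Pd) of couple_data, restricted to fea_0: depart_fea_0_0, depart_fea_0_1
names, values = union_stats([0.352363, 0.561478], ["fea_0"], amounts=[2, 1])
print(names)                # ['mean_fea_0', 'maximum_fea_0']
print(round(values[0], 6))  # 0.422068
```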
- convert(comp_number=0)¶
Get elemental property attributes
- Parameters:
comp_number (int) – index of the composition in comp to featurize (default 0).
- Returns:
all_attributes – specified property statistics of the features.
- fit_transform(entries: Optional[List] = None) → Any¶
If convert takes multiple inputs, supply the inputs as a list of tuples.
Adapted from TransformerMixin, the mixin class for all transformers in scikit-learn.
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
entries (list, optional) – A list of entries to be featurized.
- Returns:
result data.
- Return type:
X_new
- set_feature_labels(self_elem_data_columns_values: List)¶
Generate attribute names.
- Parameters:
self_elem_data_columns_values (List) – names of the element data columns
- Return type:
([str]) attribute labels.
- transform(entries: Optional[List] = None) → Any¶
Transform a list of entries. Each element of entries corresponds to the parameter(s) of convert. If convert takes n inputs, each entry should be a list or tuple of size n, i.e. entries should look like [(p1, p2), (p1, p2), (p1, p2), …, (p1, p2), (p1, p2)], which can be built with zip or by using the built-in transform_with_zip.
- Parameters:
entries (list) – A list of entries to be featurized.
- Returns:
result – features for each entry.
- Return type:
any
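The entries protocol (each entry bundles all inputs of convert for one sample, typically built with zip) can be illustrated with a hypothetical two-input convert. Neither function below is part of featurebox; they only show how zipped entries are unpacked into convert calls.

```python
def convert(p1, p2):
    """Hypothetical two-input convert: returns a small feature list."""
    return [p1 + p2, p1 * p2]


def transform(entries):
    # Each entry is a tuple holding all inputs of convert for one sample.
    return [convert(*entry) for entry in entries]


a = [1, 2, 3]
b = [10, 20, 30]
entries = list(zip(a, b))  # [(1, 10), (2, 20), (3, 30)]
print(transform(entries))  # [[11, 10], [22, 40], [33, 90]]
```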