featurebox.featurizers package
Subpackages
- featurebox.featurizers.atom package
- Submodules
- featurebox.featurizers.atom.mapper module
- featurebox.featurizers.envir package
- featurebox.featurizers.state package
- Submodules
- featurebox.featurizers.state.extrastats module
- PropertyStats
- PropertyStats.avg_dev()
- PropertyStats.calc_stat()
- PropertyStats.eigenvalues()
- PropertyStats.flatten()
- PropertyStats.geom_std_dev()
- PropertyStats.holder_mean()
- PropertyStats.inverse_mean()
- PropertyStats.kurtosis()
- PropertyStats.maximum()
- PropertyStats.mean()
- PropertyStats.minimum()
- PropertyStats.mode()
- PropertyStats.quantile()
- PropertyStats.range()
- PropertyStats.skewness()
- PropertyStats.sorted()
- PropertyStats.std_dev()
- featurebox.featurizers.state.state_mapper module
- featurebox.featurizers.state.statistics module
- featurebox.featurizers.state.union module
Submodules
featurebox.featurizers.base_feature module
Base
- class featurebox.featurizers.base_feature.BaseFeature(n_jobs: int = 1, *, on_errors: str = 'raise', return_type: str = 'any', batch_calculate: bool = False, batch_size: int = 30)
Bases: object
Using a BaseFeature Class
A feature is written by subclassing BaseFeature and implementing convert:
    class MatFeature(BaseFeature):
        def convert(self, *x):
            ...
BaseFeature implements sklearn.base.BaseEstimator and sklearn.base.TransformerMixin, which means you can use it in the scikit-learn way:
    feature = SomeFeature()
    features = feature.fit_transform(X)
Note
The convert method should be overridden to handle a single case. transform and fit_transform are then provided automatically for a list of cases.
Adding references
BaseFeature also lets you retrieve proper references for a feature. __citations__ returns a list of papers that should be cited. __authors__ returns a list of people who wrote the feature. Both can also be accessed through the properties citations and authors.
These operations must be implemented for each new feature:
feature_labels - Generates a human-meaningful name for each of the features. Implement this as a property, which can be set by set_feature_labels.
It is also suggested to implement these two properties:
citations - Returns a list of citations in BibTeX format.
authors - Returns a list of people who contributed to writing a paper.
Note
None of these operations should change the state of the feature. That is, running each method twice should not produce different results, no class attributes should be changed, and running one operation should not affect the output of another.
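The split described above, where convert handles one sample and transform handles a list, can be sketched without the library. The classes below are hypothetical stand-ins (MiniFeature and LengthFeature are illustrative names, not part of featurebox); the real BaseFeature additionally handles parallelism, error policies, and return types.

```python
from typing import Any, List


class MiniFeature:
    """Hypothetical stand-in for the convert/transform split; not the real BaseFeature."""

    def convert(self, d: Any) -> Any:
        # Subclasses override this to featurize ONE sample.
        raise NotImplementedError

    def fit(self, X: List[Any], y=None) -> "MiniFeature":
        # Nothing is learned in this sketch; return self so calls can be chained.
        return self

    def transform(self, entries: List[Any]) -> List[Any]:
        # Apply the single-sample convert to every entry in the list.
        return [self.convert(d) for d in entries]

    def fit_transform(self, X: List[Any], y=None) -> List[Any]:
        return self.fit(X, y).transform(X)


class LengthFeature(MiniFeature):
    """Toy feature: the number of characters in a formula string."""

    def convert(self, d: str) -> List[int]:
        return [len(d)]


features = LengthFeature().fit_transform(["NaCl", "H2O"])
print(features)  # [[4], [3]]
```

Only convert is written per feature here; the list-level methods are inherited unchanged, which is the point of the convention.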
- Parameters:
batch_size (int) – Size of each batch.
batch_calculate (bool) – Whether to calculate in batches.
n_jobs (int) – Number of parallel jobs.
on_errors (str) – How to handle exceptions raised during feature calculation. Can be nan, keep, or raise. With 'nan', a column of np.nan is returned whose length equals the number of feature labels. The default, 'raise', re-raises the exception.
return_type (str) – The return type. Can be any, np, array, or df. 'array' and 'df' force the return type to np.ndarray and pd.DataFrame respectively. With 'any', no type conversion is performed. Default is 'any'.
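The 'raise' and 'nan' policies of on_errors can be illustrated with a small standalone helper. This is a sketch under assumptions: convert_with_policy and reciprocal are hypothetical names, plain Python floats stand in for np.nan, and the 'keep' policy is left out because its behavior is not described here.

```python
import math
from typing import Callable, List


def convert_with_policy(
    convert: Callable[[float], List[float]],
    d: float,
    n_labels: int,
    on_errors: str = "raise",
) -> List[float]:
    """Hypothetical helper mimicking the documented 'raise' and 'nan' policies."""
    try:
        return convert(d)
    except Exception:
        if on_errors == "nan":
            # One NaN per feature label, so the row keeps its expected width.
            return [math.nan] * n_labels
        # 'raise' (the documented default) propagates the original exception.
        raise


def reciprocal(d: float) -> List[float]:
    # Toy single-sample convert that fails for d == 0.
    return [1.0 / d]


row = convert_with_policy(reciprocal, 0.0, n_labels=1, on_errors="nan")
print(row)  # [nan]
```

With the default policy the same call would raise ZeroDivisionError instead of returning a row.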
- property authors
List of implementors of the feature.
- Returns:
- (list) each element should be either a string with the author's name (e.g., "Anubhav Jain") or a dictionary with the required key "x_name" and other keys like "email" or "institution" (e.g., {"x_name": "Anubhav Jain", "email": "ajain@lbl.gov", "institution": "LBNL"}).
- property citations
Citation(s) and reference(s) for this feature.
- Returns:
- (list) each element should be a string citation, ideally in BibTeX format.
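The return shapes of the two properties above can be sketched as follows. CitedFeature is a hypothetical class that does not inherit from the real BaseFeature, the author values are the ones used as examples in this page, and the BibTeX string is a placeholder rather than a real reference.

```python
from typing import Dict, List, Union


class CitedFeature:
    """Hypothetical feature showing the documented shapes of authors and citations."""

    @property
    def authors(self) -> List[Union[str, Dict[str, str]]]:
        # Each element is either a plain string or a dict with the required "x_name" key.
        return [
            "Anubhav Jain",
            {"x_name": "Anubhav Jain", "email": "ajain@lbl.gov", "institution": "LBNL"},
        ]

    @property
    def citations(self) -> List[str]:
        # Placeholder BibTeX entry for illustration only, not a real reference.
        return ["@misc{placeholder_key, title = {Placeholder title}}"]


feature = CitedFeature()
# Reading a property twice gives the same result; no state is changed.
print(feature.authors == feature.authors)  # True
```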
- convert(d)
Main feature function, which has to be implemented in any derived feature subclass.
Note
By default it cannot be passed an np.array, with one exception:
1. Useful for bond_converter. For an np.array the ndim is checked; for ndim 2 or 3, the self.ndim attribute decides whether the data is passed to _converter together or separately. At most 3 dimensions are currently supported. The reason is that, for some functions, using a NumPy ufunc is very efficient; it keeps the size of the data and simplifies _convert.
- Parameters:
d – one input data item (one sample, one case).
- Returns:
new x.
- Return type:
new_x
- property feature_labels
Generate attribute names.
- Returns:
([str]) attribute labels.
- fit(*args, **kwargs)
The fit function in BaseFeature is deliberately minimal and only passes its parameters through.
- fit_transform(X: List, y=None, **kwargs) → Any
If convert takes multiple inputs, supply the inputs as a list of tuples.
Copied from TransformerMixin, the mixin class for all transformers in scikit-learn.
Fit to data, then transform it.
Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
X (list) – list of cases.
y (None) – deprecated.
**kwargs – Additional fit or transform parameters. feature_labels_mark: str, mark for each of the feature_labels, for return_type == 'pd'. x_labels: list, mark for each row, for return_type == 'pd'.
- Returns:
result data.
- Return type:
X_new
- property n_jobs
Number of parallel jobs.
- Type:
int
- set_feature_labels(values: List[str])
Generate attribute names.
- Returns:
([str]) attribute labels.
- transform(entries: List) → Any
Transform a list of entries. Each element of entries corresponds to the parameters of convert. If convert takes n inputs, each element should be a list or tuple of size n:
[(p1,p2),(p1,p2),(p1,p2),…,(p1,p2),(p1,p2)]
Such a list can be built with zip, or you can use the built-in transform_with_zip.
- Parameters:
entries (list) – A list of entries to be featurized.
- Returns:
result – features for each entry.
- Return type:
any
- transform_with_zip(*args) → Any
Second form of transform, which converts the iterables to a list and then runs transform.
first: p1s,p2s -> [(p1,p2),(p1,p2),(p1,p2),…,(p1,p2),(p1,p2)]
second: run self.transform
- Parameters:
args (Iterable) – each of args must be an Iterable.
- Returns:
result – features for each entry.
- Return type:
any
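The two steps described for transform_with_zip can be sketched with plain functions. This is an illustration under assumptions: pair_inputs, convert, and transform below are hypothetical stand-ins, and unpacking each tuple into the arguments of convert is one reading of the description above rather than confirmed library behavior.

```python
from typing import Any, Iterable, List, Tuple


def pair_inputs(*args: Iterable[Any]) -> List[Tuple[Any, ...]]:
    """Step one: p1s, p2s -> [(p1, p2), (p1, p2), ...]."""
    return list(zip(*args))


def convert(p1: float, p2: float) -> List[float]:
    # Hypothetical two-argument convert: sum and product of the inputs.
    return [p1 + p2, p1 * p2]


def transform(entries: List[Tuple[Any, ...]]) -> List[List[float]]:
    # Step two: unpack each tuple into the parameters of convert.
    return [convert(*entry) for entry in entries]


p1s = [1.0, 2.0, 3.0]
p2s = [4.0, 5.0, 6.0]
entries = pair_inputs(p1s, p2s)
print(entries)             # [(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)]
print(transform(entries))  # [[5.0, 4.0], [7.0, 10.0], [9.0, 18.0]]
```

Note that zip stops at the shortest input, so inputs of unequal length are silently truncated in this sketch.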
- featurebox.featurizers.base_feature.Converter
Alias of BaseFeature
- class featurebox.featurizers.base_feature.ConverterCat(*args: BaseFeature, force_concatenate=False, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'any')
Bases: BaseFeature
Pack the converters into one unified converter. Converters of the same type are merged, and converters of different types are run in order. Therefore, keep converters of the same type next to each other, such as A(), A(), B(), B().
Example
>>> tmps = ConverterCat(
...     AtomEmbeddingMap(),
...     AtomEmbeddingMap("ie.json"),
...     AtomTableMap(search_tp="name"))
>>> tmps.convert(x)
>>> tmps.transform(xs)
- Parameters:
args (Converter) – List of Converter
- convert(d)
Convert and concatenate.
- static sums(args)
SUM
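The "convert and concatenate" idea behind ConverterCat can be sketched with plain callables. MiniCat, length, and upper_count are hypothetical names; the sketch only joins outputs in order and does not reproduce the merging of same-type converters or the force_concatenate option.

```python
from typing import Callable, List, Sequence

Convert = Callable[[str], List[float]]


class MiniCat:
    """Hypothetical stand-in: run every converter on the same input and join the results."""

    def __init__(self, *converters: Convert) -> None:
        self.converters: Sequence[Convert] = converters

    def convert(self, d: str) -> List[float]:
        out: List[float] = []
        for conv in self.converters:
            # Each converter sees the SAME input; outputs are appended in order.
            out.extend(conv(d))
        return out


def length(d: str) -> List[float]:
    # Toy converter: number of characters.
    return [float(len(d))]


def upper_count(d: str) -> List[float]:
    # Toy converter: number of uppercase characters.
    return [float(sum(ch.isupper() for ch in d))]


cat = MiniCat(length, upper_count)
print(cat.convert("NaCl"))  # [4.0, 2.0]
```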
- class featurebox.featurizers.base_feature.ConverterSequence(*args: BaseFeature, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'any')
Bases: BaseFeature
Pack the converters into one sequentially executed assembly.
input -> convert1 -> temp -> convert2 -> temp -> convert3 -> output
Note
There is no error checking; make sure yourself that each temp can be passed on to the next converter!
Example
>>> tmps = ConverterSequence(
...     AtomEmbeddingMap(),
...     DummyConverter())
>>> tmps.convert(x)
- Parameters:
args (Converter) – List of Converter
- convert(d)
convert batched
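The chain input -> convert1 -> temp -> convert2 -> output can be sketched as a fold over the converters. MiniSequence, tokenize, and count are hypothetical names used only for illustration; like the class documented above, the sketch performs no checking that one step's output suits the next step.

```python
from functools import reduce
from typing import Any, Callable, List


class MiniSequence:
    """Hypothetical stand-in: feed the output of each converter into the next."""

    def __init__(self, *converters: Callable[[Any], Any]) -> None:
        self.converters = converters

    def convert(self, d: Any) -> Any:
        # reduce threads the intermediate value (temp) through every step in order.
        return reduce(lambda temp, conv: conv(temp), self.converters, d)


def tokenize(d: str) -> List[str]:
    # Step 1: split a dash-separated string into tokens.
    return d.split("-")


def count(tokens: List[str]) -> int:
    # Step 2: consume the list produced by step 1.
    return len(tokens)


seq = MiniSequence(tokenize, count)
print(seq.convert("Na-Cl-O"))  # 3
```

Swapping the order to MiniSequence(count, tokenize) would fail at run time, which is the kind of mismatch the note above asks you to rule out by hand.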
- class featurebox.featurizers.base_feature.DummyConverter(n_jobs: int = 1, *, on_errors: str = 'raise', return_type: str = 'any', batch_calculate: bool = False, batch_size: int = 30)
Bases: BaseFeature
Dummy converter used as a placeholder. It does nothing.
- Parameters:
batch_size (int) – Size of each batch.
batch_calculate (bool) – Whether to calculate in batches.
n_jobs (int) – Number of parallel jobs.
on_errors (str) – How to handle exceptions raised during feature calculation. Can be nan, keep, or raise. With 'nan', a column of np.nan is returned whose length equals the number of feature labels. The default, 'raise', re-raises the exception.
return_type (str) – The return type. Can be any, np, array, or df. 'array' and 'df' force the return type to np.ndarray and pd.DataFrame respectively. With 'any', no type conversion is performed. Default is 'any'.
- convert(d) → ndarray
Dummy convert; returns the input unchanged.
- Parameters:
d (Any) – input object
- Returns:
d
featurebox.featurizers.batch_feature module
- class featurebox.featurizers.batch_feature.BatchFeature(data_type: str = 'compositions', user_convert: Optional[BaseFeature] = None, n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'any', batch_calculate: bool = False, batch_size: int = 30)
Bases: BaseFeature
Script for generating batch data; it can be copied and customized.
- Parameters:
data_type (str) – Predefined name, one of ["elements", "compositions", "structures"].
user_convert (BaseFeature) – a user-defined converter that provides a convert method.
- convert(d)
Main feature function, which has to be implemented in any derived feature subclass.
Note
By default it cannot be passed an np.array, with one exception:
1. Useful for bond_converter. For an np.array the ndim is checked; for ndim 2 or 3, the self.ndim attribute decides whether the data is passed to _converter together or separately. At most 3 dimensions are currently supported. The reason is that, for some functions, using a NumPy ufunc is very efficient; it keeps the size of the data and simplifies _convert.
- Parameters:
d – one input data item (one sample, one case).
- Returns:
new x.
- Return type:
new_x
- property feature_labels
Generate attribute names.
- Returns:
([str]) attribute labels.
- set_feature_labels(values: List[str])
Generate attribute names.
- Returns:
([str]) attribute labels.