featurebox.featurizers.atom package

Submodules

featurebox.featurizers.atom.mapper module

Get pure atom properties.

Embedded data: “ele_table.csv”, “ele_megnet.json”, “ie.json”, “oe.csv”

class featurebox.featurizers.atom.mapper.AtomJsonMap(embedding_dict: Optional[Union[str, Dict]] = None, search_tp: str = 'name', **kwargs)

Bases: BinaryMap

Fixed Atom json map.

Examples

>>> tmps = AtomJsonMap(search_tp="number")
>>> s = [1,76]                   #[i.specie.Z for i in structure]
>>> a = tmps.convert(s)
>>> tmps = AtomJsonMap(search_tp="name")
>>> s = [{"H": 2, }, {"Al": 1}]  #[i.species.as_dict() for i in pymatgen_structure.sites]
>>> a = tmps.convert(s)
>>>
>>> tmps = AtomJsonMap(search_tp="name")
>>> s = [[{"H": 2, }, {"Ce": 1}],[{"H": 2, }, {"Al": 1}]]
>>> a = tmps.transform(s)
Parameters:

embedding_dict

(str, dict) Name of a file, or a dict that maps each element to its element vector.

Provides pre-trained elemental embeddings, learned from formation energies, which can be used to speed up training. The embeddings also serve as elemental descriptors that encode chemical similarity and can be used in other ways.

convert_dict(atoms: List[dict]) ndarray

Convert atom {symbol: fraction} list to numeric features
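
To make the {symbol: fraction} input format concrete, the sketch below builds one feature row per site by summing each element's vector scaled by its fraction. The embedding values, the helper name weighted_site_features, and the fraction-weighted sum itself are assumptions made for illustration; they are not taken from the featurebox implementation.

```python
from typing import Dict, List

# Hypothetical two-dimensional embeddings; real tables are much wider.
EMBEDDING: Dict[str, List[float]] = {
    "H": [1.0, 0.0],
    "Al": [0.0, 2.0],
}


def weighted_site_features(sites: List[Dict[str, float]]) -> List[List[float]]:
    """Return one row per site: the fraction-weighted sum of element vectors."""
    width = len(next(iter(EMBEDDING.values())))
    rows = []
    for site in sites:
        row = [0.0] * width
        for symbol, fraction in site.items():
            vector = EMBEDDING[symbol]  # raises KeyError for unknown symbols
            for k, value in enumerate(vector):
                row[k] += fraction * value
        rows.append(row)
    return rows


# Same input shape as the doctest above: one dict per site.
features = weighted_site_features([{"H": 2}, {"Al": 1}])
print(features)
```

Each site yields exactly one row, so the output length always matches the number of sites passed in.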

convert_number(atoms: List[int]) ndarray

Convert a list of atomic numbers to numeric features.

class featurebox.featurizers.atom.mapper.AtomMap(n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'any')

Bases: BaseFeature

Base class for atom converter. Map the element type and weight to element data.

Parameters:
  • batch_size (int) – Size of each batch.

  • batch_calculate (bool) – Whether to calculate in batches.

  • n_jobs (int) – Number of parallel jobs.

  • on_errors (str) – How to handle exceptions raised during feature calculation. Can be nan, keep, or raise. When ‘nan’, a column of np.nan is returned whose length equals the number of feature labels. The default is ‘raise’, which re-raises the exception.

  • return_type (str) – Specifies the return type. Can be any, array, or df. ‘array’ and ‘df’ force the return type to np.ndarray and pd.DataFrame respectively. If ‘any’, no type conversion is performed. Default is ‘any’.
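
The on_errors policy described above can be pictured with the following minimal sketch. It covers only the ‘raise’ and ‘nan’ options, since the behaviour of ‘keep’ is not specified here. The function featurize_with_policy and its arguments are hypothetical names introduced for illustration, not part of featurebox.

```python
import math
from typing import Callable, List, Sequence


def featurize_with_policy(
    func: Callable[[float], List[float]],
    entries: Sequence[float],
    n_labels: int,
    on_errors: str = "raise",
) -> List[List[float]]:
    """Apply func to each entry, handling failures according to on_errors."""
    if on_errors not in ("raise", "nan"):
        raise ValueError("this sketch only covers 'raise' and 'nan'")
    rows = []
    for entry in entries:
        try:
            rows.append(func(entry))
        except Exception:
            if on_errors == "raise":
                raise
            # 'nan': one NaN per feature label, so the output keeps its width
            rows.append([math.nan] * n_labels)
    return rows


def inverse_pair(x: float) -> List[float]:
    """Toy feature function that fails for x == 0."""
    return [1.0 / x, 2.0 / x]


rows = featurize_with_policy(inverse_pair, [1.0, 0.0], n_labels=2, on_errors="nan")
```

With on_errors="nan" the failing entry is replaced by NaN placeholders, whereas on_errors="raise" would propagate the ZeroDivisionError from the second entry.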

static get_csv_embeddings(data_name: str) DataFrame

Get the embeddings stored in a CSV file.

static get_json_embeddings(file_name: str = 'ele_megnet.json') Dict

Get the embeddings stored in a JSON file.

class featurebox.featurizers.atom.mapper.AtomPymatgenPropMap(prop_name: Union[str, List[str]], func: Optional[Callable] = None, search_tp: str = 'name', **kwargs)

Bases: BinaryMap

Get element properties from pymatgen. prop_name can be, for example: "atomic_radius", "atomic_mass", "number", "max_oxidation_state", "min_oxidation_state", "row", "group", "atomic_radius_calculated", "mendeleev_no", "critical_temperature", "density_of_solid", "average_ionic_radius", "average_cationic_radius", "average_anionic_radius".

Examples

>>> tmps = AtomPymatgenPropMap(search_tp="number",prop_name=["X"])
>>> s = [1,76]
>>> a = tmps.convert(s)
>>> tmps = AtomPymatgenPropMap(search_tp="name",prop_name=["X"])
>>> s = [{"H": 2, }, {"Po": 1}]  #[i.species.as_dict() for i in pymatgen.structure.sites]
>>> a = tmps.convert(s)
>>> tmps = AtomPymatgenPropMap(search_tp="name",prop_name=["X"])
>>> s = [[{"H": 2, }, {"Po": 1}],[{"H": 2, }, {"Po": 1}]]
>>> a = tmps.transform(s)
Parameters:
  • prop_name – (str, list of str) Property name or list of property names.

  • func – (callable or list of callable) Post-processing function(s). When a list is given, it must have the same length as prop_name.

  • search_tp – (str) Lookup method: "name" for dict input, "number" for int input.
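
The pairing between prop_name and func can be sketched as follows. The property table ELEMENT_PROPS and the helper collect_props are hypothetical stand-ins written for illustration (they do not call pymatgen), and applying a single callable to every property is an assumption.

```python
from typing import Callable, Dict, List, Optional, Sequence, Union

# Small hand-written property records used instead of pymatgen lookups.
ELEMENT_PROPS: Dict[str, Dict[str, float]] = {
    "H": {"number": 1, "row": 1, "group": 1},
    "Po": {"number": 84, "row": 6, "group": 16},
}

FuncArg = Optional[Union[Callable, Sequence[Callable]]]


def collect_props(
    symbols: Sequence[str],
    prop_name: Union[str, Sequence[str]],
    func: FuncArg = None,
) -> List[List[float]]:
    """Look up each property and pass it through its paired function."""
    names = [prop_name] if isinstance(prop_name, str) else list(prop_name)
    if func is None:
        funcs = [float] * len(names)
    elif callable(func):
        funcs = [func] * len(names)  # assumption: one callable applies to all
    else:
        funcs = list(func)
    if len(funcs) != len(names):
        raise ValueError("func must have one entry per prop_name")
    return [
        [f(ELEMENT_PROPS[s][name]) for name, f in zip(names, funcs)]
        for s in symbols
    ]


# Scale the group index to (0, 1] while leaving the row untouched.
table = collect_props(["H", "Po"], ["row", "group"], func=[float, lambda g: g / 18])
```

Checking the lengths up front gives a clear error instead of silently dropping properties, which is what zip would do with mismatched lists.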

convert_dict(atoms: List[Dict]) ndarray

Convert atom {symbol: fraction} list to numeric features

convert_number(atoms: List[int]) ndarray

Convert int list to numeric features

property feature_labels

Generate attribute names.

Returns:

([str]) attribute labels.

class featurebox.featurizers.atom.mapper.AtomTableMap(tablename: Union[str, ndarray, DataFrame] = 'oe.csv', search_tp: str = 'name', **kwargs)

Bases: BinaryMap

Fixed atom embedding map. The default table is oe.csv; you can supply your own table for different features. The table must contain elemental features for at least the first 92 elements (H through U). Make sure that every value in the table is an int or a float.

Such as:

    Data    F0    F1
    H       V     V
    He      V     V
    Li      V     V
    Be      V     V
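
A table in this layout can be read and indexed either by element symbol or by atomic number, as in the sketch below. The numeric values, the helper load_table, and the assumption that rows are ordered by atomic number starting at H are all illustrative; the real embedded tables are loaded by the library itself.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical table: first column is the element symbol, the rest are features.
TABLE_TEXT = """Data,F0,F1
H,0.1,1.0
He,0.2,2.0
Li,0.3,3.0
Be,0.4,4.0
"""


def load_table(text: str) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse the table and return (feature labels, symbol -> feature vector)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    labels = header[1:]
    by_name: Dict[str, List[float]] = {}
    for record in reader:
        # float() fails loudly on non-numeric cells, enforcing int/float data
        by_name[record[0]] = [float(v) for v in record[1:]]
    return labels, by_name


labels, by_name = load_table(TABLE_TEXT)

# Assumption: rows follow atomic-number order, so row 1 is H, row 2 is He, ...
by_number = {z: by_name[s] for z, s in enumerate(by_name, start=1)}

print(labels, by_name["Li"], by_number[4])
```

Parsing every cell with float() is one simple way to catch stray text in a custom table before it reaches a model.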

Examples

>>> tmps = AtomTableMap(search_tp="number")
>>> s = [1,76]
>>> tmps.convert(s)
array([[2.245000e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00],
       [2.383710e+03, 3.937715e+04, 3.783280e+03, 9.866700e+02,
        8.349720e+03, 6.978800e+02, 1.861780e+03, 1.549970e+03,
        9.784700e+02, 2.231900e+02, 2.633000e+02, 1.689800e+02,
        2.854000e+01, 0.000000e+00, 1.841000e+01, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00]])
>>> tmps = AtomTableMap(search_tp="name")
>>> s = [{"H": 2, }, {"Po": 1}]  #[i.species.as_dict() for i in pymatgen.structure.sites]
>>> a = tmps.convert(s)
...
>>> tmps = AtomTableMap(search_tp="name",tablename="oe.csv")
>>> s = [[{"H": 2, }, {"Po": 1}],[{"H": 2, }, {"Po": 1}]]
>>> a = tmps.transform(s)
...
>>> tmps = AtomTableMap(tablename=None)
>>> tmps = AtomTableMap(tablename="ele_table.csv")
>>> s = [{"H": 2, }, {"Pd": 1}]
>>> b = tmps.convert(s)
...
Parameters:
  • tablename (str, np.ndarray, pd.DataFrame) –

    1. Name of a table in bgnet.preprocessing.resources. If tablename is None, the embedded “ele_table.csv” is used.

    2. np.ndarray, with search_tp = “number”.

    3. pd.DataFrame, with search_tp = “name”.

  • search_tp (str) – Lookup method: “name” for dict input, “number” for int input.

convert_dict(atoms: List[Dict]) ndarray

Convert atom {symbol: fraction} list to numeric features

convert_number(atoms: List[int]) ndarray

Convert atom number list to numeric features

property feature_labels

Generate attribute names.

Returns:

([str]) attribute labels.

static get_ele_embeddings(name='ele_table_norm.csv') DataFrame

Get the element embeddings stored in a CSV file.

class featurebox.featurizers.atom.mapper.BinaryMap(search_tp: str = 'number', weight: bool = False, **kwargs)

Bases: AtomMap

Base converter with 2 different search_tp.

Parameters:
  • search_tp – (str) Lookup method: "name" for dict input, "number" for int input.

  • weight – (bool) If True, data with the same key are summed together.

  • **kwargs

abstract convert_dict(d: List[Dict])

Convert a list of dicts to data.

abstract convert_number(d: List[int])

Convert a list of numbers to data.
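
The two-search-type pattern can be sketched with an abstract base class that dispatches on search_tp. TwoWayMap and MassMap below are hypothetical stand-ins written for illustration; they are not the featurebox classes and make no claim about how BinaryMap is implemented internally.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class TwoWayMap(ABC):
    """Hypothetical converter that accepts dict input or integer input."""

    def __init__(self, search_tp: str = "number"):
        if search_tp not in ("name", "number"):
            raise ValueError("search_tp must be 'name' or 'number'")
        self.search_tp = search_tp

    def convert(self, d):
        # Route to the method that matches the configured input type.
        if self.search_tp == "name":
            return self.convert_dict(d)
        return self.convert_number(d)

    @abstractmethod
    def convert_dict(self, d: List[Dict]):
        """Convert a list of {symbol: fraction} dicts."""

    @abstractmethod
    def convert_number(self, d: List[int]):
        """Convert a list of atomic numbers."""


class MassMap(TwoWayMap):
    """Toy subclass returning standard atomic masses for H and He."""

    _BY_NUMBER = {1: 1.008, 2: 4.0026}
    _BY_NAME = {"H": 1.008, "He": 4.0026}

    def convert_dict(self, d):
        return [[sum(self._BY_NAME[s] * f for s, f in site.items())] for site in d]

    def convert_number(self, d):
        return [[self._BY_NUMBER[z]] for z in d]


masses_by_number = MassMap("number").convert([1, 2])
masses_by_name = MassMap("name").convert([{"H": 2}])
```

Because both conversion methods are abstract, a subclass that forgets one of them cannot be instantiated, which surfaces the omission early.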

featurebox.featurizers.atom.mapper.get_atom_fea_name(structure: Structure) List[dict]

For a structure, return a list of site-occupancy dictionaries. For example, a Fe0.5Ni0.5 site is returned as {"Fe": 0.5, "Ni": 0.5}.

Parameters:

structure (Structure) – pymatgen Structure with potential site disorder

Returns:

A list of site-fraction descriptions.
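
The shape of the returned list can be illustrated without installing pymatgen by using small stand-in classes that mimic the site.species.as_dict() access pattern quoted in the examples above. FakeSpecies, FakeSite, FakeStructure, and site_occupancies are hypothetical names used only for this sketch.

```python
from typing import Dict, List


class FakeSpecies:
    """Stand-in for a site's species, exposing as_dict()."""

    def __init__(self, occupancy: Dict[str, float]):
        self._occupancy = dict(occupancy)

    def as_dict(self) -> Dict[str, float]:
        return dict(self._occupancy)


class FakeSite:
    """Stand-in for a structure site with a species attribute."""

    def __init__(self, occupancy: Dict[str, float]):
        self.species = FakeSpecies(occupancy)


class FakeStructure:
    """Stand-in for a structure holding a list of sites."""

    def __init__(self, sites: List[FakeSite]):
        self.sites = list(sites)


def site_occupancies(structure) -> List[Dict[str, float]]:
    """Return one {symbol: fraction} dict per site."""
    return [site.species.as_dict() for site in structure.sites]


# One disordered Fe/Ni site followed by a fully occupied O site.
structure = FakeStructure([FakeSite({"Fe": 0.5, "Ni": 0.5}), FakeSite({"O": 1.0})])
occupancies = site_occupancies(structure)
```

The resulting list has the same form as the input expected by converters configured with search_tp="name".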

featurebox.featurizers.atom.mapper.get_atom_fea_number(structure: Structure) List[int]

Get atom features from a structure; may be overridden.

Parameters:

structure – (Pymatgen.Structure) pymatgen structure

Returns:

List of atomic numbers

featurebox.featurizers.atom.mapper.get_ion_fea_name(structure: Structure) List[dict]

For a structure, return a list of site-occupancy dictionaries keyed by ion. For example, a Fe0.5Ni0.5 site is returned as {"Fe2+": 0.5, "Ni2+": 0.5}.

Parameters:

structure (Structure) – pymatgen Structure with potential site disorder

Returns:

A list of site-fraction descriptions.

featurebox.featurizers.atom.mapper.process_atomic_orbitals(o)

Post-processing for dict data with keys such as "1s", "2p", …

featurebox.featurizers.atom.mapper.process_bool_transition_metal(tm)

Post-processing for bool data.

featurebox.featurizers.atom.mapper.process_tuple_full_electronic_structure(full_e)

Post-processing for electronic_structure data, given as tuples such as ((1, "s", 2), …).
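
One plausible way to turn such (shell, subshell, electron count) tuples into a fixed-length numeric vector is sketched below. The column order ORBITAL_ORDER, its truncation at 4s, and the helper electronic_structure_vector are assumptions for illustration and may differ from what the library actually produces.

```python
from typing import List, Sequence, Tuple

# Hypothetical fixed column order, truncated at 4s to keep the example short.
ORBITAL_ORDER = ["1s", "2s", "2p", "3s", "3p", "3d", "4s"]


def electronic_structure_vector(
    full_e: Sequence[Tuple[int, str, int]],
) -> List[float]:
    """Map (n, subshell, electrons) tuples onto the fixed orbital columns."""
    counts = {f"{n}{subshell}": electrons for n, subshell, electrons in full_e}
    # Orbitals absent from the input are treated as empty.
    return [float(counts.get(orbital, 0)) for orbital in ORBITAL_ORDER]


# Ground-state configuration of carbon: 1s2 2s2 2p2
carbon = electronic_structure_vector([(1, "s", 2), (2, "s", 2), (2, "p", 2)])
print(carbon)
```

A fixed column order keeps vectors for different elements aligned, so they can be stacked into a single array.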

featurebox.featurizers.atom.mapper.process_tuple_oxidation_states(ox, size=10)

Post-processing for data given as a tuple of floats.
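
Since elements have different numbers of oxidation states, a size argument suggests that the tuple is brought to a fixed length. The sketch below pads with a fill value or truncates; this reading of size, the zero fill, and the helper name pad_oxidation_states are assumptions rather than documented behaviour.

```python
from typing import List, Sequence


def pad_oxidation_states(
    ox: Sequence[float], size: int = 10, fill: float = 0.0
) -> List[float]:
    """Return exactly `size` floats: truncate long inputs, pad short ones."""
    values = [float(v) for v in ox][:size]
    return values + [fill] * (size - len(values))


# The two most common oxidation states of iron, padded to four columns.
iron = pad_oxidation_states((2, 3), size=4)
print(iron)
```

Note that a zero fill cannot be distinguished from a genuine oxidation state of 0, so a different fill value may be preferable when that distinction matters.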

featurebox.featurizers.atom.mapper.process_uni(i)

Post-processing for general properties.