featurebox.featurizers.atom package¶

Submodules¶

featurebox.featurizers.atom.mapper module¶

Get pure atom properties.

Embedded data: “ele_table.csv”, “ele_megnet.json”, “ie.json”, “oe.csv”

class featurebox.featurizers.atom.mapper.AtomJsonMap(embedding_dict: Optional[Union[str, Dict]] = None, search_tp: str = 'auto', feature_labels=None, **kwargs)¶

Bases: BinaryMap

Fixed Atom json map.

Examples

>>> from featurebox.featurizers.atom.mapper import AtomJsonMap
>>> tmps = AtomJsonMap(search_tp="number")
>>> s = [1, 76]                  # [i.specie.Z for i in structure]
>>> a = tmps.convert(s)
>>> tmps = AtomJsonMap(search_tp="name")
>>> s = [{"H": 2}, {"Al": 1}]    # [i.species.as_dict() for i in pymatgen_structure.sites]
>>> a = tmps.convert(s)
>>>
>>> tmps = AtomJsonMap(search_tp="name")
>>> s = [[{"H": 2}, {"Ce": 1}], [{"H": 2}, {"Al": 1}]]  # one list of sites per structure
>>> a = tmps.transform(s)

Parameters:

embedding_dict –

(str, dict) Name of an embedding file, or a dict that maps each element to its vector.

Provides pre-trained elemental embeddings, trained on formation energies, which can be used to speed up training. The embeddings are also useful elemental descriptors that encode chemical similarity and may be used in other ways.

convert_dict(atoms: List[dict]) → ndarray¶: Convert atom {symbol: fraction} list to numeric features

convert_number(atoms: List[int]) → ndarray¶: Convert a list of atomic numbers to numeric features
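The two input shapes accepted by convert_number and convert_dict can be illustrated with a tiny hand-made embedding dictionary. This is only a sketch: the 2-component vectors, the SYMBOLS lookup, and the amount-weighted sum used for dict input are assumptions made for illustration. They are not the contents of “ele_megnet.json” and not necessarily how AtomJsonMap combines species.

```python
from typing import Dict, List

# Hypothetical 2-component embeddings; real embedding files hold longer vectors.
EMBEDDING: Dict[str, List[float]] = {
    "H": [1.0, 0.0],
    "Al": [0.0, 2.0],
}
# Minimal atomic-number -> symbol lookup (only the elements used here).
SYMBOLS: Dict[int, str] = {1: "H", 13: "Al"}


def lookup_by_number(atoms: List[int]) -> List[List[float]]:
    """Return one embedding vector per atomic number."""
    return [list(EMBEDDING[SYMBOLS[z]]) for z in atoms]


def lookup_by_name(atoms: List[Dict[str, float]]) -> List[List[float]]:
    """Return one vector per site.

    Species vectors are summed, weighted by their amount; this weighting
    is an assumption of the sketch.
    """
    rows = []
    for site in atoms:
        row = [0.0, 0.0]
        for symbol, amount in site.items():
            for k, value in enumerate(EMBEDDING[symbol]):
                row[k] += amount * value
        rows.append(row)
    return rows


by_number = lookup_by_number([1, 13])            # search_tp="number" style input
by_name = lookup_by_name([{"H": 2}, {"Al": 1}])  # search_tp="name" style input
```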

class featurebox.featurizers.atom.mapper.AtomMap(n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'any', **kwargs)¶

Bases: BaseFeature

Base class for atom converter. Map the element type and weight to element data.

Parameters:

batch_size (int) – size of batch.
batch_calculate (bool) – batch_calculate or not.
n_jobs (int) – Number of parallel jobs.
on_errors (str) – How to handle exceptions raised during feature calculation. Can be ‘nan’, ‘keep’, or ‘raise’. With ‘nan’, a column of np.nan is returned whose length equals the number of feature labels. The default is ‘raise’, which re-raises the exception.
return_type (str) – Specifies the return type. Can be ‘any’, ‘array’, or ‘df’. ‘array’ and ‘df’ force the return type to np.ndarray and pd.DataFrame respectively. With ‘any’, no type conversion is applied. The default is ‘any’.
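The ‘nan’ and ‘raise’ policies described for on_errors can be sketched as follows. The helper featurize_with_policy and its n_labels argument are hypothetical names introduced for this example, not part of featurebox. The ‘keep’ policy is omitted because its behaviour is not described in enough detail to reproduce.

```python
import math
from typing import Callable, List, Sequence


def featurize_with_policy(
    func: Callable[[object], Sequence[float]],
    sample: object,
    n_labels: int,
    on_errors: str = "raise",
) -> List[float]:
    """Apply func to one sample, handling failures according to on_errors."""
    try:
        return list(func(sample))
    except Exception:
        if on_errors == "nan":
            # One NaN per feature label, so the output keeps its width.
            return [math.nan] * n_labels
        # Default policy: let the original exception propagate.
        raise


def broken(sample):
    """Stand-in featurizer that always fails."""
    raise ValueError("cannot featurize")


row = featurize_with_policy(broken, "X", n_labels=3, on_errors="nan")
```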

static get_csv_embeddings(data_name: str) → DataFrame¶: Load an embedded CSV table as a DataFrame

static get_json_embeddings(file_name: str = 'ele_megnet.json') → Dict¶: Load an embedded JSON embedding file as a dict

class featurebox.featurizers.atom.mapper.AtomPymatgenPropMap(prop_name: Union[str, List[str]], func: Optional[Callable] = None, search_tp: str = 'name', **kwargs)¶

Bases: BinaryMap

Map elements to pymatgen element properties. Values for prop_name include: “atomic_radius”, “atomic_mass”, “number”, “max_oxidation_state”, “min_oxidation_state”, “row”, “group”, “atomic_radius_calculated”, “mendeleev_no”, “critical_temperature”, “density_of_solid”, “average_ionic_radius”, “average_cationic_radius”, “average_anionic_radius”.

Examples

>>> from featurebox.featurizers.atom.mapper import AtomPymatgenPropMap
>>> tmps = AtomPymatgenPropMap(search_tp="number", prop_name=["X"])
>>> s = [1, 76]
>>> a = tmps.convert(s)
>>> tmps = AtomPymatgenPropMap(search_tp="name", prop_name=["X"])
>>> s = [{"H": 2}, {"Po": 1}]  # [i.species.as_dict() for i in pymatgen.structure.sites]
>>> a = tmps.convert(s)
>>> tmps = AtomPymatgenPropMap(search_tp="name", prop_name=["X"])
>>> s = [[{"H": 2}, {"Po": 1}], [{"H": 2}, {"Po": 1}]]  # one list of sites per structure
>>> a = tmps.transform(s)

Parameters:

prop_name – (str, list of str) Property name or list of property names.
func – (callable or list of callable) Post-processing function(s). When a list is given, it must have the same length as prop_name.
search_tp – (str) Lookup method: “name” for dict input, “number” for int input.
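The pairing of prop_name with func can be sketched with a small stand-in property table. Everything here is hypothetical: RAW replaces the pymatgen lookup, map_props is not a featurebox function, and the property keys are chosen only to show one numeric and one boolean value.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical raw property values; a real run would read these from pymatgen.
RAW: Dict[str, Dict[str, object]] = {
    "H": {"row": 1, "is_metal": False},
    "Fe": {"row": 4, "is_metal": True},
}


def map_props(
    symbols: List[str],
    prop_name: List[str],
    func: Optional[List[Optional[Callable]]] = None,
) -> List[List[float]]:
    """Look up each property and pass it through its paired function."""
    if func is None:
        func = [None] * len(prop_name)
    if len(func) != len(prop_name):
        raise ValueError("func must have one entry per prop_name")
    rows = []
    for symbol in symbols:
        row = []
        for name, f in zip(prop_name, func):
            value = RAW[symbol][name]
            # A None entry means the raw value is used unchanged.
            row.append(float(f(value) if f is not None else value))
        rows.append(row)
    return rows


# "row" is used as-is; the boolean "is_metal" is converted with int.
features = map_props(["H", "Fe"], ["row", "is_metal"], [None, int])
```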

convert_dict(atoms: List[Dict]) → ndarray¶: Convert atom {symbol: fraction} list to numeric features

convert_number(atoms: List[int]) → ndarray¶: Convert int list to numeric features

property feature_labels¶

Generate attribute names.

Returns: ([str]) attribute labels.

class featurebox.featurizers.atom.mapper.AtomTableMap(tablename: Optional[Union[str, ndarray, DataFrame]] = 'oe.csv', search_tp: str = 'auto', **kwargs)¶

Bases: BinaryMap

Fixed atom embedding map. The default table is oe.csv; you can supply a different table to obtain different features. The table must contain elemental features for at least the first 92 elements (H through U). Make sure that every value in the table is an int or a float.

Such as:

Data	F0	F1	…
H	V	V	…
He	V	V	…
Li	V	V	…
Be	V	V	…
…	…	…	…
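Before passing a custom table, it can help to check that every cell is numeric. The sketch below parses a comma-separated table in the layout shown above and fails early on non-numeric cells. The table contents, the column names, and the helper load_numeric_table are hypothetical; the sample is truncated to three rows and would need at least 92 elements in practice.

```python
import csv
import io
from typing import Dict, List

# Hypothetical two-feature table in the layout shown above (first 3 rows only).
TABLE_TEXT = """Data,F0,F1
H,1.0,0
He,2.5,1
Li,0.5,2
"""


def load_numeric_table(text: str) -> Dict[str, List[float]]:
    """Parse the table and raise ValueError if any cell is not numeric."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    table = {}
    for line in reader:
        symbol, cells = line[0], line[1:]
        if len(cells) != len(header) - 1:
            raise ValueError(f"row {symbol!r} has the wrong number of columns")
        # float() raises ValueError on entries such as "n/a" or "".
        table[symbol] = [float(cell) for cell in cells]
    return table


table = load_numeric_table(TABLE_TEXT)
```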

Examples

>>> from featurebox.featurizers.atom.mapper import AtomTableMap
>>> tmps = AtomTableMap(search_tp="number")
>>> s = [1,76]
>>> tmps.convert(s)
array([[2.245000e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00],
       [2.383710e+03, 3.937715e+04, 3.783280e+03, 9.866700e+02,
        8.349720e+03, 6.978800e+02, 1.861780e+03, 1.549970e+03,
        9.784700e+02, 2.231900e+02, 2.633000e+02, 1.689800e+02,
        2.854000e+01, 0.000000e+00, 1.841000e+01, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00]])

>>> tmps = AtomTableMap(search_tp="name")
>>> s = [{"H": 2, }, {"Po": 1}]  #[i.species.as_dict() for i in pymatgen.structure.sites]
>>> a = tmps.convert(s)
...
>>> tmps = AtomTableMap(search_tp="name",tablename="oe.csv")
>>> s = [[{"H": 2, }, {"Po": 1}],[{"H": 2, }, {"Po": 1}]]
>>> a = tmps.transform(s)
...

>>> tmps = AtomTableMap(tablename=None)
>>> tmps = AtomTableMap(tablename="ele_table.csv")
>>> s = [{"H": 2, }, {"Pd": 1}]
>>> b = tmps.convert(s)
...

Parameters:

tablename (str, np.ndarray, pd.DataFrame) –
1. str: name of a table in bgnet.preprocessing.resources. If tablename is None, the embedded “ele_table.csv” is used.
2. np.ndarray: use with search_tp = “number”.
3. pd.DataFrame: use with search_tp = “name”.
search_tp (str) – Lookup method: “name” or “number”. The default is “auto”.

convert_dict(atoms: List[Dict]) → ndarray¶: Convert atom {symbol: fraction} list to numeric features

convert_number(atoms: List[int]) → ndarray¶: Convert atom number list to numeric features

property feature_labels¶

Generate attribute names.

Returns: ([str]) attribute labels.

static get_ele_embeddings(name='ele_table_norm.csv') → DataFrame¶: Load an embedded elemental CSV table as a DataFrame

class featurebox.featurizers.atom.mapper.BinaryMap(search_tp: str = 'auto', weight: bool = False, **kwargs)¶

Bases: AtomMap

Base converter that supports two lookup methods (search_tp): by name and by number.

Parameters:

search_tp – (str) Lookup method.
weight – (bool) If True, data with the same key are summed together.
**kwargs –

abstract convert_dict(d: List[Dict])¶: Convert a list of dicts to data

abstract convert_number(d: List[int])¶: Convert a list of numbers to data

featurebox.featurizers.atom.mapper.get_atom_fea_name(structure: Structure) → List[dict]¶

For a structure, return a list of dictionaries describing the site occupancy. For example, an Fe0.5Ni0.5 site is returned as {“Fe”: 0.5, “Ni”: 0.5}.

Parameters: structure (Structure) – pymatgen Structure with potential site disorder
Returns: a list of site fraction descriptions

featurebox.featurizers.atom.mapper.get_atom_fea_number(structure: Structure) → List[int]¶

Get atom features from a structure. This function may be overridden.

Parameters: structure (Structure) – pymatgen Structure
Returns: a list of atomic numbers
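The two return shapes, a {symbol: occupancy} dict per site and an atomic number per site, can be contrasted with a small stand-in. This sketch does not use pymatgen: a structure is represented as a plain list of sites, each a list of (symbol, occupancy) pairs, and ATOMIC_NUMBERS, sites_as_name, and sites_as_number are hypothetical names. Rejecting disordered sites in the number form is also an assumption of the sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a structure: each site is a list of
# (symbol, occupancy) pairs. pymatgen's Structure is not used here.
Site = List[Tuple[str, float]]
ATOMIC_NUMBERS: Dict[str, int] = {"O": 8, "Fe": 26, "Ni": 28}


def sites_as_name(sites: List[Site]) -> List[Dict[str, float]]:
    """Return one {symbol: occupancy} dict per site."""
    return [dict(site) for site in sites]


def sites_as_number(sites: List[Site]) -> List[int]:
    """Return one atomic number per site; only fully ordered sites qualify."""
    numbers = []
    for site in sites:
        if len(site) != 1:
            raise ValueError("disordered site has no single atomic number")
        numbers.append(ATOMIC_NUMBERS[site[0][0]])
    return numbers


disordered = [[("Fe", 0.5), ("Ni", 0.5)], [("O", 1.0)]]
ordered = [[("Fe", 1.0)], [("O", 1.0)]]
names = sites_as_name(disordered)
numbers = sites_as_number(ordered)
```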

featurebox.featurizers.atom.mapper.get_ion_fea_name(structure: Structure) → List[dict]¶

For a structure, return a list of dictionaries describing the site occupancy by ion. For example, an Fe0.5Ni0.5 site is returned as {“Fe2+”: 0.5, “Ni2+”: 0.5}.

Parameters: structure (Structure) – pymatgen Structure with potential site disorder
Returns: a list of site fraction descriptions

featurebox.featurizers.atom.mapper.process_atomic_orbitals(o)¶: Post-processing for dict-valued properties keyed by orbital (“1s”, “2p”, …)

featurebox.featurizers.atom.mapper.process_bool_transition_metal(tm)¶: Post-processing for bool-valued properties

featurebox.featurizers.atom.mapper.process_tuple_full_electronic_structure(full_e)¶: Post-processing for electronic_structure properties given as tuples ((1, “s”, 2), …)

featurebox.featurizers.atom.mapper.process_tuple_oxidation_states(ox, size=10)¶: Post-processing for properties given as a tuple of floats
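Elements have different numbers of oxidation states, so a tuple-valued property needs a fixed width before it can sit in a feature array. The sketch below shows one way to do that, assuming size is the target length and that missing entries are filled with zeros. Both points are assumptions; the actual behaviour of process_tuple_oxidation_states is not documented here, and pad_oxidation_states is a hypothetical name.

```python
from typing import List, Sequence


def pad_oxidation_states(ox: Sequence[float], size: int = 10) -> List[float]:
    """Truncate or zero-pad ox so the result always has `size` entries."""
    values = [float(v) for v in ox][:size]
    # Zero-fill is an assumption of this sketch.
    return values + [0.0] * (size - len(values))


padded = pad_oxidation_states((2, 3), size=4)
truncated = pad_oxidation_states((1, 2, 3, 4, 5), size=3)
```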

featurebox.featurizers.atom.mapper.process_uni(i)¶: Post-processing for general properties