featurebox.featurizers.atom package¶
Submodules¶
featurebox.featurizers.atom.mapper module¶
Get pure atom properties.
Embedded data: “ele_table.csv”, “ele_megnet.json”, “ie.json”, “oe.csv”
- class featurebox.featurizers.atom.mapper.AtomJsonMap(embedding_dict: Optional[Union[str, Dict]] = None, search_tp: str = 'auto', feature_labels=None, **kwargs)¶
Bases: BinaryMap
Fixed Atom json map.
Examples
>>> from featurebox.featurizers.atom.mapper import AtomJsonMap
>>> tmps = AtomJsonMap(search_tp="number")
>>> s = [1, 76]  # [i.specie.Z for i in structure]
>>> a = tmps.convert(s)
>>> tmps = AtomJsonMap(search_tp="name")
>>> s = [{"H": 2}, {"Al": 1}]  # [i.species.as_dict() for i in pymatgen_structure.sites]
>>> a = tmps.convert(s)
>>>
>>> tmps = AtomJsonMap(search_tp="name")
>>> s = [[{"H": 2}, {"Ce": 1}], [{"H": 2}, {"Al": 1}]]
>>> a = tmps.transform(s)
- Parameters:
embedding_dict – (str, dict) Name of a file, or a dict that maps each element to its element vector.
Provides the elemental embeddings pre-trained on formation energies, which can be used to speed up training. The embeddings are also useful elemental descriptors that encode chemical similarity and may be used in other ways.
- convert_dict(atoms: List[dict]) ndarray¶
Convert atom {symbol: fraction} list to numeric features
- convert_number(atoms: List[int]) ndarray¶
Convert a list of atomic numbers to numeric features
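The two conversion paths of AtomJsonMap can be pictured with a minimal, library-free sketch. Everything in it is an assumption made for illustration: the toy `EMBEDDING` and `SYMBOL_BY_Z` tables, the helper names ending in `_sketch`, and in particular the fraction-weighted sum used for each site, which may differ from what the class actually does.

```python
from typing import Dict, List

# Toy element -> vector table (made-up values, NOT the contents of ele_megnet.json).
EMBEDDING: Dict[str, List[float]] = {
    "H": [1.0, 0.0],
    "Al": [0.0, 2.0],
}
# Toy atomic-number -> symbol lookup, just enough for this example.
SYMBOL_BY_Z: Dict[int, str] = {1: "H", 13: "Al"}


def convert_number_sketch(atoms: List[int]) -> List[List[float]]:
    """Look up one vector per atomic number."""
    return [list(EMBEDDING[SYMBOL_BY_Z[z]]) for z in atoms]


def convert_dict_sketch(atoms: List[Dict[str, float]]) -> List[List[float]]:
    """Combine the vectors on each site as a fraction-weighted sum (assumed rule)."""
    rows = []
    for site in atoms:
        row = [0.0, 0.0]
        for symbol, fraction in site.items():
            for k, value in enumerate(EMBEDDING[symbol]):
                row[k] += fraction * value
        rows.append(row)
    return rows


by_number = convert_number_sketch([1, 13])
by_name = convert_dict_sketch([{"H": 2}, {"Al": 1}])
```

Both helpers return one feature row per input entry, which is the shape the convert_number and convert_dict descriptions imply.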
- class featurebox.featurizers.atom.mapper.AtomMap(n_jobs: int = 1, on_errors: str = 'raise', return_type: str = 'any', **kwargs)¶
Bases: BaseFeature
Base class for atom converter. Map the element type and weight to element data.
- Parameters:
batch_size (int) – size of batch.
batch_calculate (bool) – batch_calculate or not.
n_jobs (int) – Number of parallel jobs.
on_errors (str) – How to handle exceptions raised during feature calculation. Can be nan, keep, or raise. With 'nan', a column of np.nan is returned; its length equals the number of feature labels. The default is 'raise', which re-raises the exception.
return_type (str) – Specifies the return type. Can be any, np, array, or df. 'array' and 'df' force the return type to np.ndarray and pd.DataFrame respectively. With 'any', no type conversion is performed. Default is 'any'.
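The on_errors options can be illustrated with a small stand-alone sketch. The function `featurize_with_policy` and the toy featurizer are hypothetical and not part of featurebox; the 'nan' and 'raise' branches follow the description above, while the 'keep' branch (storing the exception object in place of the row) is an assumption, since its behaviour is not spelled out here.

```python
import math
from typing import Callable, List, Sequence


def featurize_with_policy(
    func: Callable[[int], List[float]],
    samples: Sequence[int],
    n_features: int,
    on_errors: str = "raise",
) -> List[object]:
    """Apply func to every sample, handling failures according to on_errors."""
    if on_errors not in ("nan", "keep", "raise"):
        raise ValueError("on_errors must be 'nan', 'keep' or 'raise'")
    results: List[object] = []
    for sample in samples:
        try:
            results.append(func(sample))
        except Exception as exc:
            if on_errors == "raise":
                raise
            if on_errors == "nan":
                # One NaN per feature label, so the row keeps its width.
                results.append([math.nan] * n_features)
            else:
                # 'keep': store the exception itself (assumed behaviour).
                results.append(exc)
    return results


def toy_feature(z: int) -> List[float]:
    """Hypothetical featurizer that fails for non-positive atomic numbers."""
    if z <= 0:
        raise ValueError(f"bad atomic number: {z}")
    return [float(z), float(z) ** 2]


rows_nan = featurize_with_policy(toy_feature, [1, -1], n_features=2, on_errors="nan")
rows_keep = featurize_with_policy(toy_feature, [1, -1], n_features=2, on_errors="keep")
```

With 'nan' the failed sample still yields a row of the expected width, which keeps downstream arrays rectangular.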
- static get_csv_embeddings(data_name: str) DataFrame¶
get csv preprocessing
- static get_json_embeddings(file_name: str = 'ele_megnet.json') Dict¶
get json preprocessing
- class featurebox.featurizers.atom.mapper.AtomPymatgenPropMap(prop_name: Union[str, List[str]], func: Optional[Callable] = None, search_tp: str = 'name', **kwargs)¶
Bases: BinaryMap
Get pymatgen element preprocessing.
prop_name = ["atomic_radius", "atomic_mass", "number", "max_oxidation_state", "min_oxidation_state", "row", "group", "atomic_radius_calculated", "mendeleev_no", "critical_temperature", "density_of_solid", "average_ionic_radius", "average_cationic_radius", "average_anionic_radius"]
Examples
>>> from featurebox.featurizers.atom.mapper import AtomPymatgenPropMap
>>> tmps = AtomPymatgenPropMap(search_tp="number", prop_name=["X"])
>>> s = [1, 76]
>>> a = tmps.convert(s)
>>> tmps = AtomPymatgenPropMap(search_tp="name", prop_name=["X"])
>>> s = [{"H": 2}, {"Po": 1}]  # [i.species.as_dict() for i in pymatgen.structure.sites]
>>> a = tmps.convert(s)
>>> tmps = AtomPymatgenPropMap(search_tp="name", prop_name=["X"])
>>> s = [[{"H": 2}, {"Po": 1}], [{"H": 2}, {"Po": 1}]]
>>> a = tmps.transform(s)
- Parameters:
prop_name – (str, list of str) Property name or list of property names.
func – (callable or list of callable) If a list is given, make sure it has the same length as prop_name.
search_tp – (str) Lookup method: "name" for dict input, "number" for int input.
- convert_dict(atoms: List[Dict]) ndarray¶
Convert atom {symbol: fraction} list to numeric features
- convert_number(atoms: List[int]) ndarray¶
Convert int list to numeric features
- property feature_labels¶
Generate attribute names.
- Returns:
([str]) attribute labels.
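The pairing between prop_name and func can be sketched without pymatgen. The `TOY_ELEMENTS` records and the helper `convert_number_sketch` are hypothetical stand-ins, and applying each callable to its matching property value is an assumption about how func is used; only the same-length requirement comes from the parameter description.

```python
from typing import Callable, Dict, List, Sequence, Union

# Stand-in records; a real run would read these from pymatgen Element objects.
TOY_ELEMENTS: Dict[int, Dict[str, object]] = {
    1: {"row": 1, "is_transition_metal": False},   # H
    76: {"row": 6, "is_transition_metal": True},   # Os
}


def convert_number_sketch(
    atoms: Sequence[int],
    prop_name: Union[str, List[str]],
    func: Union[Callable, List[Callable]],
) -> List[List[float]]:
    """Apply one callable per property to every atom (assumed pairing rule)."""
    names = [prop_name] if isinstance(prop_name, str) else list(prop_name)
    funcs = [func] if callable(func) else list(func)
    if len(funcs) != len(names):
        raise ValueError("func must have the same length as prop_name")
    return [
        [f(TOY_ELEMENTS[z][name]) for name, f in zip(names, funcs)]
        for z in atoms
    ]


features = convert_number_sketch(
    [1, 76], ["row", "is_transition_metal"], [float, float]
)
```

Here `float` turns the boolean flag into 0.0 or 1.0, so every column ends up numeric.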
- class featurebox.featurizers.atom.mapper.AtomTableMap(tablename: Optional[Union[str, ndarray, DataFrame]] = 'oe.csv', search_tp: str = 'auto', **kwargs)¶
Bases: BinaryMap
Fixed Atom embedding map. The default table is oe.csv. You can supply a different table for different preprocessing. The table must contain elemental features for at least the first 92 elements (up to U). Make sure every value in the table is an int or a float.
Such as:
Data   F0   F1   …
H      V    V    …
He     V    V    …
Li     V    V    …
Be     V    V    …
…      …    …    …
Examples
>>> from featurebox.featurizers.atom.mapper import AtomTableMap
>>> tmps = AtomTableMap(search_tp="number")
>>> s = [1, 76]
>>> tmps.convert(s)
array([[2.245000e+01, 0.000000e+00, 0.000000e+00, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00],
       [2.383710e+03, 3.937715e+04, 3.783280e+03, 9.866700e+02,
        8.349720e+03, 6.978800e+02, 1.861780e+03, 1.549970e+03,
        9.784700e+02, 2.231900e+02, 2.633000e+02, 1.689800e+02,
        2.854000e+01, 0.000000e+00, 1.841000e+01, 0.000000e+00,
        0.000000e+00, 0.000000e+00, 0.000000e+00]])
>>> tmps = AtomTableMap(search_tp="name")
>>> s = [{"H": 2}, {"Po": 1}]  # [i.species.as_dict() for i in pymatgen.structure.sites]
>>> a = tmps.convert(s)
...
>>> tmps = AtomTableMap(search_tp="name", tablename="oe.csv")
>>> s = [[{"H": 2}, {"Po": 1}], [{"H": 2}, {"Po": 1}]]
>>> a = tmps.transform(s)
...
>>> tmps = AtomTableMap(tablename=None)
>>> tmps = AtomTableMap(tablename="ele_table.csv")
>>> s = [{"H": 2}, {"Pd": 1}]
>>> b = tmps.convert(s)
...
- Parameters:
tablename (str, np.ndarray, pd.DataFrame) –
1. str: name of a table in bgnet.preprocessing.resources. If tablename is None, the embedding "ele_table.csv" is used.
2. np.ndarray: use with search_tp = "number".
3. pd.DataFrame: use with search_tp = "name".
search_tp (str) – Name
- convert_dict(atoms: List[Dict]) ndarray¶
Convert atom {symbol: fraction} list to numeric features
- convert_number(atoms: List[int]) ndarray¶
Convert atom number list to numeric features
- property feature_labels¶
Generate attribute names.
- Returns:
([str]) attribute labels.
- static get_ele_embeddings(name='ele_table_norm.csv') DataFrame¶
get CSV preprocessing
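Before handing a custom table to AtomTableMap, the two requirements stated in the class description (numeric values only, at least 92 elements) can be checked up front. The following is a minimal sketch using only the standard library; `check_element_table` is a hypothetical helper, and the assumed layout (a header row, then one row per element with the symbol in the first column) mirrors the example table rather than any confirmed file format.

```python
import csv
import io
from typing import Dict, List


def check_element_table(csv_text: str, min_rows: int = 92) -> Dict[str, List[float]]:
    """Parse a symbol-indexed CSV table and verify that every value is numeric."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first column is assumed to hold the element symbol
    table: Dict[str, List[float]] = {}
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        symbol, cells = row[0], row[1:]
        if len(cells) != len(header) - 1:
            raise ValueError(f"line {line_no}: expected {len(header) - 1} values")
        try:
            table[symbol] = [float(cell) for cell in cells]
        except ValueError as exc:
            raise ValueError(f"line {line_no} ({symbol}): non-numeric value") from exc
    if len(table) < min_rows:
        raise ValueError(f"table has {len(table)} elements, need at least {min_rows}")
    return table


demo = "Data,F0,F1\nH,1,0.5\nHe,2,1.5\n"
# min_rows is lowered only because the demo table is tiny.
table = check_element_table(demo, min_rows=2)
```

Running the check with the default `min_rows=92` on a real table flags both missing elements and stray text cells before any featurization starts.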
- class featurebox.featurizers.atom.mapper.BinaryMap(search_tp: str = 'auto', weight: bool = False, **kwargs)¶
Bases: AtomMap
Base converter with 2 different search_tp.
- Parameters:
search_tp – (str)
weight – (bool) If True, data for the same key are summed together.
**kwargs –
- abstract convert_dict(d: List[Dict])¶
convert list of dict to data
- abstract convert_number(d: List[int])¶
convert list of number to data
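How a converter with two search types might dispatch between its abstract methods can be sketched as follows. `BinaryMapSketch` and `SiteTotal` are hypothetical classes, not the featurebox implementation, and the 'auto' rule shown here (inspect the type of the first entry) is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Union


class BinaryMapSketch(ABC):
    """Hypothetical stand-in for a converter with two search types."""

    def __init__(self, search_tp: str = "auto"):
        if search_tp not in ("auto", "name", "number"):
            raise ValueError("search_tp must be 'auto', 'name' or 'number'")
        self.search_tp = search_tp

    def convert(self, atoms: Sequence[Union[int, Dict[str, float]]]) -> List[float]:
        search_tp = self.search_tp
        if search_tp == "auto":
            # Assumed rule: dict entries mean "name", anything else "number".
            search_tp = "name" if isinstance(atoms[0], dict) else "number"
        if search_tp == "name":
            return self.convert_dict(list(atoms))
        return self.convert_number(list(atoms))

    @abstractmethod
    def convert_dict(self, d: List[Dict[str, float]]) -> List[float]:
        """Convert a list of {symbol: fraction} dicts to data."""

    @abstractmethod
    def convert_number(self, d: List[int]) -> List[float]:
        """Convert a list of atomic numbers to data."""


class SiteTotal(BinaryMapSketch):
    """Toy subclass: total occupancy per site, or the atomic number itself."""

    def convert_dict(self, d: List[Dict[str, float]]) -> List[float]:
        return [float(sum(site.values())) for site in d]

    def convert_number(self, d: List[int]) -> List[float]:
        return [float(z) for z in d]


mapper = SiteTotal()
from_numbers = mapper.convert([1, 76])
from_names = mapper.convert([{"H": 2}, {"Al": 1}])
```

Subclasses only supply the two lookups; the base class decides which one to call.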
- featurebox.featurizers.atom.mapper.get_atom_fea_name(structure: Structure) List[dict]¶
For a structure, return a list of dictionaries describing the site occupancy. For example, an Fe0.5Ni0.5 site is returned as {"Fe": 0.5, "Ni": 0.5}.
- Parameters:
structure (Structure) – pymatgen Structure with potential site disorder
- Returns:
a list of site fraction descriptions
- featurebox.featurizers.atom.mapper.get_atom_fea_number(structure: Structure) List[int]¶
Get atom features from structure, may be overwritten.
- Parameters:
structure – (Pymatgen.Structure) pymatgen structure
- Returns:
List of atomic numbers
- featurebox.featurizers.atom.mapper.get_ion_fea_name(structure: Structure) List[dict]¶
For a structure, return a list of dictionaries describing the site occupancy. For example, an Fe0.5Ni0.5 site is returned as {"Fe2+": 0.5, "Ni2+": 0.5}.
- Parameters:
structure (Structure) – pymatgen Structure with potential site disorder
- Returns:
a list of site fraction descriptions
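The difference between the element-keyed and ion-keyed outputs can be shown with a pymatgen-free sketch. `ToySite` is a hypothetical stand-in for a pymatgen site, and both helper functions are illustrative only; they reproduce the output format described above, not the library code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToySite:
    """Hypothetical site: a list of (symbol, oxidation state, occupancy)."""
    species: List[Tuple[str, int, float]]


def atom_fea_name_sketch(sites: List[ToySite]) -> List[Dict[str, float]]:
    """One {symbol: occupancy} dict per site."""
    return [{sym: occ for sym, _, occ in site.species} for site in sites]


def ion_label(symbol: str, oxi: int) -> str:
    """Build a label such as 'Fe2+' or 'O2-'."""
    sign = "+" if oxi >= 0 else "-"
    return f"{symbol}{abs(oxi)}{sign}"


def ion_fea_name_sketch(sites: List[ToySite]) -> List[Dict[str, float]]:
    """One {ion label: occupancy} dict per site."""
    return [
        {ion_label(sym, oxi): occ for sym, oxi, occ in site.species}
        for site in sites
    ]


sites = [
    ToySite([("Fe", 2, 0.5), ("Ni", 2, 0.5)]),  # disordered Fe0.5Ni0.5 site
    ToySite([("O", -2, 1.0)]),
]
by_atom = atom_fea_name_sketch(sites)
by_ion = ion_fea_name_sketch(sites)
```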
- featurebox.featurizers.atom.mapper.process_atomic_orbitals(o)¶
Post-processing for dict preprocessing with “1s”, “2p”, …
- featurebox.featurizers.atom.mapper.process_bool_transition_metal(tm)¶
Post-processing for bool preprocessing
- featurebox.featurizers.atom.mapper.process_tuple_full_electronic_structure(full_e)¶
Post-processing for electronic_structure preprocessing ( (1,”s”,2),… )
- featurebox.featurizers.atom.mapper.process_tuple_oxidation_states(ox, size=10)¶
Post-processing for tuple of float preprocessing.
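One plausible reading of the size argument is that a variable-length tuple of oxidation states is brought to a fixed length so that every element yields a row of the same width. The sketch below illustrates that idea only; truncating to size and padding with zeros are both assumptions, and `pad_oxidation_states_sketch` is a hypothetical name rather than the library function.

```python
from typing import Iterable, List


def pad_oxidation_states_sketch(ox: Iterable[float], size: int = 10) -> List[float]:
    """Truncate or zero-pad a sequence of oxidation states to `size` entries."""
    values = [float(v) for v in ox][:size]          # keep at most `size` values
    return values + [0.0] * (size - len(values))    # pad the remainder with zeros


short = pad_oxidation_states_sketch((-1, 1), size=4)
long_ = pad_oxidation_states_sketch(range(12))
```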
- featurebox.featurizers.atom.mapper.process_uni(i)¶
Post-processing for bool preprocessing General properties.