featurebox.data package¶
Data tools.
Embedded data: “ele_table.csv”, “ele_megnet.json”, “ie.json”, “oe.csv”
Submodules¶
featurebox.data.check_data module¶
- class featurebox.data.check_data.CheckElements(check_method: ~typing.Union[~typing.List[str], str] = 'name', func: ~typing.Callable = <function CheckElements.<lambda>>)¶
Bases: object
Check whether elements are among the available elements.
- AVAILABLE_ELE_NUMBER:
(1~84) + (89, 90, 91, 92).
- AVAILABLE_ELE_NAME:
(‘H’~’Bi’) + (‘Ac’, ‘Th’, ‘Pa’, ‘U’).
- Parameters:
check_method (str) – Check by element name or number. One of “name” or “number”.
func (callable) –
Preprocessing applied to each element. For example, for elements in pymatgen:
>>> func = lambda x: [x.Z, ]
>>> func2 = lambda x: [x.name, ]
Examples
>>> ce = CheckElements.from_list(check_method="name")
>>> ce.check(["Na","Al","Ta"])
['Na', 'Al', 'Ta']
>>> ce = CheckElements.from_list(check_method="name")
>>> ce.check([["Na","Al"],["Na","Ta"]])
[['Na', 'Al'], ['Na', 'Ta']]
>>> ce.check([["Na","Al"],["Na","Ra"],["Zn","H"]])
The 1 (st,ed,th) sample ['Na', 'Ra'] is with element out of AVAILABLE_ELE_NAME please to check_data.py for more information.
[['Na', 'Al'], ['Zn', 'H']]
>>> ce.passed_idx()
array([0, 2], dtype=int64)
Examples
>>> ce = CheckElements.from_pymatgen_structures() ...
- check(samples: List) List¶
- Parameters:
samples (list) – Names or numbers, or list of pymatgen.Structure
- Returns:
result – List of filtered structures.
- Return type:
list
- classmethod from_list(check_method='name', grouped='False')¶
Get a checker for lists of element names or numbers.
- classmethod from_pymatgen_structures()¶
Get a checker for lists of pymatgen.Structure.
- passed_idx() ndarray¶
Return the indices of the samples that passed the check, as an np.ndarray.
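The filtering behaviour shown in the examples above can be sketched in plain Python. This is a minimal illustration, not the CheckElements implementation: `filter_samples` and `ALLOWED_NAMES` are hypothetical names, and the allowed set is a tiny subset chosen for the demo rather than the full AVAILABLE_ELE_NAME.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical, deliberately tiny subset; the real AVAILABLE_ELE_NAME is much larger.
ALLOWED_NAMES: FrozenSet[str] = frozenset({"H", "Na", "Al", "Ta", "Zn"})


def filter_samples(
    samples: Sequence[Sequence[str]],
    allowed: FrozenSet[str] = ALLOWED_NAMES,
) -> Tuple[List[List[str]], List[int]]:
    """Keep samples whose elements are all allowed; also return their indices."""
    passed: List[List[str]] = []
    passed_idx: List[int] = []
    for i, sample in enumerate(samples):
        # A sample passes only if every one of its elements is in the allowed set.
        if all(name in allowed for name in sample):
            passed.append(list(sample))
            passed_idx.append(i)
    return passed, passed_idx


kept, idx = filter_samples([["Na", "Al"], ["Na", "Ra"], ["Zn", "H"]])
print(kept)  # [['Na', 'Al'], ['Zn', 'H']]
print(idx)   # [0, 2]
```

Keeping the indices next to the filtered list makes it possible to filter any parallel array, such as target values, with the same mask.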
featurebox.data.data_sep module¶
- class featurebox.data.data_sep.DataSameSep(data: Optional[Dict] = None, sep='-', sites_name='S', dup=3, prefix=None)¶
Bases: object
Settle data and dispatch entries marked “all” to every site. Make sure the values of the dict are of an immutable type, such as float or int. Otherwise, the stored data would change along with the input data, even after this class or function has been called.
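The warning about immutable values reflects ordinary Python aliasing. The sketch below does not use DataSameSep; it only shows, under the assumption that stored values are kept by reference, why a mutable value such as a list can change after it has been stored.

```python
import copy

# A list value is mutable, unlike the float or int values the class expects.
source = {"Ta-S1": {"bond1": [3.4, 3.5]}}

# A shallow copy creates a new dict but still points at the same list object.
stored_shallow = dict(source["Ta-S1"])
# A deep copy also duplicates the list, so it is independent of the input.
stored_deep = copy.deepcopy(source["Ta-S1"])

# Mutate the input after "storing" it.
source["Ta-S1"]["bond1"].append(9.9)

print(stored_shallow["bond1"])  # [3.4, 3.5, 9.9]
print(stored_deep["bond1"])     # [3.4, 3.5]
```

With float or int values the problem cannot occur, because those objects cannot be modified in place.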
Examples:¶
>>> d1 = {"Ta-S1":{"bond1":3.4,"bond2":3.5},"Co-S2":{"bond1":3.2,"bond2":3.1},"Fe-Sall":{"bond1":3.2,"bond2":3.1}}
>>> dss = DataSameSep(d1)
>>> dss["Ta-S1"]={"bond1":3.2,"bond2":3.5} # cover the old.
>>> dss.replace({"Ta-S1":{"bond1":3.4,"bond2":3.5},"Co-S2":{"bond1":3.2,"bond2":3.1}}) # cover the old.
>>> dss.replace_entry(label="Ta",site=1,entry={"bond1":3.2,"bond2":3.5}) # cover the old.
>>> dss.update({"Ta-S1":{"bond1":3.4,"bond2":3.5},"Co-S2":{"bond1":3.2,"bond2":3.1}}) # add
>>> dss.update_entry(label="Co",site=0,entry={"bond1":3.2}) # add
>>> dss.update_entry_kv(label="Mg",site="all",key="bond1",value=3.2) # add
>>> dict_data = dss.settle()
>>> pd_data = dss.settle_to_pd(sort=True)
>>> print(pd_data)
       bond1  bond2
Co-S0    3.2    NaN
Co-S2    3.2    3.1
Fe-S0    3.2    3.1
Fe-S1    3.2    3.1
Fe-S2    3.2    3.1
Mg-S0    3.2    NaN
Mg-S1    3.2    NaN
Mg-S2    3.2    NaN
Ta-S1    3.2    3.5
Make sure the keys of data are formatted as {label}-{Si or Sall} and that all values are dicts. The ‘S’ is the same as sites_name.
- Parameters:
data (dict of dict) – First-level keys are formatted as {label}{sep}{Si or Sall}.
sep (str) – Default “-”.
sites_name (str) – Default “S”.
dup (int) – Default 3.
prefix (str) – The class prefix of one batch of data.
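How an “all” entry could be dispatched to each site can be sketched as follows. This is an illustration inferred from the example output above, not the library's code: `dispatch_all` is a hypothetical helper, it ignores prefix, and the choice to let site-specific entries override dispatched ones is an assumption.

```python
from typing import Any, Dict


def dispatch_all(
    data: Dict[str, Dict[str, Any]],
    sep: str = "-",
    sites_name: str = "S",
    dup: int = 3,
) -> Dict[str, Dict[str, Any]]:
    """Expand '{label}{sep}{sites_name}all' entries to sites 0..dup-1."""
    all_mark = f"{sites_name}all"
    settled: Dict[str, Dict[str, Any]] = {}

    # First pass: copy each "all" entry to every site of its label.
    for key, entry in data.items():
        label, site = key.rsplit(sep, 1)
        if site == all_mark:
            for i in range(dup):
                settled.setdefault(f"{label}{sep}{sites_name}{i}", {}).update(entry)

    # Second pass: site-specific entries are applied last (assumed precedence).
    for key, entry in data.items():
        _, site = key.rsplit(sep, 1)
        if site != all_mark:
            settled.setdefault(key, {}).update(entry)
    return settled


d1 = {
    "Ta-S1": {"bond1": 3.4, "bond2": 3.5},
    "Fe-Sall": {"bond1": 3.2, "bond2": 3.1},
}
result = dispatch_all(d1)
print(sorted(result))  # ['Fe-S0', 'Fe-S1', 'Fe-S2', 'Ta-S1']
```

Keys without the separator raise a ValueError in this sketch, which is one reason the key format needs to be respected.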
- replace(data: Dict)¶
Replace dict data.
- Parameters:
data (dict) – {entry_key: entry}.
- replace_entry(label: str, site: Union[int, str], entry: Dict, prefix=None)¶
Replace an entry. This overwrites the old entry.
- Parameters:
label (str) – label name.
site (int or str) – Site number smaller than self.dup, or “all”.
entry (dict) – entry data.
prefix (str) – prefix name for batch of data.
- settle(sort=False) Dict¶
Settle data and return a formed dict.
- Parameters:
sort (bool) – sort the entry keys or not.
- Returns:
data_settled – new dict.
- Return type:
dict
- settle_to_pd(sort=False) DataFrame¶
Settle data and return a formed pd.DataFrame.
- Parameters:
sort (bool) – sort the entry keys or not.
- Returns:
data_settled – new table.
- Return type:
pd.DataFrame
- spilt(prefix_label_site='') Tuple¶
Try to split the key into prefix, label, and site number.
- update(data: Dict)¶
Add dict data.
- Parameters:
data (dict) – {entry_key: entry}.
- update_entry(label: str, site: Union[int, str], entry: Dict, prefix=None)¶
Add dict data to entry.
- Parameters:
label (str) – label name.
site (int or str) – Site number smaller than self.dup, or “all”.
entry (dict) – entry data.
prefix (str) – prefix name for batch of data.
- update_entry_kv(label: str, site: Union[int, str], key: str, value: Any, prefix=None)¶
Add dict data to entry.
- Parameters:
label (str) – label name.
site (int or str) – Site number smaller than self.dup, or “all”.
key (str) – Name of the property.
value (any) – Property value (float, int, or str).
prefix (str) – prefix name for batch of data.
- update_from_pd(df: Union[DataFrame, str])¶
Read a table and update the data from it. The table must be in the form produced by self.settle_to_pd.
If df is a str, it is read with: df = pd.read_csv(“df_name”, index_col=0).T
- Parameters:
df (pd.DataFrame or str) – Table, or path to a csv file.
featurebox.data.mp_access module¶
- class featurebox.data.mp_access.MpAccess(api_key: str = 'Di28ZMunseR8vr46')¶
Bases: object
API for the Materials Project database, accessed through pymatgen.
Examples
>>> mpa = MpAccess("Di28ZMunseR8vr57") # replace with your own key.
>>> ids = mpa.get_ids({"elements": {"$in": ["Al","O"]},'nelements': {"$lt": 2, "$gte": 1}})
number 29
>>> df = mpa.data_fetcher(mp_ids=ids, mp_props=['material_id', "cif"])
Will fetch 29 inorganic compounds from Materials Project
>>> structures_list = mpa.cifs_to_structures()
...
- Parameters:
api_key (str) – Materials Project API key.
- cifs_to_structures(cifs: Optional[List[str]] = None) List[Structure]¶
Get structures from CIFs.
- data_fetcher(mp_ids: Optional[List[str]] = None, mp_props: Optional[List[str]] = None, elasticity: bool = False) DataFrame¶
Fetch data from the Materials Project.
prop_name = ['band_gap', 'density', 'icsd_ids', 'volume', 'material_id', 'pretty_formula', 'elements', 'energy', 'efermi', 'e_above_hull', 'formation_energy_per_atom', 'final_energy_per_atom', 'unit_cell_formula', 'spacegroup', 'nelements', 'nsites', 'final_structure', 'cif', 'piezo', 'diel']
- Parameters:
mp_ids (list of str) – List of Materials Project ids.
mp_props (list of str) – Property names to fetch.
elasticity (bool) – Whether to obtain elasticity data.
- Returns:
Table of properties.
- Return type:
pandas.DataFrame
- get_ids(criteria: Optional[Dict] = None)¶
Search for ids by criteria.
support_property = ['energy', 'energy_per_atom', 'volume', 'formation_energy_per_atom', 'nsites', 'unit_cell_formula', 'pretty_formula', 'is_hubbard', 'elements', 'nelements', 'e_above_hull', 'hubbards', 'is_compatible', 'spacegroup', 'task_ids', 'band_gap', 'density', 'icsd_id', 'icsd_ids', 'cif', 'total_magnetization', 'material_id', 'oxide_type', 'tags', 'elasticity']
Examples
>>> from itertools import combinations
>>> name_list = ["NaCl","CaCo3"]
>>> criteria = {
... 'pretty_formula': {"$in": name_list},
... 'nelements': {"$lt": 3, "$gte": 3},
... 'spacegroup.number': {"$in": [225]},
... 'crystal_system': "cubic",
... 'nsites': {"$lt": 20},
... 'formation_energy_per_atom': {"$lt": 0},
... # "elements": {"$all": "O"},
... # "piezo":{"$ne": None}
... # "elements": {"$all": "O"},
... "elements": {"$in": list(combinations(["Al", "Co", "Cr", "Cu", "Fe", 'Ni'], 5))}}
where,
"$gt" (>), "$gte" (>=), "$lt" (<), "$lte" (<=), "$ne" (!=), "$in", "$nin" (not in), "$or", "$and", "$not", "$nor", "$all"
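The comparison operators above follow MongoDB-style query conventions. The sketch below evaluates a subset of them against a plain dict to make their meaning concrete. It is an assumption-laden illustration: `matches` is a hypothetical helper, it handles scalar fields only, and it does not cover "$or", "$and", "$not", "$nor" or "$all". The actual matching is done by the remote service.

```python
import operator
from typing import Any, Callable, Dict

# Map each supported operator name to a two-argument predicate.
_COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(record: Dict[str, Any], criteria: Dict[str, Any]) -> bool:
    """Return True if the record satisfies every field condition."""
    for field, condition in criteria.items():
        if field not in record:
            return False
        value = record[field]
        if isinstance(condition, dict):
            # All operators listed for one field must hold at the same time.
            if not all(_COMPARATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # A bare value means plain equality.
            return False
    return True


record = {"pretty_formula": "NaCl", "nelements": 2, "nsites": 8}
print(matches(record, {"nelements": {"$lt": 3, "$gte": 1}}))  # True
print(matches(record, {"nelements": {"$lt": 3, "$gte": 3}}))  # False
```

Because all operators on one field are combined with a logical and, a pair of bounds such as {"$lt": 3, "$gte": 3} cannot be satisfied by any value under these semantics.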
- get_ids_from_web_table(path_file: Optional[str] = None) List[str]¶
Read a csv file downloaded from the web. The file is named ‘_Materials Project.csv’ and contains a “Materials Id” column.
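Reading the id column from such a file can be sketched with the standard library. This is not the method's implementation: `read_material_ids` is a hypothetical helper, the in-memory text stands in for the downloaded file, and the ids and the second column are placeholders. Only the “Materials Id” column name comes from the description above.

```python
import csv
import io
from typing import List, TextIO


def read_material_ids(handle: TextIO, column: str = "Materials Id") -> List[str]:
    """Collect the non-empty values of one column from a csv file object."""
    reader = csv.DictReader(handle)
    return [row[column].strip() for row in reader if row.get(column)]


# Placeholder content standing in for the downloaded csv file.
sample = io.StringIO("Materials Id,Formula\nmp-1,A\nmp-2,B\n")
material_ids = read_material_ids(sample)
print(material_ids)  # ['mp-1', 'mp-2']
```

For a file on disk, the same function can be given the object returned by `open(path, newline="")`.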