featurebox.data package
Data tools.
Embedded data: “ele_table.csv”, “ele_megnet.json”, “ie.json”, “oe.csv”
Submodules
featurebox.data.check_data module
- class featurebox.data.check_data.CheckElements(check_method: str = 'name', func: typing.Callable = <function CheckElements.<lambda>>)
Bases: object
Check whether elements are among the available elements.
- AVAILABLE_ELE_NUMBER:
(1~84) + (89, 90, 91, 92).
- AVAILABLE_ELE_NAME:
(‘H’~’Bi’) + (‘Ac’, ‘Th’, ‘Pa’, ‘U’).
- Parameters:
check_method (str) – Check by element name or element number. Options: "name", "number".
func (callable) – Preprocessing applied to each element before the check. For example, for pymatgen elements:
>>> func = lambda x: [x.Z, ]
>>> func2 = lambda x: [x.name, ]
Examples
>>> ce = CheckElements.from_list(check_method="name",grouped=False)
>>> ce.check(["Na","Al","Ta"])
['Na', 'Al', 'Ta']
>>> ce = CheckElements.from_list(check_method="name",grouped=True)
>>> ce.check([["Na","Al"],["Na","Ta"]])
[['Na', 'Al'], ['Na', 'Ta']]
>>> ce.check([["Na","Al"],["Na","Ra"],["Zn","H"]])
The 1 (st,ed,th) sample ['Na', 'Ra'] is with element out of AVAILABLE_ELE_NAME please to check_data.py for more information.
[['Na', 'Al'], ['Zn', 'H']]
>>> ce.passed_idx()
array([0, 2], dtype=int64)
Examples
>>> ce = CheckElements.from_pymatgen_structures()
...
- check(samples: List) -> List
- Parameters:
samples (list) – Element names or numbers, or a list of pymatgen.Structure.
- Returns:
result – The filtered list, containing only the samples that passed the check.
- Return type:
list
- classmethod from_list(check_method='name', grouped='False')
Get a checker for lists of element names or numbers.
- classmethod from_pymatgen_structures()
Get a checker for lists of pymatgen.Structure.
- passed_idx() -> ndarray
Return the indices of the samples that passed the check, as an np.ndarray.
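The grouped check shown in the examples above can be pictured with a minimal standalone sketch. It is not the featurebox implementation: `ALLOWED` is a small hypothetical subset of AVAILABLE_ELE_NAME, and `check_grouped` is an illustrative helper name.

```python
# Illustrative sketch only; this is not the featurebox implementation.
# ALLOWED is a small hypothetical subset of AVAILABLE_ELE_NAME.
ALLOWED = {"H", "Na", "Al", "Zn", "Ta"}


def check_grouped(samples, allowed=ALLOWED):
    """Return (kept_samples, passed_indices) for a list of element groups."""
    kept, passed = [], []
    for i, group in enumerate(samples):
        # A sample passes only if every element in it is allowed.
        if all(element in allowed for element in group):
            kept.append(group)
            passed.append(i)
    return kept, passed


kept, passed = check_grouped([["Na", "Al"], ["Na", "Ra"], ["Zn", "H"]])
print(kept)    # [['Na', 'Al'], ['Zn', 'H']]
print(passed)  # [0, 2]
```

Keeping the passed indices alongside the filtered list makes it possible to filter any parallel data (targets, ids) in the same way.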
featurebox.data.data_sep module
- class featurebox.data.data_sep.DataSameSep(data: Optional[Dict] = None, sep='-', sites_name='S', dup=3, prefix=None)
Bases: object
Settle data, dispatching entries marked "all" to every site.
Examples:
>>> d1 = {"Ta-S1": {"bond1": 3.4, "bond2": 3.5},
...       "Co-S2": {"bond1": 3.2, "bond2": 3.1},
...       "Fe-Sall": {"bond1": 3.2, "bond2": 3.1}}
>>> dss = DataSameSep(d1)
>>> dss["Ta-S1"]={"bond1":3.2,"bond2":3.5} # overwrites the old entry.
>>> dss.replace({"Ta-S1":{"bond1":3.4,"bond2":3.5},"Co-S2":{"bond1":3.2,"bond2":3.1}}) # overwrites the old entries.
>>> dss.replace_entry(label="Ta",site=1,entry={"bond1":3.2,"bond2":3.5}) # overwrites the old entry.
>>> dss.update({"Ta-S1":{"bond1":3.4,"bond2":3.5},"Co-S2":{"bond1":3.2,"bond2":3.1}}) # add
>>> dss.update_entry(label="Co",site=0,entry={"bond1":3.2}) # add
>>> dss.update_entry_kv(label="Mg",site="all",key="bond1",value=3.2) # add
>>> dict_data = dss.settle()
>>> pd_data = dss.settle_to_pd(sort=True)
>>> print(pd_data)
       bond1  bond2
Co-S0    3.2    NaN
Co-S2    3.2    3.1
Fe-S0    3.2    3.1
Fe-S1    3.2    3.1
Fe-S2    3.2    3.1
Mg-S0    3.2    NaN
Mg-S1    3.2    NaN
Mg-S2    3.2    NaN
Ta-S1    3.2    3.5
Make sure the keys of data are formatted as {label}-{Si or Sall} and that all values are dicts. The 'S' must match sites_name.
- Parameters:
data (dict of dict) – first-level keys are formatted as {label}{sep}{Si or Sall}.
sep (str) – separator between label and site, default "-".
sites_name (str) – site marker, default "S".
dup (int) – number of sites per label, default 3. In the example above, an "all" entry is copied to sites 0 to dup-1.
prefix (str) – the class prefix of one batch of data.
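The dispatch of "all" entries to each site can be sketched in plain Python. This is an illustration under stated assumptions, not the featurebox code: `expand_all_sites` is a hypothetical helper, and letting site-specific entries take precedence over "all" entries is an assumed rule.

```python
# Illustrative sketch only; this is not the featurebox implementation.
def expand_all_sites(data, sep="-", sites_name="S", dup=3):
    """Copy every '{label}{sep}{sites_name}all' entry to sites 0..dup-1."""
    all_suffix = f"{sep}{sites_name}all"
    settled = {}
    # First pass: dispatch each "all" entry to every site of its label.
    for key, entry in data.items():
        if key.endswith(all_suffix):
            label = key[: -len(all_suffix)]
            for i in range(dup):
                settled[f"{label}{sep}{sites_name}{i}"] = dict(entry)
    # Second pass: merge site-specific entries on top (assumed precedence).
    for key, entry in data.items():
        if not key.endswith(all_suffix):
            settled.setdefault(key, {}).update(entry)
    return settled


d1 = {
    "Ta-S1": {"bond1": 3.4, "bond2": 3.5},
    "Fe-Sall": {"bond1": 3.2, "bond2": 3.1},
}
settled = expand_all_sites(d1)
print(sorted(settled))  # ['Fe-S0', 'Fe-S1', 'Fe-S2', 'Ta-S1']
```

Each site receives its own copy of the "all" entry, so a later change to one site does not leak into the others.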
- replace(data: Dict)
Replace dict data.
- Parameters:
data (dict) – {entry_key: entry}.
- replace_entry(label: str, site: Union[int, str], entry: Dict, prefix=None)
Replace an entry. This overwrites the old entry.
- Parameters:
label (str) – label name.
site (int or str) – a number smaller than self.dup, or "all".
entry (dict) – entry data.
prefix (str) – prefix name for a batch of data.
- settle(sort=False) -> Dict
Settle the data and return it as a dict.
- Parameters:
sort (bool) – whether to sort the entry keys.
- Returns:
data_settled – new dict.
- Return type:
dict
- settle_to_pd(sort=False) -> DataFrame
Settle the data and return it as a pd.DataFrame.
- Parameters:
sort (bool) – whether to sort the entry keys.
- Returns:
data_settled – new table.
- Return type:
pd.DataFrame
- spilt(prefix_label_site='') -> Tuple
Try to split a key into prefix, label, and site number.
- update(data: Dict)
Add dict data.
- Parameters:
data (dict) – {entry_key: entry}.
- update_entry(label: str, site: Union[int, str], entry: Dict, prefix=None)
Add dict data to an entry.
- Parameters:
label (str) – label name.
site (int or str) – a number smaller than self.dup, or "all".
entry (dict) – entry data.
prefix (str) – prefix name for a batch of data.
- update_entry_kv(label: str, site: Union[int, str], key: str, value: Any, prefix=None)
Add a single key-value pair to an entry.
- Parameters:
label (str) – label name.
site (int or str) – a number smaller than self.dup, or "all".
key (str) – name of the property.
value (any) – value of the property (float, int, str).
prefix (str) – prefix name for a batch of data.
- update_from_pd(df: Union[DataFrame, str])
Read a table and update the data from it. The table must have the form produced by self.settle_to_pd.
If df is a str, it is read with: df = pd.read_csv("df_name", index_col=0).T
- Parameters:
df (pd.DataFrame or str) – the table, or the name of a CSV file to read it from.
featurebox.data.mp_access module
- class featurebox.data.mp_access.MpAccess(api_key: str = 'Di28ZMunseR8vr56')
Bases: object
API for the Materials Project database, accessed through pymatgen.
Examples
>>> mpa = MpAccess("Di28ZMunseR8vr57")
>>> ids = mpa.get_ids({"elements": {"$in": ["Al","O"]},'nelements': {"$lt": 2, "$gte": 1}})
number 29
>>> df = mpa.data_fetcher(mp_ids=ids, mp_props=['material_id', "cif"])
Will fetch 29 inorganic compounds from Materials Project
>>> structures_list = mpa.cifs_to_structures()
...
- Parameters:
api_key (str) – API key used by pymatgen to access the database.
- cifs_to_structures(cifs: Optional[List[str]] = None) -> List[Structure]
Get structures from CIFs.
- data_fetcher(mp_ids: Optional[List[str]] = None, mp_props: Optional[List[str]] = None, elasticity: bool = False) -> DataFrame
Fetch data through pymatgen.
prop_name = ['band_gap', 'density', 'icsd_ids', 'volume', 'material_id', 'pretty_formula', 'elements', 'energy', 'efermi', 'e_above_hull', 'formation_energy_per_atom', 'final_energy_per_atom', 'unit_cell_formula', 'spacegroup', 'nelements', 'nsites', 'final_structure', 'cif', 'piezo', 'diel']
- Parameters:
mp_ids (list of str) – list of MP ids.
mp_props (list of str) – property names, taken from prop_name above.
elasticity (bool) – whether to fetch elasticity data.
- Returns:
Table of properties.
- Return type:
pandas.DataFrame
- get_ids(criteria: Optional[Dict] = None)
Search for ids by criteria.
support_property = ['energy', 'energy_per_atom', 'volume', 'formation_energy_per_atom', 'nsites', 'unit_cell_formula', 'pretty_formula', 'is_hubbard', 'elements', 'nelements', 'e_above_hull', 'hubbards', 'is_compatible', 'spacegroup', 'task_ids', 'band_gap', 'density', 'icsd_id', 'icsd_ids', 'cif', 'total_magnetization', 'material_id', 'oxide_type', 'tags', 'elasticity']
Examples
>>> from itertools import combinations
>>> name_list = ["NaCl","CaCO3"]
>>> criteria = {
... 'pretty_formula': {"$in": name_list},
... 'nelements': {"$lt": 3, "$gte": 3},
... 'spacegroup.number': {"$in": [225]},
... 'crystal_system': "cubic",
... 'nsites': {"$lt": 20},
... 'formation_energy_per_atom': {"$lt": 0},
... "elements": {"$all": "O"},
... "piezo": {"$ne": None},
... "elements": {"$in": list(combinations(["Al", "Co", "Cr", "Cu", "Fe", 'Ni'], 5))}}
where the available operators are:
"$gt" (>), "$gte" (>=), "$lt" (<), "$lte" (<=), "$ne" (!=), "$in", "$nin" (not in), "$or", "$and", "$not", "$nor", "$all"
- get_ids_from_web_table(path_file: Optional[str] = None) -> List[str]
An additional method that reads ids from a CSV file downloaded from the website. The file is named '_Materials Project.csv' and contains a "Materials Id" column.
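The comparison operators listed under get_ids follow the MongoDB query style. The sketch below evaluates a few of them against a local dict, purely to illustrate what the operators mean. It does not contact any database and is not part of featurebox; `matches` and the sample record are hypothetical, and only the simple comparison operators are covered.

```python
import operator

# Illustrative sketch only: a local evaluator for a few comparison operators.
# It does not contact any database and is not part of featurebox.
_OPS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(record, criteria):
    """Return True if the record satisfies every condition in criteria."""
    for field, condition in criteria.items():
        value = record.get(field)
        if not isinstance(condition, dict):
            # A bare value means equality.
            if value != condition:
                return False
            continue
        # Every operator given for a field must hold.
        for op, operand in condition.items():
            if not _OPS[op](value, operand):
                return False
    return True


record = {"pretty_formula": "NaCl", "nelements": 2, "nsites": 8}
criteria = {"pretty_formula": {"$in": ["NaCl", "CaCO3"]}, "nsites": {"$lt": 20}}
print(matches(record, criteria))                    # True
print(matches(record, {"nelements": {"$gte": 3}}))  # False
```

Note that several operators on one field are combined with a logical AND, so a condition such as {"$lt": 3, "$gte": 3} can never be satisfied.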
featurebox.data.namesplit module
- class featurebox.data.namesplit.NameSplit(bracket_follow: bool = False)
Bases: object
Split composition names into tables and write them to expands.csv and folds.csv.
Note
Make sure each number follows its element or (). If the number comes before the (), as in "0.9(BaZr)+0.1(BaZrO3)", set bracket_follow=True, and in that case make sure the first number after each ) is preceded by "+".
Examples
>>> from featurebox.data.namesplit import NameSplit
>>> import os
>>> os.chdir(r'.')
>>> name = ['(Ti1.24La3)2',"((Ti1.24)2P2)1H0.2", "((Ti1.24)2)1H0.2",
... "((Ti1.24))1H0.2", "((Ti)2P2)1H0.2", "((Ti))1H0.2"]
>>> NSp = NameSplit()
>>> NSp.transform(name)
...
- transform(names: List[str], folds_name: str = 'folds.csv', expands_name: str = 'expands.csv')
- Parameters:
names (list of str) – list of composition names.
folds_name (str) – file name for the folds table, default 'folds.csv'.
expands_name (str) – file name for the expands table, default 'expands.csv'.
- Returns:
The tables are written to the CSV files named above.
- Return type:
None
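To make the bracket convention concrete (each number follows its element or bracket), here is a small standalone parser that expands a name into per-element amounts. It is one plausible reading of what "expanding" a name means, not the NameSplit algorithm: `expand_name` and its helpers are hypothetical, and it handles only the default case where bracket_follow is False.

```python
import re
from collections import defaultdict

# Illustrative sketch only; this is not the NameSplit implementation.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")
_ELEMENT = re.compile(r"[A-Z][a-z]?")


def _read_number(text, pos):
    """Read an optional number at pos; a missing number counts as 1."""
    match = _NUMBER.match(text, pos)
    if match:
        return float(match.group()), match.end()
    return 1.0, pos


def _parse_group(text, pos):
    """Parse until a closing bracket or the end; return (amounts, new_pos)."""
    amounts = defaultdict(float)
    while pos < len(text) and text[pos] != ")":
        if text[pos] == "(":
            inner, pos = _parse_group(text, pos + 1)
            if pos >= len(text):
                raise ValueError(f"unbalanced bracket in {text!r}")
            pos += 1  # skip the closing bracket
            factor, pos = _read_number(text, pos)
            # The number after the bracket scales everything inside it.
            for element, amount in inner.items():
                amounts[element] += amount * factor
        else:
            match = _ELEMENT.match(text, pos)
            if match is None:
                raise ValueError(f"unexpected character at {pos} in {text!r}")
            count, pos = _read_number(text, match.end())
            amounts[match.group()] += count
    return amounts, pos


def expand_name(name):
    """Expand a bracketed composition name into {element: amount}."""
    amounts, pos = _parse_group(name, 0)
    if pos != len(name):
        raise ValueError(f"unbalanced bracket in {name!r}")
    return dict(amounts)


print(expand_name("(Ti1.24La3)2"))    # {'Ti': 2.48, 'La': 6.0}
print(expand_name("((Ti)2P2)1H0.2"))  # {'Ti': 2.0, 'P': 2.0, 'H': 0.2}
```

Names written with leading coefficients, such as "0.9(BaZr)+0.1(BaZrO3)", are outside the scope of this sketch.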