featurebox.data package

Data tools.

Embedded data: “ele_table.csv”, “ele_megnet.json”, “ie.json”, “oe.csv”

Submodules

featurebox.data.check_data module

class featurebox.data.check_data.CheckElements(check_method: str = 'name', func: typing.Callable = <function CheckElements.<lambda>>)

Bases: object

Check whether elements are among the available elements.

AVAILABLE_ELE_NUMBER: (1~84) + (89, 90, 91, 92).
AVAILABLE_ELE_NAME: ('H'~'Bi') + ('Ac', 'Th', 'Pa', 'U').

Parameters:

check_method (str) – Check by element name or by atomic number. One of "name" or "number".

func (callable) –

Function applied to each sample before checking. For example, for elements from pymatgen:

>>> func = lambda x: [x.Z, ]
>>> func2 = lambda x: [x.name, ]

Examples

>>> ce = CheckElements.from_list(check_method="name",grouped=False)
>>> ce.check(["Na","Al","Ta"])
['Na', 'Al', 'Ta']
>>> ce = CheckElements.from_list(check_method="name",grouped=True)
>>> ce.check([["Na","Al"],["Na","Ta"]])
[['Na', 'Al'], ['Na', 'Ta']]
>>> ce.check([["Na","Al"],["Na","Ra"],["Zn","H"]])
The 1 (st,ed,th) sample ['Na', 'Ra'] is with element out of AVAILABLE_ELE_NAME
 please to check_data.py for more information.
[['Na', 'Al'], ['Zn', 'H']]
>>> ce.passed_idx()
array([0, 2], dtype=int64)

Examples

>>> ce = CheckElements.from_pymatgen_structures()
...

check(samples: List) → List

Parameters: samples (list) – Element names or numbers, or a list of pymatgen.Structure.
Returns: result – List of the samples that passed the check.
Return type: list

classmethod from_list(check_method='name', grouped='False'): Get a checker for a list of names or numbers.

classmethod from_pymatgen_structures(): Get a checker for a list of pymatgen.Structure.

passed_idx() → ndarray: Return the indices of the samples that passed the check, as an np.ndarray.
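
The filtering behavior shown in the grouped example can be sketched in plain Python. This is a minimal illustration, not the CheckElements implementation: the class name SimpleElementChecker and the small ALLOWED_NAMES set are assumptions made for this sketch, and the real AVAILABLE_ELE_NAME list is much larger.

```python
from typing import List, Sequence

# Toy subset for illustration only; the real AVAILABLE_ELE_NAME is much larger.
ALLOWED_NAMES = {"H", "Na", "Al", "Ta", "Zn"}


class SimpleElementChecker:
    """Hypothetical stand-in that mimics the grouped name check."""

    def __init__(self, allowed=ALLOWED_NAMES):
        self.allowed = set(allowed)
        self.mark: List[bool] = []

    def check(self, samples: Sequence[Sequence[str]]) -> List[List[str]]:
        # A sample passes only if every element in it is allowed.
        self.mark = [all(e in self.allowed for e in s) for s in samples]
        return [list(s) for s, ok in zip(samples, self.mark) if ok]

    def passed_idx(self) -> List[int]:
        # Positions (in the original input) of the samples that passed.
        return [i for i, ok in enumerate(self.mark) if ok]


checker = SimpleElementChecker()
kept = checker.check([["Na", "Al"], ["Na", "Ra"], ["Zn", "H"]])
print(kept)                  # [['Na', 'Al'], ['Zn', 'H']]
print(checker.passed_idx())  # [0, 2]
```

Keeping the pass/fail mark from the last check is what lets the passed indices be used later to filter a target array aligned with the samples.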

featurebox.data.data_sep module

class featurebox.data.data_sep.DataSameSep(data: Optional[Dict] = None, sep='-', sites_name='S', dup=3, prefix=None)

Bases: object

Settle data: entries marked "all" are dispatched to every site.

Examples:

>>> d1 = {"Ta-S1":{"bond1":3.4,"bond2":3.5},"Co-S2":{"bond1":3.2,"bond2":3.1},"Fe-Sall":{"bond1":3.2,"bond2":3.1}}
>>> dss = DataSameSep(d1)
>>> dss["Ta-S1"]={"bond1":3.2,"bond2":3.5} # overwrites the old entry
>>> dss.replace({"Ta-S1":{"bond1":3.4,"bond2":3.5},"Co-S2":{"bond1":3.2,"bond2":3.1}}) # overwrites the old entries
>>> dss.replace_entry(label="Ta",site=1,entry={"bond1":3.2,"bond2":3.5}) # overwrites the old entry

>>> dss.update({"Ta-S1":{"bond1":3.4,"bond2":3.5},"Co-S2":{"bond1":3.2,"bond2":3.1}}) # adds to existing entries
>>> dss.update_entry(label="Co",site=0,entry={"bond1":3.2}) # adds an entry
>>> dss.update_entry_kv(label="Mg",site="all",key="bond1",value=3.2) # adds one key-value pair
>>> dict_data = dss.settle()
>>> pd_data = dss.settle_to_pd(sort=True)
>>> print(pd_data)
       bond1  bond2
Co-S0    3.2    NaN
Co-S2    3.2    3.1
Fe-S0    3.2    3.1
Fe-S1    3.2    3.1
Fe-S2    3.2    3.1
Mg-S0    3.2    NaN
Mg-S1    3.2    NaN
Mg-S2    3.2    NaN
Ta-S1    3.2    3.5

Make sure the keys of data are formatted as {label}-{Si or Sall} and that all values are dicts. The 'S' must match sites_name.

Parameters:

data (dict of dict) – first-level keys are formatted as {label}{sep}{Si or Sall}.
sep (str) – separator, default "-".
sites_name (str) – site marker, default "S".
dup (int) – number of sites that an "all" entry is dispatched to, default 3.
prefix (str) – class prefix for one batch of data.
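
The dispatch of "all" entries to every site can be sketched as follows. This is only an illustration of the key convention, not the DataSameSep code: the function name dispatch_all is hypothetical, and the rule that site-specific entries take precedence over "all" entries is an assumption of this sketch.

```python
from typing import Dict


def dispatch_all(data: Dict[str, dict], sep: str = "-",
                 sites_name: str = "S", dup: int = 3) -> Dict[str, dict]:
    """Copy every '{label}{sep}{sites_name}all' entry to sites 0..dup-1."""
    settled: Dict[str, dict] = {}
    all_suffix = f"{sep}{sites_name}all"
    # First pass: expand the "all" entries to each site.
    for key, entry in data.items():
        if key.endswith(all_suffix):
            label = key[: -len(all_suffix)]
            for i in range(dup):
                settled.setdefault(f"{label}{sep}{sites_name}{i}", {}).update(entry)
    # Second pass: site-specific entries override (assumed precedence).
    for key, entry in data.items():
        if not key.endswith(all_suffix):
            settled.setdefault(key, {}).update(entry)
    return settled


d1 = {"Ta-S1": {"bond1": 3.4, "bond2": 3.5},
      "Fe-Sall": {"bond1": 3.2, "bond2": 3.1}}
result = dispatch_all(d1)
print(sorted(result))  # ['Fe-S0', 'Fe-S1', 'Fe-S2', 'Ta-S1']
```

With dup=3 the "Fe-Sall" entry becomes three rows (S0, S1, S2), which matches the shape of the table printed in the example above.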

replace(data: Dict)

Replace dict data.

Parameters: data (dict) – {entry_key: entry}.

replace_entry(label: str, site: Union[int, str], entry: Dict, prefix=None)

Replace an entry. This overwrites the old entry.

Parameters:

label (str) – label name.
site (int or str) – site number smaller than self.dup, or "all".
entry (dict) – entry data.
prefix (str) – prefix name for a batch of data.

settle(sort=False) → Dict

Settle the data and return the resulting dict.

Parameters: sort (bool) – whether to sort the entry keys.
Returns: data_settled – new dict.
Return type: dict

settle_to_pd(sort=False) → DataFrame

Settle the data and return the resulting pd.DataFrame.

Parameters: sort (bool) – whether to sort the entry keys.
Returns: data_settled – new table.
Return type: pd.DataFrame

spilt(prefix_label_site='') → Tuple: Try to split a key into prefix, label, and site number.

update(data: Dict)

Add dict data.

Parameters: data (dict) – {entry_key: entry}.

update_entry(label: str, site: Union[int, str], entry: Dict, prefix=None)

Add dict data to an entry.

Parameters:

label (str) – label name.
site (int or str) – site number smaller than self.dup, or "all".
entry (dict) – entry data.
prefix (str) – prefix name for a batch of data.

update_entry_kv(label: str, site: Union[int, str], key: str, value: Any, prefix=None)

Add a single key-value pair to an entry.

Parameters:

label (str) – label name.
site (int or str) – site number smaller than self.dup, or "all".
key (str) – name of the property.
value (any) – value (float, int, str).
prefix (str) – prefix name for a batch of data.

update_from_pd(df: Union[DataFrame, str])

Read a table and update the data with it. The table must have the form produced by self.settle_to_pd.

If df is a str, it is read with: df = pd.read_csv("df_name", index_col=0).T

Parameters: df (pd.DataFrame or str) –

featurebox.data.mp_access module

class featurebox.data.mp_access.MpAccess(api_key: str = 'Di28ZMunseR8vr56')

Bases: object

API for the Materials Project database, accessed through pymatgen.

Examples

>>> mpa = MpAccess("Di28ZMunseR8vr57")
>>> ids = mpa.get_ids({"elements": {"$in": ["Al","O"]},'nelements': {"$lt": 2, "$gte": 1}})
number 29
>>> df = mpa.data_fetcher(mp_ids=ids, mp_props=['material_id', "cif"])
Will fetch 29 inorganic compounds from Materials Project
>>> structures_list = mpa.cifs_to_structures()
...

Parameters: api_key (str) – API key used by pymatgen to access the Materials Project.

cifs_to_structures(cifs: Optional[List[str]] = None) → List[Structure]: Get structures from CIFs.

data_fetcher(mp_ids: Optional[List[str]] = None, mp_props: Optional[List[str]] = None, elasticity: bool = False) → DataFrame

Fetch data from the Materials Project through pymatgen.

prop_name = ['band_gap', 'density', 'icsd_ids', 'volume', 'material_id', 'pretty_formula', 'elements', 'energy', 'efermi', 'e_above_hull', 'formation_energy_per_atom', 'final_energy_per_atom', 'unit_cell_formula', 'spacegroup', 'nelements', 'nsites', 'final_structure', 'cif', 'piezo', 'diel']

Parameters:

mp_ids (list of str) – list of Materials Project ids.
mp_props (list of str) – property names, see prop_name.
elasticity (bool) – whether to fetch elasticity data.

Returns:

Table of properties.

Return type:

pandas.DataFrame

get_ids(criteria: Optional[Dict] = None)

Search ids by criteria.

support_property = ['energy', 'energy_per_atom', 'volume', 'formation_energy_per_atom', 'nsites', 'unit_cell_formula', 'pretty_formula', 'is_hubbard', 'elements', 'nelements', 'e_above_hull', 'hubbards', 'is_compatible', 'spacegroup', 'task_ids', 'band_gap', 'density', 'icsd_id', 'icsd_ids', 'cif', 'total_magnetization', 'material_id', 'oxide_type', 'tags', 'elasticity']

Examples

>>> from itertools import combinations
>>> name_list = ["NaCl","CaCo3"]
>>> criteria = {
... 'pretty_formula': {"$in": name_list},
... 'nelements': {"$lt": 3, "$gte": 3},
... 'spacegroup.number': {"$in": [225]},
... 'crystal_system': "cubic",
... 'nsites': {"$lt": 20},
... 'formation_energy_per_atom': {"$lt": 0},
... "elements": {"$all": ["O"]},
... "piezo": {"$ne": None},
... "elements": {"$in": list(combinations(["Al", "Co", "Cr", "Cu", "Fe", 'Ni'], 5))}}

Supported operators: "$gt" (>), "$gte" (>=), "$lt" (<), "$lte" (<=), "$ne" (!=), "$in", "$nin" (not in), "$or", "$and", "$not", "$nor", "$all".
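
To make the meaning of the comparison operators concrete, here is a toy evaluator that checks one record against a criteria dict. It is a sketch for scalar fields only, written for illustration: the function name matches is hypothetical, the sample record is made up, and this is not how the Materials Project server evaluates queries (array fields such as "elements" and the logical operators "$or", "$and", "$not", "$nor", "$all" are not covered).

```python
import operator
from typing import Any, Dict

# Comparison operators mapped to their Python equivalents.
_OPS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(record: Dict[str, Any], criteria: Dict[str, Any]) -> bool:
    """Return True if the record satisfies every field condition."""
    for field, cond in criteria.items():
        value = record.get(field)
        if isinstance(cond, dict):
            # Every operator listed for a field must hold.
            if not all(_OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:
            # A bare value means equality.
            return False
    return True


# Made-up record, for illustration only.
record = {"pretty_formula": "NaCl", "nelements": 2, "nsites": 8}
criteria = {
    "pretty_formula": {"$in": ["NaCl", "CaCO3"]},
    "nelements": {"$gte": 1, "$lt": 3},
    "nsites": {"$lt": 20},
}
print(matches(record, criteria))                       # True
print(matches(record, {"nelements": {"$gte": 3}}))     # False
```

Several operators on the same field are combined with a logical AND, which is why a condition such as {"$lt": 3, "$gte": 3} can never match.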

get_ids_from_web_table(path_file: Optional[str] = None) → List[str]: Additional method that reads a CSV file downloaded from the web. The file is named '_Materials Project.csv' and contains a "Materials Id" column.
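
Reading ids from such a table can be sketched with the standard csv module. This is an assumption-laden illustration, not the method's actual code: the helper name read_material_ids is hypothetical, the sample rows and ids are made up, and only the "Materials Id" column name comes from the description above.

```python
import csv
import io
from typing import List, TextIO


def read_material_ids(handle: TextIO, column: str = "Materials Id") -> List[str]:
    """Collect the non-empty values of one column from an open CSV file."""
    reader = csv.DictReader(handle)
    return [row[column].strip() for row in reader if row.get(column)]


# In-memory stand-in for the downloaded file; the rows are made up.
sample = io.StringIO("Materials Id,Formula\nmp-0001,X\nmp-0002,Y\n")
ids = read_material_ids(sample)
print(ids)  # ['mp-0001', 'mp-0002']
```

For a file on disk, the same helper could be called inside `with open(path, newline="") as handle:`.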

featurebox.data.namesplit module

class featurebox.data.namesplit.NameSplit(bracket_follow: bool = False)

Bases: object

Split composition names into tables, producing expands.csv and folds.csv.

Notes

Make sure each number follows its element or bracket group. When the number comes before the bracket group, as in "0.9(BaZr)+0.1(BaZrO3)",

set bracket_follow=True, and make sure the first number after each ")" is preceded by "+".

Examples

>>> from featurebox.data.namesplit import NameSplit
>>> import os
>>> os.chdir(r'.')
>>> name = ['(Ti1.24La3)2',"((Ti1.24)2P2)1H0.2", "((Ti1.24)2)1H0.2",
... "((Ti1.24))1H0.2", "((Ti)2P2)1H0.2",  "((Ti))1H0.2"]
>>> NSp = NameSplit()
>>> NSp.transform(name)
...

transform(names: List[str], folds_name: str = 'folds.csv', expands_name: str = 'expands.csv')

Parameters:

names (list of str) – list of composition names.
folds_name (str) – output file name, default 'folds.csv'.
expands_name (str) – output file name, default 'expands.csv'.

Returns:

Nothing is returned; the tables are written to the CSV files named above.

Return type:

None
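
The kind of bracket expansion that names such as '(Ti1.24La3)2' call for can be sketched with a small recursive parser. This is a hedged illustration rather than the NameSplit algorithm: the function expand_name is hypothetical, it returns a plain dict instead of writing CSV files, and it handles only the default form in which each number follows its element or bracket group (the bracket_follow=True form is not covered).

```python
import re
from collections import defaultdict
from typing import Dict

# Tokens: element symbol, "(", ")", or a number such as 2 or 1.24.
_TOKEN = re.compile(r"([A-Z][a-z]?)|(\()|(\))|(\d+(?:\.\d+)?)")


def expand_name(name: str) -> Dict[str, float]:
    """Expand a bracketed composition into total amounts per element."""
    tokens = [m.group(0) for m in _TOKEN.finditer(name)]
    pos = 0

    def read_number() -> float:
        # An omitted number counts as 1.
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0].isdigit():
            pos += 1
            return float(tokens[pos - 1])
        return 1.0

    def read_group() -> Dict[str, float]:
        nonlocal pos
        totals: Dict[str, float] = defaultdict(float)
        while pos < len(tokens) and tokens[pos] != ")":
            tok = tokens[pos]
            pos += 1
            if tok == "(":
                inner = read_group()
                pos += 1  # skip the closing ")"
                factor = read_number()
                for el, amount in inner.items():
                    totals[el] += amount * factor
            else:
                totals[tok] += read_number()
        return totals

    return dict(read_group())


first = expand_name("(Ti1.24La3)2")
second = expand_name("((Ti1.24)2P2)1H0.2")
print(first)   # {'Ti': 2.48, 'La': 6.0}
print(second)
```

Each bracket group is parsed recursively and its contents are multiplied by the number that follows the closing bracket, so nested groups compose naturally.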