geochemdb package¶
Module contents¶
- class geochemdb.GeochemDB(database_path)[source]¶
Bases:
object
Assumes a SQLite database with the schema described in the package documentation.
- __init__(database_path)[source]¶
Initializes a GeochemDB instance.
- Parameters:
database_path (str) – Path to the SQLite database.
- _database_path¶
Internal storage for the database path.
- Type:
str
- con¶
SQLite connection object.
- Type:
sqlite3.Connection
- cursor¶
SQLite cursor object.
- Type:
sqlite3.Cursor
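The three attributes above suggest a constructor that stores the path and then opens a connection and a cursor. The following is a minimal sketch of that pattern using only the standard-library sqlite3 module; the class name `MiniDB` is hypothetical and the code is not taken from the package itself.

```python
import sqlite3


class MiniDB:
    """Hypothetical stand-in that mirrors the three documented attributes."""

    def __init__(self, database_path):
        # Keep the path for later reference.
        self._database_path = database_path
        # sqlite3.connect creates the file if it does not already exist;
        # ":memory:" gives a throwaway in-memory database.
        self.con = sqlite3.connect(database_path)
        self.cursor = self.con.cursor()


db = MiniDB(":memory:")
db.cursor.execute("SELECT sqlite_version()")
version = db.cursor.fetchone()[0]
db.con.close()
```

Holding one connection per instance keeps every method on the same transaction state, at the cost of having to close the connection explicitly when finished.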
- get_aliquots()[source]¶
List aliquots in the database.
- Parameters:
None.
- Returns:
aliquots – Array of aliquot names in the database.
- Return type:
array
- get_aliquots_samples()[source]¶
List samples and aliquots in the database.
- Parameters:
None.
- Returns:
df – DataFrame with columns ‘sample’ and ‘aliquot’.
- Return type:
pandas.DataFrame
- get_samples()[source]¶
List samples in the database.
- Parameters:
None.
- Returns:
samples – Array of sample names in the database.
- Return type:
array
- matchcolumns(table, df_cols, score_threshold=96.0)[source]¶
Match columns of df to columns in the SQLite database.
- Parameters:
- Returns:
col_match_dict – Dictionary of matches; the keys are the entries of df_cols and the values are the matching SQL columns in the table.
- Return type:
dict
- matchrows(table, values, columns)[source]¶
Exactly match rows in a table based on the provided values.
- Parameters:
table (str) – name of the table to match rows into.
values (arraylike) – array of values to match.
columns (arraylike) – names of columns in table that contain values; must have same length as second dimension of values
- Returns:
idx – Logical indices of length len(values); True for each row in values that is matched in the table.
- Return type:
array (bool)
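As an illustration of exact row matching, the sketch below issues one parameterized query per row of values and records whether any table row has the same values in the given columns. The function name `match_rows_exact`, the toy table, and the sample names are assumptions made for the example; the package's own implementation may differ.

```python
import sqlite3


def match_rows_exact(con, table, values, columns):
    """Return one bool per row in `values`: True if the table has that row."""
    # Table and column names cannot be bound as parameters, so they are
    # interpolated; only pass trusted identifiers here.
    where = " AND ".join(f'"{col}" = ?' for col in columns)
    query = f'SELECT 1 FROM "{table}" WHERE {where} LIMIT 1'
    return [con.execute(query, tuple(row)).fetchone() is not None
            for row in values]


# Toy table with made-up aliquots (hypothetical data).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Aliquots (aliquot TEXT, sample TEXT, material TEXT)")
con.executemany(
    "INSERT INTO Aliquots VALUES (?, ?, ?)",
    [("A1_zircon", "A1", "zircon"), ("B2_apatite", "B2", "apatite")],
)

idx = match_rows_exact(
    con,
    "Aliquots",
    [("A1_zircon", "A1"), ("A1_zircon", "B2"), ("B2_apatite", "B2")],
    ["aliquot", "sample"],
)
# idx is [True, False, True]
```

Each tuple in values must have one entry per name in columns, which is the length requirement stated for the columns parameter.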
- matchrows_strings(table, names, column, score_threshold=98)[source]¶
Match names to rows in a table by comparing strings against one column of the table.
- Parameters:
- Returns:
idx (array (bool)) – logical indices of length len(names); true for each entry matched in the table
sample_matches_dict (dict) – Closest matching sample names in the database with scores exceeding the threshold; the keys are the provided sample names and the values are their matches.
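The scoring library behind score_threshold is not stated here, so the sketch below substitutes the standard-library `difflib.SequenceMatcher`, scaled to 0-100, purely to show the shape of the two return values. The function name `match_names`, the sample names, and the choice to accept scores at or above the threshold are all assumptions.

```python
from difflib import SequenceMatcher


def match_names(names, candidates, score_threshold=98):
    """Return (idx, matches) for names compared against candidate strings."""
    idx = []
    matches = {}
    for name in names:
        # Score every candidate on a 0-100 scale and keep the best one.
        best, best_score = None, -1.0
        for cand in candidates:
            score = 100 * SequenceMatcher(None, name, cand).ratio()
            if score > best_score:
                best, best_score = cand, score
        # Assumption: a score equal to the threshold counts as a match.
        matched = best is not None and best_score >= score_threshold
        idx.append(matched)
        if matched:
            matches[name] = best
    return idx, matches


in_db = ["AT-2301", "AT-2302", "BX-17"]  # hypothetical names in the database
idx, matches = match_names(["AT-2301", "AT2302", "ZZ-99"], in_db,
                           score_threshold=90)
# idx is [True, True, False]
# matches is {"AT-2301": "AT-2301", "AT2302": "AT-2302"}
```

A high default threshold such as 98 tolerates only near-identical spellings; lowering it trades missed matches for a higher risk of pairing distinct samples.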
- matchsamples_df(df, score_threshold=96.0)[source]¶
Match samples in a DataFrame with a ‘sample’ column to existing samples in the database
- Parameters:
df (pandas.DataFrame) – DataFrame with a ‘sample’ column
- Returns:
df_matched – df with rows corresponding to matched samples
- Return type:
pandas.DataFrame
- measurements_add(df_measurements, df_analyses, df_aliquots, score_threshold=98)[source]¶
Add measurements for new analyses, but don’t add samples.
- Parameters:
df_measurements (pandas.DataFrame) –
DataFrame suitable for reference against the Measurements table. Must have the following columns:
analysis, quantity, mean, measurement_unit, uncertainty, uncertainty_unit
df_analyses (pandas.DataFrame) – DataFrame suitable for reference against the Analyses table. Must have the following columns: analysis, aliquot, date, instrument, technique
df_aliquots (pandas.DataFrame) – DataFrame suitable for reference against the Aliquots table. Must have the following columns: aliquot, sample, material
score_threshold (int) – 0-100, scoring threshold for matching sample names. Defaults to 98.
- Return type:
None.
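Because measurements_add expects specific columns in each frame, it can help to check them before calling it. The helper below is a hypothetical pre-flight check, not part of the package; it uses the column lists given above for df_measurements and df_aliquots, and the same check applies to df_analyses.

```python
# Required columns as listed in the parameter descriptions above.
REQUIRED_COLUMNS = {
    "df_measurements": ["analysis", "quantity", "mean", "measurement_unit",
                        "uncertainty", "uncertainty_unit"],
    "df_aliquots": ["aliquot", "sample", "material"],
}


def missing_columns(columns, required):
    """Return the required column names absent from `columns`, in order."""
    present = set(columns)
    return [name for name in required if name not in present]


# With pandas, `columns` would be df.columns; a plain list works the same way.
measurement_cols = ["analysis", "quantity", "mean", "uncertainty"]
missing = missing_columns(measurement_cols,
                          REQUIRED_COLUMNS["df_measurements"])
# missing is ["measurement_unit", "uncertainty_unit"]
```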
- measurements_by_aliquot(aliquots)[source]¶
Return a DataFrame with all measurements corresponding to the requested aliquots.
- Parameters:
aliquots (str or arraylike) – aliquot(s) for which to retrieve measurements
- Returns:
df – All measurements associated with the aliquot(s).
- Return type:
pandas.DataFrame
- measurements_by_sample(samples)[source]¶
Return a DataFrame with all measurements corresponding to the requested samples.
- Parameters:
samples (str or arraylike) – sample or samples for which to retrieve measurements
- Returns:
df – All measurements associated with the sample(s).
- Return type:
pandas.DataFrame
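Retrieving measurements by sample implies walking from samples to aliquots to analyses to measurements. The sketch below shows one way to express that as a SQL join with sqlite3, assuming tables named Aliquots, Analyses, and Measurements linked through the aliquot and analysis columns mentioned in this documentation. The reduced schema, the function name `measurements_for_samples`, and the data are illustrative assumptions, not the package's actual schema or query.

```python
import sqlite3

# Reduced, assumed schema with made-up rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Aliquots (aliquot TEXT, sample TEXT, material TEXT);
    CREATE TABLE Analyses (analysis TEXT, aliquot TEXT);
    CREATE TABLE Measurements (analysis TEXT, quantity TEXT, mean REAL,
                               measurement_unit TEXT, uncertainty REAL,
                               uncertainty_unit TEXT);
    INSERT INTO Aliquots VALUES ('A1_zircon', 'A1', 'zircon'),
                                ('B2_zircon', 'B2', 'zircon');
    INSERT INTO Analyses VALUES ('A1_z_1', 'A1_zircon'),
                                ('B2_z_1', 'B2_zircon');
    INSERT INTO Measurements VALUES
        ('A1_z_1', 'U', 210.0, 'ppm', 4.0, 'ppm'),
        ('A1_z_1', 'Th', 95.0, 'ppm', 2.0, 'ppm'),
        ('B2_z_1', 'U', 480.0, 'ppm', 9.0, 'ppm');
""")


def measurements_for_samples(con, samples):
    """Return measurement rows whose aliquot belongs to one of `samples`."""
    if isinstance(samples, str):
        samples = [samples]  # accept a single sample name
    placeholders = ", ".join("?" for _ in samples)
    query = f"""
        SELECT al.sample, al.aliquot, m.analysis, m.quantity, m.mean
        FROM Measurements AS m
        JOIN Analyses AS an ON an.analysis = m.analysis
        JOIN Aliquots AS al ON al.aliquot = an.aliquot
        WHERE al.sample IN ({placeholders})
        ORDER BY m.analysis, m.quantity
    """
    return con.execute(query, list(samples)).fetchall()


rows = measurements_for_samples(con, "A1")
# rows holds the Th and U measurements of analysis A1_z_1
```

The same join, filtered on al.aliquot instead of al.sample, would serve the by-aliquot variant.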
- measurements_update(df_measurements)[source]¶
Update matching spot measurements in the Measurements table for matching analyses. Does not attempt to add aliquots, analyses, samples, or measurements.
- Parameters:
df_measurements (pandas.DataFrame) –
Ideally generated by iolite_tools.measurements2sql(). Must have at least the following columns:
analysis, quantity, mean, measurement_unit, uncertainty, uncertainty_unit
- optionally:
reference_material
- Return type:
None.
- update_rows(table, match_columns, match_values, update_columns, update_values)[source]¶
Update columns in rows in a table based on values in matching columns.
- Parameters:
table (str) – Name of table to update rows in.
match_columns (arraylike) – Columns to do matching on.
match_values (list) – List of tuples of values to match rows on in match_columns. The length of each tuple must equal len(match_columns).
update_columns (arraylike) – Columns for which to update values.
update_values (list) – List of tuples with values to update in update_columns. The length of each tuple must equal len(update_columns).
- Return type:
None.
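The pairing of match_values with update_values can be sketched as a single parameterized UPDATE run once per tuple pair. The function below is a hypothetical illustration built on sqlite3's `executemany`; its name, the toy table, and the data are assumptions, and the package's own update_rows may be implemented differently.

```python
import sqlite3


def update_rows_sketch(con, table, match_columns, match_values,
                       update_columns, update_values):
    """Apply update_values[i] to rows whose match_columns equal match_values[i]."""
    set_clause = ", ".join(f'"{c}" = ?' for c in update_columns)
    where_clause = " AND ".join(f'"{c}" = ?' for c in match_columns)
    query = f'UPDATE "{table}" SET {set_clause} WHERE {where_clause}'
    # Parameters follow placeholder order: SET values first, then WHERE values.
    params = [tuple(u) + tuple(m)
              for u, m in zip(update_values, match_values)]
    with con:  # commit on success, roll back on error
        con.executemany(query, params)


# Toy table with made-up measurements.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Measurements "
            "(analysis TEXT, quantity TEXT, mean REAL, uncertainty REAL)")
con.executemany(
    "INSERT INTO Measurements VALUES (?, ?, ?, ?)",
    [("A1_z_1", "U", 210.0, 4.0), ("A1_z_1", "Th", 95.0, 2.0)],
)

update_rows_sketch(
    con, "Measurements",
    match_columns=["analysis", "quantity"], match_values=[("A1_z_1", "U")],
    update_columns=["mean", "uncertainty"], update_values=[(215.0, 3.5)],
)
updated = con.execute(
    "SELECT quantity, mean, uncertainty FROM Measurements ORDER BY quantity"
).fetchall()
# updated is [("Th", 95.0, 2.0), ("U", 215.0, 3.5)]
```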
- geochemdb.aliquot_average(df_measurements)[source]¶
Given a DataFrame of measurements as generated by GeochemDB.measurements_by_sample() or GeochemDB.measurements_by_aliquot(), gather measurements by aliquot, averaging duplicate measurements. Assumes that duplicates have the same units.
- to do:
implement more robust duplicate checking and responsible uncertainty propagation
- Parameters:
df_measurements (pd.DataFrame) – DataFrame of measurements output by GeochemDB.measurements_by_sample().
- Returns:
DataFrame with geochemical measurements averaged by aliquot.
- Return type:
pd.DataFrame
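The averaging step can be pictured as grouping measurements by aliquot and quantity and taking the mean of each group. The sketch below does this with plain dictionaries rather than pandas so that it runs with the standard library alone. The record layout (keys aliquot, quantity, mean), the function name `average_by_aliquot`, and the data are assumptions, and uncertainties are ignored here, in line with the to-do noted above.

```python
from collections import defaultdict
from statistics import fmean


def average_by_aliquot(records):
    """Average `mean` over records sharing the same (aliquot, quantity)."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["aliquot"], rec["quantity"])].append(rec["mean"])
    # One nested dict per aliquot: {aliquot: {quantity: averaged value}}.
    averaged = defaultdict(dict)
    for (aliquot, quantity), vals in groups.items():
        averaged[aliquot][quantity] = fmean(vals)
    return dict(averaged)


# Made-up records; the two U values for A1_zircon are duplicates.
records = [
    {"aliquot": "A1_zircon", "quantity": "U", "mean": 200.0},
    {"aliquot": "A1_zircon", "quantity": "U", "mean": 220.0},
    {"aliquot": "A1_zircon", "quantity": "Th", "mean": 95.0},
    {"aliquot": "B2_zircon", "quantity": "U", "mean": 480.0},
]
result = average_by_aliquot(records)
# result is {"A1_zircon": {"U": 210.0, "Th": 95.0}, "B2_zircon": {"U": 480.0}}
```

An unweighted mean is only meaningful when the duplicates share units, which is the assumption the function description states.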