geochemdb package¶

Module contents¶

class geochemdb.GeochemDB(database_path)[source]¶

Bases: object

Assumes a SQLite database with the schema described in the package documentation.

__init__(database_path)[source]¶

Initializes a GeochemDB instance.

Parameters:: database_path (str) – Path to the SQLite database.

_database_path¶

Internal storage for the database path.

Type:: str

con¶

SQLite connection object.

Type:: sqlite3.Connection

cursor¶

SQLite cursor object.

Type:: sqlite3.Cursor
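
The three attributes above suggest that the constructor stores the path, opens a connection, and creates a cursor. A minimal sketch of that pattern follows, using only the standard-library sqlite3 module. The class name `MiniDB` is hypothetical and this is not the package's own implementation.

```python
import sqlite3


class MiniDB:
    """Hypothetical stand-in that mirrors the attributes documented above."""

    def __init__(self, database_path):
        self._database_path = database_path        # path kept for reference
        self.con = sqlite3.connect(database_path)  # sqlite3.Connection
        self.cursor = self.con.cursor()            # sqlite3.Cursor


db = MiniDB(":memory:")  # in-memory database, used here only for illustration
```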

get_aliquots()[source]¶

List aliquots in the database.

Parameters:: None.
Returns:: aliquots – Array of aliquot names in the database.
Return type:: array

get_aliquots_samples()[source]¶

List samples and aliquots in the database.

Parameters:: None.
Returns:: df – DataFrame with columns ‘sample’ and ‘aliquot’.
Return type:: pandas.DataFrame

get_samples()[source]¶

List samples in the database.

Parameters:: None.
Returns:: samples – Array of sample names in the database.
Return type:: array

insert_rows(table, columns, values)[source]¶

Insert rows into table.

Parameters:

table (str) – Name of the table in which to insert rows.
columns (arraylike) – Columns in table to insert new values for.
values (list) – List of tuples, one tuple per row; each tuple holds the values for columns.

Return type:

None.
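
As an illustration of the "list of tuples" shape that values must take, here is a sketch of a parameterized bulk insert written directly against sqlite3. The helper name `insert_rows_sketch` and the example rows are hypothetical; the package's actual SQL may differ.

```python
import sqlite3


def insert_rows_sketch(con, table, columns, values):
    """Insert each tuple in `values` as one row of `table`."""
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    # Identifiers cannot be bound as parameters, so they are interpolated;
    # only pass trusted table and column names.
    sql = f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})'
    con.executemany(sql, values)
    con.commit()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Aliquots (aliquot TEXT, sample TEXT, material TEXT)")
insert_rows_sketch(
    con,
    "Aliquots",
    ["aliquot", "sample", "material"],
    [("A-1", "S-1", "zircon"), ("A-2", "S-1", "apatite")],  # made-up rows
)
n_rows = con.execute("SELECT COUNT(*) FROM Aliquots").fetchone()[0]
```

Each tuple must have the same length as columns, since one placeholder is generated per column.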

matchcolumns(table, df_cols, score_threshold=96.0)[source]¶

Match DataFrame columns to columns of a table in the SQLite database.

Parameters:

table (str) – Table whose columns to match
df_cols (arraylike) – Columns to match to columns in sqlite database
score_threshold (float) – thefuzz score that matching must exceed to be a match

Returns:

col_match_dict – dictionary of matches where keys are df_cols and values are the sql columns for the matched table

Return type:

dict
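
The matching idea can be sketched as follows. This is an assumption-laden illustration, not the package's code: difflib's `SequenceMatcher` stands in for thefuzz scoring, names are lowercased before comparison (a choice made here), and the helper names `score` and `matchcolumns_sketch` are hypothetical.

```python
import sqlite3
from difflib import SequenceMatcher


def score(a, b):
    """Similarity on a 0-100 scale (difflib stand-in for a thefuzz score)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matchcolumns_sketch(con, table, df_cols, score_threshold=96.0):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    sql_cols = [row[1] for row in con.execute(f"PRAGMA table_info({table})")]
    matches = {}
    for col in df_cols:
        best = max(sql_cols, key=lambda c: score(col, c))
        if score(col, best) > score_threshold:  # must exceed the threshold
            matches[col] = best
    return matches


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Aliquots (aliquot TEXT, sample TEXT, material TEXT)")
col_match_dict = matchcolumns_sketch(con, "Aliquots", ["Sample", "Material", "depth"])
```

Columns with no candidate above the threshold (here `depth`) are simply left out of the returned dictionary in this sketch.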

matchrows(table, values, columns)[source]¶

Exactly match rows in a table based on the provided values.

Parameters:

table (str) – name of the table to match rows into.
values (arraylike) – array of values to match.
columns (arraylike) – names of columns in table that contain values; must have same length as second dimension of values

Returns:

idx – logical indices of length len(values); True for each row in values matched in the table.

Return type:

array (bool)
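
A sketch of exact row matching, returning one boolean per input row as described above. The helper `matchrows_sketch` is hypothetical and issues one query per row for clarity; the package may do this differently.

```python
import sqlite3


def matchrows_sketch(con, table, values, columns):
    """Return one bool per row of `values`: True if that row exists in `table`."""
    where = " AND ".join(f'"{c}" = ?' for c in columns)
    sql = f'SELECT 1 FROM "{table}" WHERE {where} LIMIT 1'
    return [con.execute(sql, tuple(row)).fetchone() is not None for row in values]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Aliquots (aliquot TEXT, sample TEXT, material TEXT)")
con.execute("INSERT INTO Aliquots VALUES ('A-1', 'S-1', 'zircon')")
idx = matchrows_sketch(
    con, "Aliquots", [("A-1", "S-1"), ("A-9", "S-1")], ["aliquot", "sample"]
)
```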

matchrows_strings(table, names, column, score_threshold=98)[source]¶

Match names to rows in a table by fuzzy string comparison against a single column.

Parameters:

table (str) – name of the table to match rows into.
names (arraylike) – list of names to match in the table
column (str) – name of column in table to do matching on
score_threshold (float) – thefuzz score that matching must exceed to be a match

Returns:

idx (array (bool)) – logical indices of length len(names); true for each entry matched in the table
sample_matches_dict (dict) – maps each provided name to its closest matching name in the database, for matches whose score exceeds the threshold
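
The two return values can be illustrated with a small sketch. It is not the package's implementation: difflib's `SequenceMatcher` stands in for thefuzz, the existing names are passed in as a list rather than read from the database, and `match_names_sketch` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def score(a, b):
    """Similarity on a 0-100 scale (difflib stand-in for a thefuzz score)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def match_names_sketch(names, existing, score_threshold=98):
    """Return (idx, matches) in the shape described above."""
    idx, matches = [], {}
    for name in names:
        best = max(existing, key=lambda e: score(name, e))
        matched = score(name, best) > score_threshold
        idx.append(matched)
        if matched:
            matches[name] = best  # provided name -> closest stored name
    return idx, matches


existing = ["S-1", "S-2", "S-10"]  # values already stored in the matched column
idx, sample_matches_dict = match_names_sketch(["S-2", "S-3"], existing)
```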

matchsamples_df(df, score_threshold=96.0)[source]¶

Match samples in a DataFrame with a ‘sample’ column to existing samples in the database

Parameters:: df (pandas.DataFrame) – DataFrame with a ‘sample’ column
Returns:: df_matched – df with rows corresponding to matched samples
Return type:: pandas.DataFrame

measurements_add(df_measurements, df_analyses, df_aliquots, score_threshold=98)[source]¶

Add measurements for new analyses, but don’t add samples.

Parameters:

df_measurements (pandas.DataFrame) – DataFrame suitable for reference against the Measurements table. Must have the following columns: analysis, quantity, mean, measurement_unit, uncertainty, uncertainty_unit
df_analyses (pandas.DataFrame) – DataFrame suitable for reference against the Analyses table. Must have the following columns: analysis, aliquot, date, instrument, technique
df_aliquots (pandas.DataFrame) – DataFrame suitable for reference against the Aliquots table. Must have the following columns: aliquot, sample, material
score_threshold (int) – Scoring threshold (0-100) for matching sample names. Defaults to 98.

Return type:

None.
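
Because each of the three DataFrames has required columns, it can help to check them before calling measurements_add. The sketch below is a hypothetical pre-check, not part of the package; it works on any iterable of column names, and it assumes the Analyses column is spelled `instrument`.

```python
# Required columns per argument, as listed in the parameter descriptions above.
REQUIRED = {
    "df_measurements": ["analysis", "quantity", "mean", "measurement_unit",
                        "uncertainty", "uncertainty_unit"],
    "df_analyses": ["analysis", "aliquot", "date", "instrument", "technique"],
    "df_aliquots": ["aliquot", "sample", "material"],
}


def missing_columns(name, columns):
    """Return the required columns for argument `name` absent from `columns`."""
    present = set(columns)
    return [c for c in REQUIRED[name] if c not in present]


# With pandas, `columns` would be `df.columns`; a plain list works the same way.
missing = missing_columns("df_analyses", ["analysis", "aliquot", "date"])
```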

measurements_by_aliquot(aliquots)[source]¶

Return a DataFrame with all measurements corresponding to the requested aliquots.

Parameters:: aliquots (str or arraylike) – aliquot(s) for which to retrieve measurements
Returns:: df – All measurements associated with the aliquot(s).
Return type:: pandas.DataFrame

measurements_by_sample(samples)[source]¶

Return a DataFrame with all measurements corresponding to the requested samples.

Parameters:: samples (str or arraylike) – sample or samples for which to retrieve measurements
Returns:: df – All measurements associated with the sample(s).
Return type:: pandas.DataFrame
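
A sketch of how such a lookup could be expressed in SQL, assuming that Measurements link to Analyses through `analysis` and Analyses link to Aliquots through `aliquot` (an inference from the column lists in this reference, not a confirmed schema). The helper name and the reduced table definitions are hypothetical, and it returns plain tuples rather than a DataFrame.

```python
import sqlite3


def measurements_by_sample_sketch(con, samples):
    """Return measurement rows for one sample name or a list of them."""
    if isinstance(samples, str):
        samples = [samples]  # accept a single name as well as a list
    placeholders = ", ".join("?" for _ in samples)
    sql = f"""
        SELECT al.sample, al.aliquot, m.analysis, m.quantity, m.mean
        FROM Measurements AS m
        JOIN Analyses AS an ON an.analysis = m.analysis
        JOIN Aliquots AS al ON al.aliquot = an.aliquot
        WHERE al.sample IN ({placeholders})
    """
    return con.execute(sql, list(samples)).fetchall()


con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Aliquots (aliquot TEXT, sample TEXT, material TEXT);
    CREATE TABLE Analyses (analysis TEXT, aliquot TEXT);
    CREATE TABLE Measurements (analysis TEXT, quantity TEXT, mean REAL);
    INSERT INTO Aliquots VALUES ('A-1', 'S-1', 'zircon');
    INSERT INTO Analyses VALUES ('run-1', 'A-1');
    INSERT INTO Measurements VALUES ('run-1', 'U', 120.5);
""")
rows = measurements_by_sample_sketch(con, "S-1")
```

Filtering on `al.aliquot` instead of `al.sample` would give the per-aliquot variant of the same query.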

measurements_update(df_measurements)[source]¶

Update matching spot measurements in the Measurements table for matching analyses. Does not attempt to add aliquots, analyses, samples, or measurements.

Parameters:

df_measurements (pandas.DataFrame) – Ideally generated by iolite_tools.measurements2sql(). Must have at minimum the following columns: analysis, quantity, mean, measurement_unit, uncertainty, uncertainty_unit. Optional column: reference_material.

Return type:

None.

update_rows(table, match_columns, match_values, update_columns, update_values)[source]¶

Update columns in rows in a table based on values in matching columns.

Parameters:

table (str) – Name of table to update rows in.
match_columns (arraylike) – Columns to do matching on.
match_values (list) – List of tuples of values to match rows on in match_columns. Length of each tuple must be same as len(match_columns)
update_columns (arraylike) – Columns for which to update values.
update_values (list) – List of tuples with values to update in update_columns. Length of each tuple must be same as len(update_columns)

Return type:

None.
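
The relationship between the four list arguments can be sketched as a parameterized UPDATE. This sketch assumes that match_values and update_values are paired by position (the first match tuple selects the rows that receive the first update tuple); the helper name `update_rows_sketch` is hypothetical.

```python
import sqlite3


def update_rows_sketch(con, table, match_columns, match_values,
                       update_columns, update_values):
    """Pair each tuple in `match_values` with the same-index tuple in `update_values`."""
    set_sql = ", ".join(f'"{c}" = ?' for c in update_columns)
    where_sql = " AND ".join(f'"{c}" = ?' for c in match_columns)
    sql = f'UPDATE "{table}" SET {set_sql} WHERE {where_sql}'
    # Parameters follow placeholder order: SET values first, then WHERE values.
    params = [tuple(u) + tuple(m) for u, m in zip(update_values, match_values)]
    con.executemany(sql, params)
    con.commit()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Measurements (analysis TEXT, quantity TEXT, mean REAL)")
con.execute("INSERT INTO Measurements VALUES ('run-1', 'U', 120.5)")
update_rows_sketch(
    con, "Measurements",
    ["analysis", "quantity"], [("run-1", "U")],
    ["mean"], [(121.0,)],
)
new_mean = con.execute("SELECT mean FROM Measurements").fetchone()[0]
```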

geochemdb.aliquot_average(df_measurements)[source]¶

Given a DataFrame of measurements as generated by GeochemDB.measurements_by_sample() or GeochemDB.measurements_by_aliquot(), gather measurements by aliquot, averaging duplicate measurements. Assumes that duplicates have the same units.

To do:: implement more robust duplicate checking and responsible uncertainty propagation.

Parameters:: df_measurements (pd.DataFrame) – Dataframe of measurements output by GeochemDB.measurements_by_sample().
Returns:: DataFrame with geochemical measurements averaged by aliquot.
Return type:: pd.DataFrame
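
The averaging step can be illustrated without pandas. The sketch below groups plain `(aliquot, quantity, value)` tuples and takes an unweighted mean of duplicates; the input shape and the helper name are assumptions for illustration, and uncertainties are ignored here, consistent with the to-do note above.

```python
from collections import defaultdict
from statistics import mean


def aliquot_average_sketch(rows):
    """Average duplicate (aliquot, quantity) pairs; rows are (aliquot, quantity, value)."""
    groups = defaultdict(list)
    for aliquot, quantity, value in rows:
        groups[(aliquot, quantity)].append(value)
    # One averaged value per aliquot and quantity; units are assumed identical.
    return {key: mean(vals) for key, vals in groups.items()}


rows = [("A-1", "U", 120.0), ("A-1", "U", 122.0), ("A-1", "Th", 40.0)]
averaged = aliquot_average_sketch(rows)
```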
