geochemdb package

Module contents

class geochemdb.GeochemDB(database_path)[source]

Bases: object

Assumes a SQLite database with the schema described in the package documentation.

__init__(database_path)[source]

Initializes a GeochemDB instance.

Parameters:

database_path (str) – Path to the SQLite database.

_database_path

Internal storage for the database path.

Type:

str

con

SQLite connection object.

Type:

sqlite3.Connection

cursor

SQLite cursor object.

Type:

sqlite3.Cursor
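As a rough illustration of the attributes listed above, the sketch below shows one way a class might open a connection and a cursor with the standard sqlite3 module. The class name MiniDB and the ":memory:" path are hypothetical stand-ins for GeochemDB and a real database file; this is not the package's implementation.

```python
import sqlite3


class MiniDB:
    """Hypothetical stand-in that mirrors the attributes described above."""

    def __init__(self, database_path):
        self._database_path = database_path        # path kept for later reference
        self.con = sqlite3.connect(database_path)  # sqlite3.Connection
        self.cursor = self.con.cursor()            # sqlite3.Cursor


# An in-memory database is used here only so the sketch runs anywhere.
db = MiniDB(":memory:")
```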

get_aliquots()[source]

List aliquots in the database.

Parameters:

None.

Returns:

aliquots – Array of aliquot names in the database.

Return type:

array

get_aliquots_samples()[source]

List samples and aliquots in the database.

Parameters:

None.

Returns:

df – DataFrame with columns ‘sample’ and ‘aliquot’.

Return type:

pandas.DataFrame

get_samples()[source]

List samples in the database.

Parameters:

None.

Returns:

samples – Array of sample names in the database.

Return type:

array

insert_rows(table, columns, values)[source]

Insert rows into table.

Parameters:
  • table (str) – Name of the table into which to insert rows.

  • columns (arraylike) – Columns in table for which to insert new values.

  • values (list) – List of tuples, one tuple per row to insert.

Return type:

None.
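The shape of the values argument is easiest to see in a small example. The following is a minimal sketch of a multi-row insert using sqlite3.executemany, assuming a hypothetical Samples table with made-up columns; it shows the general technique, not how insert_rows() is actually written.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Samples (name TEXT, lithology TEXT)")  # hypothetical table

table = "Samples"
columns = ["name", "lithology"]
values = [("S-01", "basalt"), ("S-02", "granite")]  # one tuple per row

# Table and column names cannot be bound as parameters, so they are
# interpolated into the statement; only the values use "?" placeholders.
column_list = ", ".join(columns)
placeholders = ", ".join("?" for _ in columns)
sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
con.executemany(sql, values)
con.commit()

rows = con.execute("SELECT name, lithology FROM Samples ORDER BY name").fetchall()
```

Each tuple in values must have one entry per name in columns, otherwise sqlite3 raises an error when binding parameters.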

matchcolumns(table, df_cols, score_threshold=96.0)[source]

Match columns of a DataFrame to columns of a table in the SQLite database.

Parameters:
  • table (str) – Table whose columns to match

  • df_cols (arraylike) – Columns to match to columns in the SQLite database.

  • score_threshold (float) – thefuzz score that a candidate must exceed to count as a match.

Returns:

col_match_dict – Dictionary of matches: keys are entries of df_cols and values are the corresponding SQL columns in the matched table.

Return type:

dictionary
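Threshold-based name matching of this kind can be sketched as follows. The package documentation refers to thefuzz scores; to keep the example dependency-free, this sketch swaps in difflib.SequenceMatcher from the standard library, scaled to 0-100, so the scores will not be identical to thefuzz's. The function name match_names and the column lists are hypothetical.

```python
from difflib import SequenceMatcher


def match_names(candidates, targets, score_threshold=96.0):
    """Map each candidate to its best-scoring target if the score exceeds the threshold."""
    matches = {}
    for cand in candidates:
        # Score every target on a 0-100 scale (case-insensitive comparison).
        scored = [
            (100 * SequenceMatcher(None, cand.lower(), t.lower()).ratio(), t)
            for t in targets
        ]
        best_score, best_target = max(scored)
        if best_score > score_threshold:
            matches[cand] = best_target
    return matches


sql_columns = ["analysis", "quantity", "mean", "uncertainty"]  # hypothetical table columns
df_cols = ["Analysis", "mean", "uncertanty", "notes"]
col_match_dict = match_names(df_cols, sql_columns, score_threshold=90.0)
```

Candidates with no target above the threshold (here "notes") are simply absent from the returned dictionary.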

matchrows(table, values, columns)[source]

Exactly match rows in a table based on the provided values.

Parameters:
  • table (str) – Name of the table to match rows against.

  • values (arraylike) – Array of values to match.

  • columns (arraylike) – Names of the columns in table that contain values; must have the same length as the second dimension of values.

Returns:

idx – Logical indices of length len(values); True for each row in values that is matched in the table.

Return type:

array (bool)
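A minimal sketch of exact row matching, assuming a hypothetical Aliquots table: the existing column combinations are read once and each requested row is tested for membership, giving one boolean per row of values. This illustrates the idea of the returned idx, not the method's actual implementation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Aliquots (aliquot TEXT, sample TEXT)")  # hypothetical table
con.executemany(
    "INSERT INTO Aliquots VALUES (?, ?)",
    [("A1", "S-01"), ("A2", "S-01"), ("B1", "S-02")],
)

columns = ["aliquot", "sample"]
values = [("A1", "S-01"), ("B1", "S-01"), ("B1", "S-02")]

# Fetch the existing combinations once, then test each requested row.
column_list = ", ".join(columns)
existing = set(con.execute(f"SELECT {column_list} FROM Aliquots"))
idx = [tuple(row) in existing for row in values]
```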

matchrows_strings(table, names, column, score_threshold=98)[source]

Match names to rows in a table by fuzzy string comparison on a single column.

Parameters:
  • table (str) – Name of the table to match rows against.

  • names (arraylike) – List of names to match in the table.

  • column (str) – Name of the column in table on which to do the matching.

  • score_threshold (float) – thefuzz score that a candidate must exceed to count as a match.

Returns:

  • idx (array (bool)) – Logical indices of length len(names); True for each entry matched in the table.

  • sample_matches_dict (dict) – Dictionary whose keys are the provided names and whose values are the closest matching names in the database with scores exceeding the threshold.

matchsamples_df(df, score_threshold=96.0)[source]

Match samples in a DataFrame with a ‘sample’ column to existing samples in the database.

Parameters:
  • df (pandas.DataFrame) – DataFrame with a ‘sample’ column.

  • score_threshold (float) – thefuzz score that a candidate must exceed to count as a match.

Returns:

df_matched – df with rows corresponding to matched samples

Return type:

pandas.DataFrame

measurements_add(df_measurements, df_analyses, df_aliquots, score_threshold=98)[source]

Add measurements for new analyses, but don’t add samples.

Parameters:
  • df_measurements (pandas.DataFrame) –

    DataFrame suitable for reference against the Measurements table. Must have the following columns:

    analysis, quantity, mean, measurement_unit, uncertainty, uncertainty_unit

  • df_analyses (pandas.DataFrame) – DataFrame suitable for reference against the Analyses table. Must have the following columns: analysis, aliquot, date, instrument, technique

  • df_aliquots (pandas.DataFrame) – DataFrame suitable for reference against the Aliquots table. Must have the following columns: aliquot, sample, material

  • score_threshold (int) – Scoring threshold (0-100) for matching sample names. Defaults to 98.

Return type:

None.

measurements_by_aliquot(aliquots)[source]

Return a DataFrame with all measurements corresponding to the requested aliquots.

Parameters:

aliquots (str or arraylike) – aliquot(s) for which to retrieve measurements

Returns:

df – All measurements associated with the aliquot(s).

Return type:

pandas.DataFrame

measurements_by_sample(samples)[source]

Return a DataFrame with all measurements corresponding to the requested samples.

Parameters:

samples (str or arraylike) – sample or samples for which to retrieve measurements

Returns:

df – All measurements associated with the sample(s).

Return type:

pandas.DataFrame
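Retrieving measurements for a sample means following the chain sample → aliquot → analysis → measurement. The sketch below illustrates that chain as a SQL join over three minimal tables. The table names and the columns analysis, aliquot, sample, material, quantity, and mean come from the measurements_add() parameter descriptions; the exact schema, the sample data, and the query itself are assumptions, and the result is returned as plain tuples rather than a DataFrame.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Minimal hypothetical tables using a subset of the documented columns.
con.executescript("""
    CREATE TABLE Aliquots (aliquot TEXT, sample TEXT, material TEXT);
    CREATE TABLE Analyses (analysis TEXT, aliquot TEXT);
    CREATE TABLE Measurements (analysis TEXT, quantity TEXT, mean REAL);
    INSERT INTO Aliquots VALUES ('A1', 'S-01', 'zircon'), ('B1', 'S-02', 'zircon');
    INSERT INTO Analyses VALUES ('an1', 'A1'), ('an2', 'B1');
    INSERT INTO Measurements VALUES ('an1', 'U', 120.0), ('an2', 'U', 95.0);
""")

samples = ["S-01"]
placeholders = ", ".join("?" for _ in samples)  # one "?" per requested sample
query = f"""
    SELECT al.sample, al.aliquot, m.analysis, m.quantity, m.mean
    FROM Measurements AS m
    JOIN Analyses AS an ON an.analysis = m.analysis
    JOIN Aliquots AS al ON al.aliquot = an.aliquot
    WHERE al.sample IN ({placeholders})
"""
rows = con.execute(query, samples).fetchall()
```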

measurements_update(df_measurements)[source]

Update matching spot measurements in the Measurements table for matching analyses. Does not attempt to add aliquots, analyses, samples, or measurements.

Parameters:

df_measurements (pandas.DataFrame) –

Ideally generated by iolite_tools.measurements2sql(). Must have at least the following columns:

analysis, quantity, mean, measurement_unit, uncertainty, uncertainty_unit

Optional column:

reference_material

Return type:

None.

update_rows(table, match_columns, match_values, update_columns, update_values)[source]

Update columns in rows in a table based on values in matching columns.

Parameters:
  • table (str) – Name of table to update rows in.

  • match_columns (arraylike) – Columns to do matching on.

  • match_values (list) – List of tuples of values to match rows on in match_columns. The length of each tuple must equal len(match_columns).

  • update_columns (arraylike) – Columns for which to update values.

  • update_values (list) – List of tuples with values to update in update_columns. The length of each tuple must equal len(update_columns).

Return type:

None.
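The pairing of match tuples with update tuples can be sketched with a parameterized UPDATE statement. This is a minimal illustration that assumes a hypothetical three-column Measurements table and that match_values and update_values are aligned row by row; it is not the package's implementation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Measurements (analysis TEXT, quantity TEXT, mean REAL)")
con.executemany(
    "INSERT INTO Measurements VALUES (?, ?, ?)",
    [("an1", "U", 120.0), ("an1", "Th", 40.0)],
)

match_columns = ["analysis", "quantity"]
match_values = [("an1", "U")]
update_columns = ["mean"]
update_values = [(125.0,)]

set_clause = ", ".join(f"{col} = ?" for col in update_columns)
where_clause = " AND ".join(f"{col} = ?" for col in match_columns)
sql = f"UPDATE Measurements SET {set_clause} WHERE {where_clause}"

# Each parameter tuple is the update values followed by the match values,
# in the same order as the "?" placeholders in the statement.
params = [tuple(u) + tuple(m) for u, m in zip(update_values, match_values)]
con.executemany(sql, params)
con.commit()

rows = con.execute(
    "SELECT analysis, quantity, mean FROM Measurements ORDER BY quantity"
).fetchall()
```

Only the row matching both analysis and quantity is changed; the Th measurement keeps its original value.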

geochemdb.aliquot_average(df_measurements)[source]

Given a DataFrame of measurements as generated by GeochemDB.measurements_by_sample() or GeochemDB.measurements_by_aliquot(), gather measurements by aliquot, averaging duplicate measurements. Assumes that duplicates have the same units.

To do:

Implement more robust duplicate checking and responsible uncertainty propagation.

Parameters:

df_measurements (pd.DataFrame) – DataFrame of measurements output by GeochemDB.measurements_by_sample() or GeochemDB.measurements_by_aliquot().

Returns:

DataFrame with geochemical measurements averaged by aliquot.

Return type:

pd.DataFrame
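The averaging step can be illustrated without pandas. The following plain-Python sketch groups hypothetical long-format records by (aliquot, quantity) and takes the arithmetic mean of duplicates. It assumes duplicates share units and ignores uncertainties, in line with the to-do note above; the record layout is an assumption and the real function operates on a DataFrame.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format records with a subset of the measurement columns.
records = [
    {"aliquot": "A1", "quantity": "U", "mean": 120.0},
    {"aliquot": "A1", "quantity": "U", "mean": 124.0},  # duplicate measurement
    {"aliquot": "A1", "quantity": "Th", "mean": 40.0},
    {"aliquot": "B1", "quantity": "U", "mean": 95.0},
]

# Collect all values that share the same aliquot and quantity.
grouped = defaultdict(list)
for rec in records:
    grouped[(rec["aliquot"], rec["quantity"])].append(rec["mean"])

# One entry per aliquot, one key per quantity; duplicates collapse to their mean.
averaged = defaultdict(dict)
for (aliquot, quantity), vals in grouped.items():
    averaged[aliquot][quantity] = mean(vals)
```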