geochemdb
guide¶
import geochemdb
This notebook provides example utilization of the module with some example zircon laser ablation data reduced by Iolite 4. You will also need to have iolite_tools to facilitate processing the example dataset into dataframes that are ready for geochemdb
.
This notebook assumes that you are working from the preconfigured example database geochemdb_example.db
, which is in the example_data directory.
For use with your own data, you will also need to have properly configured a database. The schema.sql is available in the repository. For your own database, you can use the schema to create an empty database file with the correct tables and relationships by running
sqlite3 your-geochem-database.db < schema.sql
at a command line after having installed sqlite3 (which can be done via anaconda).
Alternatively, you can copy geochemdb_example.db
and edit it using a program like the free SQLiteStudio.
import iolite_tools
Create GeochemDB
Object¶
The core class for the geochemdb
module is GeochemDB
, which generates an object that allows you to interface with a geochemical SQLite database.
The example_data
directory can be found at the root of the repository on Github.
database_path = 'example_data/geochemdb_example.db'
db = geochemdb.GeochemDB(database_path)
This object allows you to
add
update
retrieve
measurements from the database.
This guide walks through each of these steps.
Adding Measurements¶
GeochemDB.measurements_add()
adds new measurements to the Measurements table of the database.
Note: GeochemDB.measurements_add()
does not add missing samples to the Samples
table. Samples should be added to the database manually with all of their corresponding metadata.
To add measurements to the database, you have to provide three dataframes:
measurements : This dataframe contains geochemical measurements as rows, alongside analysis names. Measurements are the values of geochemical concentrations or ratios, along with their uncertainties.
analyses : This dataframe lists analyses and metadata. An analysis refers to measurement of an aliquot by a machine. All analyses referenced in measurements should be present here. No other analyses should be present.
aliquots : This dataframe contains aliquots to which analyses belong. An aliquot is a portion of a sample. All aliquots referenced in the analsyes dataframe should be present here. No other aliquots should be present.
In the example here, measurements are trace element, U, and Pb concentrations in zircons via laser ablation. These measurements were gathered via the split-stream configuration at the UCSB petrochronology lab. This procedure takes the aerosol generated via laser ablation and splits it, delivering portions to different mass spectrometers (one to measure trace elements, another U and Pb).
In this case, an aliquot refers to the aerosol generated by the laser, which is designated by a spot number (e.g., S_z48). Each aliquot has two analyses: one for trace elements and another for U/Pb. Each analysis has as many measurements as the mass spectrometers were configured to collect.
The code below prepares the necessaary dataframes using example LASS-ICPMS datasets. I’ve developed iolite_tools
to facilitate processing these reduced data from excel spreadsheets.
df = iolite_tools.excel2measurements('example_data/2023-03_run-5_trace.xlsx',
'2023-03', 5, 'trace')
df_measurements = iolite_tools.measurements2sql(df, refmat='91500')
df_analyses = iolite_tools.analyses2sql(df, date='2023-03-17',
instrument='Nu Plasma 3D',
technique='LASS ICPMS')
df_aliquots = iolite_tools.aliquots2sql(df, material='zircon')
While iolite_tools
is specifically designed for this particular type of data, the structure of the dataframes above is what is necessary for adding data to the database. As long as you can produce these dataframes from your own reduced datasets, then you will be able to add your own data.
Note
You will have to make sure that the auxiliary tables have been appropriately populated with the relevant values before trying to add measurements. These tables include MeasurementUnits, Quantities, QuantitiesMeasurementUnits, Instruments, Techniques, and Materials. See readme for more information. You will notice that the columns in the dataframes generated above reference these types of information, which is all about facilitating the metadata collection and standardization that is often lacking. The geochem_example.db
database already has the values added, but you may have to add your own depending on your data collection and reduction practices.
Now that we have made the measurements, analyses, and aliquots dataframes, we can use db.measurements_add()
to add them to the database.
Prior to running this function, however, you can check to see what samples will be matched based on the strings in the dataframes. The code uses fuzzy string matching on sample names, so it can be convenient to run db.matchrows_strings()
to ensure that samples are being matched appropriately.
db.matchrows_strings('Samples', df_aliquots['sample'].unique(), 'name', score_threshold=95)
(array([False, False, True, False, False, False]),
{'AT2310 16 4': 'AT2310 16.4'})
We can see that only sample AT2310 16.4 is matched since it’s the only one in the database. If samples do not match, then they either are not present in the database and need to be added, or the sample names in the dataframes differ too much from the sample names in the database. The degree of string similarity for the fuzzy name matching is controlled by the score_threshold
parameter, which can be decreased to match more dissimilar strings. In this case, a threshold of 95 is sufficient to match ‘AT2310 16 4’ to ‘AT2310 16.4’.
Once you’re satisfied that the dataframes are ready and sample names will match, you can add the measurements to the database.
db.measurements_add(df_measurements, df_analyses, df_aliquots)
Sample names not matched:
{'91500', 'Ples', 'GJ1', 'AT22 FRAN 17 29', 'NIST612'}
Added:
122 aliquots,
122 analyses,
3416 measurements
Updating Measurements¶
You might find that you’ve improved your data reduction at some point in the future, and you have new values for measurements from existing analyses. If you have already added these measurements to the database, you can update them using db.measurements_update()
. Note that analysis names must be identical to the existing analyses in the database.
db.measurements_update(df_measurements)
Updated:
3416 measurements
Get Measurements by Sample¶
Retrieving measurements from the database can be done with db.measurements_by_sample()
.
df_measurements = db.measurements_by_sample('AT2310 16.4')
This function returns a dataframe with all measurements as rows, whereas most end-users will likely want measured quantities in columns. The geochemdb.aliquot_average()
function generates such a dataframe by averaging aliquots with repeat measurements of any quantities.
df_aliquots = geochemdb.aliquot_average(df_measurements)