# Cosmogenic Exposure Age Averages

We often need to calculate averages of a group of data with associated uncertainties (e.g. a group of cosmogenic ages from erratics on the same moraine). The individual ages are usually in the form of AGE±UNCERTAINTY. This represents a most likely AGE and the age range that corresponds to “one sigma”, meaning that there is a 68% probability that our “true age” lies between AGE−UNCERTAINTY and AGE+UNCERTAINTY.
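The 68% figure follows directly from the normal distribution. Here is a quick check in Python, assuming the age is normally distributed (the function name `probability_within` is made up for this illustration):

```python
import math


def probability_within(k_sigma):
    """Probability that a normally distributed value lies within k_sigma of its mean."""
    return math.erf(k_sigma / math.sqrt(2.0))


# One sigma covers about 68% of the distribution, two sigma about 95%.
print(round(probability_within(1.0), 4))
print(round(probability_within(2.0), 4))
```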

This way of expressing data is easy to read, allowing the reader to get a clear idea of the age and the precision using only two figures. Also, for most cosmogenic ages, this form represents the accurate distribution of the data, which is required when these ages are later used to calculate something else.

That is why we tend to represent the average of several ages using the form AGE±UNCERTAINTY. However, there are several ways of calculating this average, and the accuracy of the calculated average in representing the original dataset depends on both a) the characteristics of our dataset, and b) the method used to calculate the average.

##### Surface exposure ages from cosmonuclide concentrations

Online calculators are typically used to calculate surface exposure ages from the concentration of cosmogenic nuclides in surface samples. Some of the most used online calculators are:

All these calculators require inputting data about the sampling site, the characteristics of the reference material used in the concentration measurements, and the measured concentrations with their uncertainties. The measurement uncertainty should include the laboratory and analytical uncertainties, usually including the scatter of the spectrometer measurements, weighing uncertainties, and the nominal error of the concentration of the carrier used, if any. You can find more information on how to calculate concentration uncertainties here:

The concentration uncertainty is directly transmitted to the internal error of the final age.

The online calculators always report at least three values per sample: the apparent surface exposure age and two uncertainties, internal and external. The internal uncertainty includes only the propagated uncertainty of the concentrations, and this is the figure we should use when comparing ages from samples collected nearby and prepared in the same way, usually our whole dataset. The external uncertainty contains the internal uncertainty and the uncertainty of the method used to calculate the age, typically the uncertainties of the scaling method and the uncertainty of the reference production rate, i.e. the scatter of the calibration data. The external uncertainty is the figure we should use when comparing our data with ages from other sites or ages calculated using other methods.

The Calibration and Scaling Uncertainty (CSU) is usually a fixed percentage of our ages. You can easily check this by subtracting the two errors in quadrature (σext. and σint.) and dividing the result by the age (μ):

$\displaystyle CSU=\frac{\sqrt{\sigma_{ext.}^2-\sigma_{int.}^2}}{\mu}$

The result is usually a constant percentage for all your ages. E.g. the calibration uncertainty of the Be-10 ages calculated using the online-calculators-formerly-known-as-the-CRONUS-Earth-online-calculators v.3 using the LSDn scaling scheme is typically 8.2%.

As we can always calculate the external error by adding in quadrature the CSU to our internal uncertainties, we can just forget about the external uncertainties when operating with our ages (e.g. calculating averages) and add the CSU at the end of our calculations.
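The two operations described above (extracting the CSU and adding it back at the end) can be sketched in a few lines of Python. The function names and the numbers in the example are made up for illustration; they do not come from any online calculator.

```python
import math


def calibration_scaling_uncertainty(age, sigma_int, sigma_ext):
    """Relative CSU: quadrature difference of both uncertainties divided by the age."""
    return math.sqrt(sigma_ext ** 2 - sigma_int ** 2) / age


def external_uncertainty(age, sigma_int, csu):
    """External uncertainty: internal uncertainty plus CSU * age, added in quadrature."""
    return math.sqrt(sigma_int ** 2 + (csu * age) ** 2)


# Hypothetical age of 10 ka with 0.3 ka internal and 0.5 ka external uncertainty.
csu = calibration_scaling_uncertainty(10.0, 0.3, 0.5)
print(csu)
print(external_uncertainty(10.0, 0.3, csu))
```

The second function is meant to be applied to the final average, once all the averaging has been done with internal uncertainties only.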

##### Types of surface exposure age datasets

The preparation and measurement of samples for cosmogenic exposure dating is time-consuming and expensive. Therefore, the samples that are finally measured are thoroughly selected and the final datasets contain a small number of samples, typically 4-6 per geologic landform.

Despite all the care put into sample selection, several natural processes cause the apparent surface exposure ages to deviate from the true landform age. This natural noise could be caused by previous exposure of the sampled surfaces or non-constant exposure since the landform formation (e.g. boulder rotation). Therefore, we should expect outliers in our dataset or at least some scatter of our ages.

Also, the inhomogeneities during the sample preparation and analysis (e.g. different sample sizes, different AMS current, etc.) sometimes yield datasets with mixed precisions, even from identical geological samples.

All this makes typical surface exposure age datasets 1) small in terms of the number of data, 2) scattered due to natural noise, and 3) often a mix of precise and imprecise data. Here we can see 4 synthetic examples of typical surface exposure age datasets with data obtained from samples from erratics on an LGM moraine (~18 ka):

When calculating the average of one of these datasets, we normally use the average or the weighted average.

###### Average and standard deviation

The simplest way of averaging ages is using the arithmetic mean, which is the sum of the ages $\mu_{i}$ divided by the number of ages $n$:

$\displaystyle AV=\frac{1}{n} \cdot \sum_{i=1}^{n}{\mu_{i}}$

The uncertainty associated with the average is the standard deviation, which is typically calculated as:

$\displaystyle SD=\sqrt{\frac{1}{n-1} \cdot \sum_{i=1}^{n}{(\mu_{i}-AV)^{2}}}$
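As a minimal sketch, the arithmetic mean and the sample standard deviation (with the $n-1$ denominator) can be computed like this in Python. The function name and the example ages are made up for illustration.

```python
import math


def average_and_sd(ages):
    """Arithmetic mean and sample standard deviation (n - 1 denominator) of the ages."""
    n = len(ages)
    av = sum(ages) / n
    sd = math.sqrt(sum((a - av) ** 2 for a in ages) / (n - 1))
    return av, sd


# Hypothetical ages in ka. The individual uncertainties are not used at all.
print(average_and_sd([17.0, 18.0, 19.0]))
```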

If we apply these formulas to the previous datasets, we obtain the following AV±SD:

In a good set of ages, as in the first case (the precise dataset), the error bars of the individual ages correspond to the scatter of the dataset. Excluding the first case, the main problem when using this approach is that we are ignoring the uncertainties of the individual ages. This is not a big problem when the individual uncertainties are negligible compared to the scatter of the data, as in the second example (the scattered dataset). However, the presence of outliers pulls the average toward them in the third example (a dataset with outliers). The same happens in the last example (the mixed-precision dataset): even though all individual age ranges overlap at 18 ka, the average yields 19 ka.

###### Weighted average

The weighted average, or weighted arithmetic mean, is a way of calculating the average that increases the importance of the individual ages that are known more precisely. That means that the ages with smaller error bars will contribute more to the average than the ages with bigger uncertainties. This is typically calculated as:

$\displaystyle WA=\frac{\sum_{i=1}^{n}{\mu_{i}/\sigma_{i}^{2}}}{\sum_{i=1}^{n}{1/\sigma_{i}^{2}}}$

The uncertainty of the weighted average is sometimes calculated as the standard error of the weighted mean using this formula:

$\displaystyle \sqrt{\frac{1}{\sum_{i=1}^{n}{1/\sigma_{i}^{2}}}}$

which is a good representation of the effect of all analytical uncertainties of the individual ages on the weighted average. However, this method ignores the scatter of the data, which is usually bigger than the individual uncertainties. To take into account both sources of uncertainty in the weighted average, we should use the square root of the weighted sample variance to calculate this uncertainty. Thus, the Deviation of our Weighted Average will be:

$\displaystyle DWA=\sqrt{\frac{\sum_{i=1}^{n}{(\mu_{i}-WA)^{2}/\sigma_{i}^{2}}}{\sum_{i=1}^{n}{1/\sigma_{i}^{2}}}}$
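The three formulas of this section translate to Python as in the sketch below. The function name and the example values are made up for illustration; the weights are $1/\sigma_{i}^{2}$, as in the equations.

```python
import math


def weighted_average(ages, sigmas):
    """Return (WA, standard error of the weighted mean, DWA) using 1/sigma^2 weights."""
    weights = [1.0 / s ** 2 for s in sigmas]
    weight_sum = sum(weights)
    wa = sum(w * a for w, a in zip(weights, ages)) / weight_sum
    # Standard error: reflects the analytical uncertainties only.
    standard_error = math.sqrt(1.0 / weight_sum)
    # DWA: square root of the weighted sample variance, so it also reflects scatter.
    dwa = math.sqrt(sum(w * (a - wa) ** 2 for w, a in zip(weights, ages)) / weight_sum)
    return wa, standard_error, dwa


# Hypothetical mixed-precision dataset (ka): the imprecise 22 ± 4 barely moves the result.
print(weighted_average([18.0, 18.2, 22.0], [0.5, 0.5, 4.0]))
```

Comparing the second and third returned values for a given dataset shows how much of the spread comes from the scatter rather than from the analytical uncertainties.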

If we apply these formulas to the previous datasets, we obtain the following WA±DWA:

The weighted average does not solve the problem with the outliers pulling the average towards younger ages in the third example. Actually, in the first 3 examples, this method produces a very similar result to the simple arithmetic mean. However, the weighted average is successful in ignoring the effect of the imprecise data in the last example. A weighted average is a good option for filtering poor analytical data without discarding it.

##### Filtering data

When we look at a set of ages and error bars, we can intuitively guess the right age of the unit we are trying to date.

The weighted average will match our guess in the first, second, and fourth examples. However, to get rid of the effect of the obvious outliers in the third case we might need to discard data.

###### Outliers

We can just remove the odd ages manually. This might seem obvious looking at the third example, but it is less evident if we look at the second one. In the second example, and in many real datasets, removing outliers manually is arbitrary and makes it difficult to compare averages of different datasets that were manually trimmed. The selection of outliers and the number of data we discard manually is often driven by the age we were primarily expecting and by our hope of getting a final age with a small error bar. Too human.

There are many automatic mathematical methods to discard outliers. We could stick to one method, discard the ages that seem not to fit, and apply the average, or the weighted average, to the mutilated dataset. However, this also brings some problems:

• The surface exposure datasets usually contain 4-6 ages. Most methods for removing outliers are designed to be used in groups with much more data. It is very difficult to justify statistically the removal of 2 data from a group of 6 ages.
• Usually, the geological samples have been chosen carefully to avoid samples that are not optimum for the method, especially when using expensive and time-consuming dating methods, such as cosmogenic surface exposure dating. When rejecting outliers, the scientist should provide a geological interpretation of the outlier. And this interpretation usually involves rejuvenating or ageing processes that have been systematically avoided during the sample selection.

Is there a method that is not manual and allows us to calculate a realistic average of our ages without discarding outliers?

##### The Best Gaussian Fit

A good candidate to automatically get an average of our data that is similar to our intuitive age is the Best Gaussian Fit (BGF).

We can calculate the probability $P$ corresponding to each time $t$ for each age $\mu \pm \sigma$ assuming that it is normally distributed:

$\displaystyle P(t) = \frac{1}{{\sigma \sqrt {2\pi } }} \cdot e^{-\frac{(t-\mu)^{2}}{2 \cdot \sigma^{2}}}$

Then we can sum up all the probability distributions and find the Gaussian curve that best fits the resulting camelplot.

The $\mu$ and $\sigma$ values corresponding to the best fitting curve are our BGF:

As we can see, this method mimics quite well our intuitive age interpretation using a mathematical algorithm that takes into account all ages and their uncertainties. However, the process of finding our BGF requires goal fitting methods that are slightly beyond the capacity of our favourite calculator, Microsoft Excel®.
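To give an idea of what such a fitting procedure involves, here is a minimal Python sketch: it sums the individual Gaussians into a camelplot and then searches for the single Gaussian closest to it in a least-squares sense. This is only an illustration and not necessarily how the CEAA scripts work. The function names, the free amplitude of the trial curve, the ±4σ time window, and the coarse-to-fine grid search are all assumptions made for this example.

```python
import math


def gaussian(t, mu, sigma):
    """Normal probability density of an age mu ± sigma evaluated at time t."""
    return math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def camelplot(ages, sigmas, n_points=200):
    """Sum of the individual densities on a regular time grid (assumed ±4 sigma window)."""
    t_min = min(a - 4.0 * s for a, s in zip(ages, sigmas))
    t_max = max(a + 4.0 * s for a, s in zip(ages, sigmas))
    step = (t_max - t_min) / (n_points - 1)
    times = [t_min + i * step for i in range(n_points)]
    summed = [sum(gaussian(t, a, s) for a, s in zip(ages, sigmas)) for t in times]
    return times, summed


def misfit(times, summed, mu, sigma):
    """Sum of squared residuals after scaling the trial Gaussian by its best amplitude."""
    trial = [gaussian(t, mu, sigma) for t in times]
    norm = sum(g * g for g in trial)
    if norm == 0.0:
        return float("inf")
    amplitude = sum(y * g for y, g in zip(summed, trial)) / norm
    return sum((y - amplitude * g) ** 2 for y, g in zip(summed, trial))


def best_gaussian_fit(ages, sigmas, n_grid=30, n_rounds=4):
    """Coarse-to-fine grid search for the (mu, sigma) that minimises the misfit."""
    times, summed = camelplot(ages, sigmas)
    mu_lo, mu_hi = times[0], times[-1]
    s_lo, s_hi = min(sigmas) / 2.0, (times[-1] - times[0]) / 2.0
    s_floor = min(sigmas) / 100.0  # keeps the trial sigma strictly positive
    best = (float("inf"), None, None)
    for _ in range(n_rounds):
        mu_step = (mu_hi - mu_lo) / (n_grid - 1)
        s_step = (s_hi - s_lo) / (n_grid - 1)
        for i in range(n_grid):
            for j in range(n_grid):
                mu, sigma = mu_lo + i * mu_step, s_lo + j * s_step
                m = misfit(times, summed, mu, sigma)
                if m < best[0]:
                    best = (m, mu, sigma)
        # Zoom in: new search window of two old grid steps around the current best.
        mu_lo, mu_hi = best[1] - 2.0 * mu_step, best[1] + 2.0 * mu_step
        s_lo, s_hi = max(best[2] - 2.0 * s_step, s_floor), best[2] + 2.0 * s_step
    return best[1], best[2]


# Hypothetical dataset (ka) with one old outlier.
print(best_gaussian_fit([18.0, 18.2, 17.9, 18.1, 25.0], [0.5, 0.5, 0.5, 0.5, 0.5]))
```

Letting the amplitude float means the trial curve can follow the main peak of the camelplot instead of being forced to cover the whole area, which is what keeps isolated outliers from dragging the result. A grid search is slow but has no dependencies; a proper optimiser would be the natural replacement in real use.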

I did all these graphs using Octave. Below you can find a link to my GitHub repository with the code needed to perform all these calculations and plots at once (Average, Weighted Average, and Best Gaussian Fit with internal and external uncertainties). This code works well in both MATLAB® and Octave.

Additionally, I tried to make an XLSX file that calculates the same, but with no plots. However, the BGF calculations are based on a set of 1000 random curves, and this is often not enough to get the best fit. If you download this version, remember that the BGF approach might not be accurate!

For bigger datasets, such as calibrated Schmidt Hammer ages, it can be interesting to fit our camelplot to multiple Gaussian curves if we suspect that our ages reflect more than one event. Jason Dortch developed P-CAAT for fitting several Gaussian curves to big datasets.

#### Cosmogenic Exposure Age Averages (CEAA) calculators:

###### MATLAB/Octave program:

• Save your ages in a CSV file (comma separated values). You can save the file straight from Excel, LibreOffice, etc., or populate it using a text editor: just separate the numbers using commas.
• Download the CEAA code from my GitHub repository https://github.com/angelrodes/CEAA
• Unzip the file CEAA-master.zip. It includes a folder with the examples shown here.
• Run the script start.m using Octave or MATLAB.
• A dialogue box will ask you to select your CSV file.
• You will get an output like this:

The CEAA.xlsx spreadsheet performs the same calculations as the scripts above, except for the BGF, which is approximated based on 1000 random curves. Also, this spreadsheet does not output any plot.