refine_ca performs a partial bootstrap correspondence analysis.

refine_event checks the stability of a DateModel object.

refine_diversity checks the stability of a DiversityIndex object.

refine_ca(object, ...)

refine_diversity(object, ...)

refine_event(object, ...)

# S4 method for CA
refine_ca(object, cutoff, n = 1000, axes = c(1, 2), ...)

# S4 method for DateModel
refine_event(
  object,
  method = c("jackknife", "bootstrap"),
  level = 0.95,
  probs = c(0.05, 0.95),
  n = 1000,
  ...
)

# S4 method for HeterogeneityIndex
refine_diversity(
  object,
  method = c("jackknife", "bootstrap"),
  probs = c(0.05, 0.95),
  n = 1000,
  ...
)

# S4 method for EvennessIndex
refine_diversity(
  object,
  method = c("jackknife", "bootstrap"),
  probs = c(0.05, 0.95),
  n = 1000,
  ...
)

## Arguments

object

A CA, DateModel or DiversityIndex object.

...

Currently not used.

cutoff

A function that takes a numeric vector as argument and returns a single numeric value (see below).

n

A non-negative integer giving the number of bootstrap replications (see below).

axes

A numeric vector giving the subscripts of the CA axes to be used (see below).

method

A character string specifying the resampling method to be used. This must be one of "jackknife", "bootstrap" (see below). Any unambiguous substring can be given.

level

A length-one numeric vector giving the confidence level.

probs

A numeric vector of probabilities with values in $$[0,1]$$ (see quantile).
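As an illustration of what a cutoff function might look like, the sketch below defines two candidates with base R only. The hull lengths are made-up values, the names cutoff_median and cutoff_q90 are hypothetical, and the rule that samples above the threshold are dropped is an assumption made for the example.

```r
# Hypothetical convex hull lengths for five samples (illustrative values only)
hull_length <- c(0.12, 0.35, 0.08, 0.91, 0.27)

# Candidate cutoff functions: each maps a numeric vector to a single number
cutoff_median <- function(x) stats::median(x)
cutoff_q90 <- function(x) stats::quantile(x, probs = 0.9, names = FALSE)

cutoff_median(hull_length)
cutoff_q90(hull_length)

# Assumption: samples whose length exceeds the threshold would be dropped
keep <- hull_length <= cutoff_q90(hull_length)
keep
```

Any function with this shape, including an anonymous one such as function(x) mean(x) + sd(x), satisfies the description of the cutoff argument.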

## Value

refine_diversity and refine_event return a data.frame.

refine_ca returns a BootCA object.

## Note

Refining can lead to much longer execution times and larger output objects. A progress bar is displayed to monitor the execution of these resampling procedures.

## Correspondence Analysis Refining

refine_ca helps identify samples that are subject to sampling error, or samples that have underlying structural relationships and might influence the ordering along the CA space.

This relies on a partial bootstrap approach to CA-based seriation in which each sample is replicated n times. For each sample, the maximum dimension length of the convex hull around its cloud of replicated points is compared with the cutoff value to decide which samples to remove.

According to Peeples and Schachner (2012), "[this] point removal procedure [results in] a reduced dataset where the position of individuals within the CA are highly stable and which produces an ordering consistent with the assumptions of frequency seriation."

If the result of refine_ca is used as an input argument to seriate, a correspondence analysis is performed on the subset of object that matches the samples to be kept. The excluded samples are then projected onto the dimensions of the CA coordinate space using the row transition formulae. Finally, the row coordinates on the first dimension give the seriation order.
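The partial bootstrap described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the count table is made up, the helper hull_length is hypothetical, the "maximum dimension length" is taken here as the largest distance between two hull vertices, and the mean of the lengths stands in for the cutoff function.

```r
set.seed(42)

# Toy count table: 4 assemblages (rows) x 5 types (columns); values are made up
counts <- matrix(
  c(30, 20, 10,  5,  1,
    20, 25, 15, 10,  5,
     5, 15, 25, 20, 10,
     1,  5, 10, 25, 30),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), paste0("t", 1:5))
)

# Plain CA through the SVD of the standardized residuals
P <- counts / sum(counts)
r <- rowSums(P)   # row masses
cm <- colSums(P)  # column masses
S <- diag(1 / sqrt(r)) %*% (P - r %o% cm) %*% diag(1 / sqrt(cm))
dec <- svd(S)
axes <- c(1, 2)
# Standard column coordinates on the chosen axes
col_std <- (diag(1 / sqrt(cm)) %*% dec$v)[, axes, drop = FALSE]

# Partial bootstrap: resample one row n times, project the replicates as
# supplementary rows (row profile %*% standard column coordinates), then
# measure the convex hull of the resulting point cloud
hull_length <- function(x, n = 200) {
  reps <- t(stats::rmultinom(n, size = sum(x), prob = x))
  coords <- (reps / rowSums(reps)) %*% col_std
  hull <- coords[grDevices::chull(coords), , drop = FALSE]
  max(stats::dist(hull))  # assumption: largest distance between hull vertices
}
len <- apply(counts, 1, hull_length)

# Assumption: keep the samples whose hull length is below the cutoff
keep <- len < mean(len)
len
keep
```

Projecting replicates onto fixed axes, rather than recomputing the CA for every replicate, is what makes the bootstrap "partial".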

## Date Model Checking

If jackknife is used, one type/fabric is removed at a time and all statistics are recalculated. In this way, one can assess whether a given type/fabric has a substantial influence on the date estimate. A six-column data.frame is returned, giving the results of the resampling procedure (jackknifing fabrics) for each assemblage (in rows) with the following columns:

id

An identifier to link each row to an assemblage.

date

The jackknife event date estimate.

lower

The lower boundary of the associated prediction interval.

upper

The upper boundary of the associated prediction interval.

error

The standard error of predicted means.

bias

The jackknife estimate of bias.
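The leave-one-out idea can be sketched with base R. The example below uses made-up counts, a Shannon index as a stand-in statistic (fitting a date model is out of scope here), and the textbook jackknife formulas for bias and standard error; the helper names shannon and jackknife are hypothetical and the package's internal computations may differ.

```r
# Counts for one assemblage (made-up values), one entry per type/fabric
x <- c(35, 26, 25, 21, 16, 11, 6, 5)

# Stand-in statistic: Shannon index (natural log), ignoring empty types
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

jackknife <- function(x, stat) {
  n <- length(x)
  theta <- stat(x)
  # Recompute the statistic with each type left out in turn
  loo <- vapply(seq_len(n), function(i) stat(x[-i]), numeric(1))
  c(
    mean  = mean(loo),
    bias  = (n - 1) * (mean(loo) - theta),
    error = sqrt((n - 1) / n * sum((loo - mean(loo))^2))
  )
}

jackknife(x, shannon)
```

A type whose removal shifts the statistic far from the others contributes most to the error term, which is how influential types/fabrics show up.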

If bootstrap is used, a large number of new bootstrap assemblages are created, with the same sample size, by resampling each of the original assemblages with replacement. Examination of the bootstrap statistics then makes it possible to pinpoint assemblages that require further investigation. A five-column data.frame is returned, giving the bootstrap distribution statistics for each replicated assemblage (in rows) with the following columns:

min

Minimum value.

mean

Mean value (event date).

max

Maximum value.

Q5

Sample quantile at probability 0.05.

Q95

Sample quantile at probability 0.95.
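The resampling step can be sketched with base R. Drawing a multinomial sample with the original total and with probabilities proportional to the observed counts is equivalent to resampling individuals with replacement. The counts are made up, the Shannon index is again a stand-in statistic, and the helper names shannon and bootstrap are hypothetical.

```r
set.seed(1)

# Counts for one assemblage (made-up values), one entry per type/fabric
x <- c(35, 26, 25, 21, 16, 11, 6, 5)

# Stand-in statistic: Shannon index (natural log), ignoring empty types
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

bootstrap <- function(x, stat, n = 1000, probs = c(0.05, 0.95)) {
  # Each column is one bootstrap assemblage with the same sample size as x
  reps <- stats::rmultinom(n, size = sum(x), prob = x)
  boot <- apply(reps, 2, stat)
  q <- stats::quantile(boot, probs = probs, names = FALSE)
  names(q) <- paste0("Q", probs * 100)
  c(min = min(boot), mean = mean(boot), max = max(boot), q)
}

bootstrap(x, shannon)
```

A wide gap between the two quantiles for a given assemblage suggests that its estimate is sensitive to sampling and deserves a closer look.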

## References

Bellanger, L., Tomassone, R. & Husi, P. (2008). A Statistical Approach for Dating Archaeological Contexts. Journal of Data Science, 6, 135-154.

Peeples, M. A., & Schachner, G. (2012). Refining correspondence analysis-based ceramic seriation of regional data sets. Journal of Archaeological Science, 39(8), 2818-2827. DOI: 10.1016/j.jas.2012.04.040.

## See also

Other statistics: independance, test_diversity(), test_fit()

## Author

N. Frerebeau

## Examples

## Data from Magurran 1988, p. 145-149
birds <- CountMatrix(
  data = c(35, 26, 25, 21, 16, 11, 6, 5, 3, 3,
           3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 0, 0,
           30, 30, 3, 65, 20, 11, 0, 4, 2, 14,
           0, 3, 9, 0, 0, 5, 0, 0, 0, 0, 1, 1),
  nrow = 2, byrow = TRUE, dimnames = list(c("oakwood", "spruce"), NULL))

## Shannon diversity
heterogeneity <- index_heterogeneity(birds, "shannon")
refine_diversity(heterogeneity, method = "bootstrap")
#>         min      mean     max      Q5       Q95
#> oakwood 2.094642 2.35041  2.573477 2.23074  2.468232
#> spruce  1.793914 2.021042 2.209898 1.915112 2.127804
refine_diversity(heterogeneity, method = "jackknife")
#>         mean     bias       error
#> oakwood 2.362648 -0.9520239 0.1233832
#> spruce  2.012326 -0.9169561 0.2490786

## Shannon evenness
evenness <- index_evenness(birds, "shannon")
refine_diversity(evenness, method = "bootstrap")
#>         min       mean      max       Q5        Q95
#> oakwood 0.7233923 0.8152063 0.8876801 0.7793649 0.8508275
#> spruce  0.7101253 0.7888973 0.8581119 0.748665  0.82948
refine_diversity(evenness, method = "jackknife")
#>         mean      bias        error
#> oakwood 0.8011374 -0.05600727 0.03567978
#> spruce  0.776363  -0.05669112 0.07771685