ABSTRACT: The high-resolution estimates of temporal mixing in shell beds: the evils and virtues of time-averaging.

Kowalewski, M., Goodfriend, G.A., and Flessa, K.W.

Manuscript - submitted to Paleobiology, August 27, 1997

Abstract. ---- This study explores time-averaging (temporal mixing) at very high sampling resolution: that of adjacent shells collected from the same stratum. Nine samples of the bivalve Chione fluctifraga were collected from four Holocene cheniers (beach ridges) on the Colorado Delta (Gulf of California) and 165 shells were dated using radiocarbon-calibrated amino-acid racemization (D-alloisoleucine/L-isoleucine). The age range of shells within samples averages 661 years and, in seven out of nine samples, exceeds 500 years. The sample standard deviation ranges from 73 to 294 years and averages 203 years. Thus, even within-sample estimates of time-averaging indicate extensive temporal mixing in bioclastic deposits. No matter how carefully collected, data from shell beds may not be suitable for studying processes on time-scales shorter than hundreds to thousands of years. Comparison of our data with estimates obtained from other cheniers at coarser sampling resolutions indicates that pooling of samples drastically increases time-averaging in paleontological data. Time-averaging is homogeneous among strata within cheniers, but varies among cheniers. Thus, deposits of seemingly identical origin may vary in their temporal resolution -- apparently comparable shell beds may differ in paleontological patterns (e.g., species diversity) due to cryptic variation in time-averaging. Age-distributions of dated shells indicate that, at 50-year resolution, the samples provide a continuous and uniform record for the entire interval. The incompleteness observed in the samples can easily be simulated by sampling a 100%-complete, uniform record. The mean sample completeness of the actual samples (63.6%) is very close to that predicted by the simulations (67.3%).
Shell beds can record the optimal type of time-averaging, where paleobiological data are a time-weighted average of the faunal composition from the spectrum of environments that existed during the entire interval of time. Coordinated stasis may reflect a long-term averaging of taxa from a similar spectrum of environments, and not necessarily ecological locking. Also, within the range of radiocarbon dating, shell beds can provide a 100%-complete, high resolution record.

Michal Kowalewski. Institute of Paleobiology, Polish Academy of Sciences, Twarda 51/55, 00-818, Warszawa, Poland. E-mail: michael.kowalewski@uni-tuebingen.de
Glenn A. Goodfriend. Geophysical Laboratory, Carnegie Institution of Washington, 5251 Broad Branch Rd., NW., Washington DC 20015
Karl W. Flessa. Department of Geosciences, University of Arizona, Tucson AZ 85721

Introduction

Shell beds and shell-rich deposits are one of the primary sources of paleontological data in the Phanerozoic fossil record. These deposits, however, typically undergo extensive temporal mixing (time-averaging) during their formation (e.g., Walker and Bambach 1971; Peterson 1977; Staff et al. 1986; Wilson 1988; Fürsich and Aberhan 1990; Kidwell and Bosence 1991; Kidwell and Behrensmeyer 1993; Kidwell and Flessa 1995; Kowalewski 1996a). In the last decade, high-resolution dating methods have been used to quantitatively estimate the extent of time-averaging. These intensive studies have shown that temporal mixing on the scale of hundreds to tens of thousands of years is the rule rather than the exception in marine, lacustrine, and terrestrial deposits (Behrensmeyer 1982; Goodfriend 1987; Cohen 1989; Powell and Davis 1990; Flessa et al. 1993; Goodfriend and Mitterer 1993; Flessa and Kowalewski 1994; Wehmiller et al. 1995; Goodfriend and Gould 1996; Martin et al. 1996; Anderson et al. 1997; Meldahl et al. 1997).
However, due to the high cost of dating, most studies have been based on a small number of dates, while the few larger datasets had low sampling resolution, being either literature compilations (e.g., Flessa and Kowalewski 1994) or based on data collected over larger areas (e.g., Meldahl et al. 1997). Thus, our understanding of time-averaging is itself biased by averaging caused by pooling dates from different samples, sites, and environments (‘analytical time-averaging’ [see Fürsich and Aberhan 1990; Behrensmeyer and Hook 1992]). Consequently, we still lack information about time-averaging at the highest and most fundamental resolution: that of a collection of fossils from a single sample of a minimal stratigraphic span (i.e., confined to the smallest indivisible stratigraphic unit that can be distinguished in the outcrop). How much time-averaging is there likely to be within the bag or slab of fossils the paleontologist brings back from the field and uses as the finest sample unit in subsequent analyses?
We used extensive radiocarbon-calibrated amino-acid racemization dating of mollusk shells from Holocene shelly deposits to assemble a large dataset that offers statistically meaningful insights into the scale, variation, and internal structure of time-averaging at the highest sampling resolution -- that of adjacent shells from within the same stratum.

Material and Dating Technique

Study area and sampling. ---- We studied temporal resolution in the bioclastic beach ridges, or "cheniers", from the tidal flats of the Colorado River Delta (Fig. 1). These cheniers are lag-concentrations (sensu Kidwell 1991) formed through the reworking of the intertidal mudflat during episodes of low sediment input from the Colorado River (Thompson 1968; Kowalewski et al. 1994). The most recent episode -- caused by diversion of the river for irrigation, power, and flood control -- began 100 years ago (Fradkin 1984). As a result, cheniers have been forming in the upper intertidal zone. Older cheniers, situated landward in the supratidal flats, correspond to previous episodes of mudflat reworking caused by natural diversions of the river to the Salton Trough (Thompson 1968; Kowalewski et al. 1994; Goodfriend et al. 1995).
This study focuses on a single species, the venerid bivalve Chione fluctifraga. This is because we have already developed a reliable, efficient, and inexpensive dating technique specifically for dating the shells of this species (see below for details) and because this is one of the two most common mollusks found in the cheniers (Kowalewski et al. 1994).
Data were obtained from four cheniers situated in the central part of the lower delta (Fig. 1B): Chenier 1 is situated in the upper intertidal, and Cheniers 2, 3, and 4 are increasingly older ridges, partly buried within the supratidal muds (Fig. 1C). Nine samples were collected at various depths from five trenches excavated in the four cheniers (Fig. 1C) and 165 complete valves of C. fluctifraga were dated using amino acid ratios. Except for sample 3-150 with 7 individuals, all samples included from 18 to 21 dated valves (Table 1).
Each sample was collected from well-exposed trench walls by hand-picking C. fluctifraga valves (articulated bivalve shells are very rarely found in the cheniers [Kowalewski et al. 1994]). We collected directly adjacent valves laterally, that is, parallel to the sedimentary layering. Where neither stratification nor depositional dip was visible, we collected specimens from the same depth in the trench. This collection method minimizes the stratigraphic and spatial span of a sample. Our sampling technique is likely to be the finest-resolution stratigraphic sampling available to macroinvertebrate paleontologists.
Dating technique. ---- Each valve was analyzed for its A/I (alloisoleucine/isoleucine) ratio and that value was used to estimate its age using a calibration equation based on radiocarbon ages (Fig. 2) (see Goodfriend 1989 for a discussion of this method). A/I values were determined by HPLC (high-performance liquid chromatography), using peak area ratios calibrated against a standard A/I mixture (see Goodfriend et al. 1997 for details of procedures). Samples were taken from the hinge area of each shell to avoid intrashell variation in racemization (Goodfriend et al. 1997). Shells collected from the surface were excluded from this analysis because of possible differences in racemization rate due to surface heating. No differences in rates were found among samples buried at various depths from 20 to 150 cm (Goodfriend et al. 1995).
The age of a shell (in years) is estimated from the equation: Age = (A/I - 0.008) × 10741, where 0.008 is the mean A/I value measured in live-collected, pre-bomb Chione shells (based on eight analyses), and 10741 is the 14C-calibrated racemization rate (Fig. 2). This rate is the slope derived from simple linear regression of 14C ages against A/I values for 15 individual Chione shells from Chenier 3, plus a point representing living Chione (based on their mean apparent 14C age [Goodfriend and Flessa submitted] and their mean A/I value). The measurement error for A/I values is ≤4% and thus can be ignored as a significant source of age variation among shells (i.e., the apparent time-averaging caused by measurement error is a few tens of years, at most). Of 165 dated shells, 98.8% came from the last 1,500 years, and only two were significantly older (3,416 and 7,379 years). The age estimates for these two outliers are uncertain because the radiocarbon calibration was based on extrapolation of the racemization rate determined from much younger shells (Fig. 2). The two shells were excluded from the analysis; this makes our estimates of time-averaging more conservative. The raw data including A/I values and the corresponding age estimates are listed in the Appendix.
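The calibration equation can be sketched in a few lines of Python. The two constants are those quoted in the text; the function names and the example A/I value are illustrative assumptions, not part of the original analysis.

```python
# Constants quoted in the text; function names are hypothetical.
AI_MODERN = 0.008   # mean A/I in live-collected, pre-bomb Chione shells
RATE = 10741        # 14C-calibrated racemization rate (years per unit of A/I)


def age_from_ai(ai_ratio):
    """Estimated shell age in years from a measured A/I ratio."""
    return (ai_ratio - AI_MODERN) * RATE


def age_shift_from_ai_error(ai_ratio, relative_error=0.04):
    """Age shift (years) produced by a relative error in the A/I measurement.

    The default of 4% is the upper bound on measurement error quoted in the text.
    """
    return ai_ratio * relative_error * RATE


# Hypothetical measurement, chosen only for illustration.
example_ai = 0.058
print(age_from_ai(example_ai))              # roughly 537 years
print(age_shift_from_ai_error(example_ai))  # roughly 25 years
```

For this hypothetical shell the 4% bound corresponds to about 25 years, which is in line with the statement that measurement error contributes a few tens of years at most.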

Analytical Methods and Results

Scale of time-averaging. ---- Statistically, time-averaging is the dispersion of an age-distribution, and thus, can be best estimated using dispersion measures. Previous workers used the age range between the youngest and oldest shell to estimate time-averaging (e.g., Flessa et al. 1993; Flessa and Kowalewski 1994; Meldahl et al. 1997). However, the range is an estimate that is very sensitive to sample size, is based on extreme outliers, and is difficult to handle statistically. Time-averaging has also been estimated using shell half-life: the amount of time needed to remove 50% of shells present initially (Cummins et al. 1986; Meldahl et al. 1997). However, this measure, based on a best-fit exponential curve for the age-frequency distribution, assumes a continuous input of shells through time (Meldahl et al. 1997) and, more importantly, is sensitive to the resolution (binning) at which the data are analyzed.
We report the age range, to enable comparison with previous studies, but focus our analysis on the standard deviation (SD), a measure of dispersion which largely avoids the problems of the range or half-life approach and can be interpreted literally as the average departure of a shell’s age from the mean shell age. Note that it is appropriate to use SD, and not the coefficient of variation (CV), because the dispersion (time-averaging) is independent from the mean (average shell age): time-averaging is not a function of the age of the deposit (except, perhaps, at the macroevolutionary time-scale [Kidwell and Brenchley 1996; Kowalewski 1996a]). Thus, an increase in the stratigraphic age should be viewed as an additive transformation which shifts a distribution toward higher values but does not affect its dispersion parameters (as do the shifts of the mean caused by multiplicative transformations, for example).
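The argument for SD over CV can be checked numerically. The sketch below uses made-up shell ages (an assumption, not data from this study) and shows that adding a constant stratigraphic age leaves the SD untouched while the CV changes.

```python
import statistics

# Hypothetical shell ages in years; not data from the study.
ages = [120, 250, 310, 480, 640]
# The same age-distribution shifted 1,000 years older (additive transformation).
older = [age + 1000 for age in ages]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# The SD is identical for both lists: dispersion is unaffected by the shift.
print(statistics.stdev(ages), statistics.stdev(older))
# The CV drops for the older list even though the time-averaging is the same.
print(coefficient_of_variation(ages), coefficient_of_variation(older))
```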
The confidence intervals around the SD were estimated using a balanced bootstrap (see Hall 1992; Kowalewski 1996b: fig. 2). We used bootstrapping because it avoids the assumptions of parametric tests (e.g., the form of the sample distribution), often offers more power than other non-parametric tests, and allows the researcher to customize statistical parameters and tests according to specific needs (see Diaconis and Efron 1983; Manly 1991). Each original sample was resampled with replacement 5,000 times (the pilot bootstrap runs showed that the estimates of SD stabilized around 4,000 iterations). The SD was calculated for each bootstrap sample and the 0.5, 2.5, 97.5, and 99.5 percentiles of the resulting sampling distribution were used to estimate 95% and 99% confidence intervals around each standard deviation (‘naive bootstrap’ [Efron 1981]). The bootstrap estimates showed a small bias (around 10 years) toward lower SD-values (i.e., the means of the bootstrap distributions were slightly smaller than the actual estimates of SD). The bias -- a common problem in bootstrapping non-normally distributed parameters (Manly 1991) -- was corrected by standardizing the mean standard deviation of the bootstrap distribution to the standard deviation of the original sample. Note that the bias correction could be further improved by complex, computer-intensive methods such as accelerated bias correction (e.g., DiCiccio and Romano 1988). This seemed superfluous here, however, given that the bias is so small that it would not have had any effect on our interpretation even if no correction were applied.
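A minimal sketch of this procedure, under stated assumptions, might look as follows. The balanced design is implemented by pooling 5,000 copies of the sample, shuffling, and cutting the pool into resamples. The bias correction is assumed here to be a simple additive shift of the bootstrap distribution, and the percentile interpolation rule is also an assumption; the text does not specify either detail. All names and the example ages are hypothetical.

```python
import random
import statistics


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list (p in 0..100)."""
    k = (len(sorted_vals) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def balanced_bootstrap_sd(ages, n_boot=5000, seed=0):
    """Percentile confidence intervals around the SD from a balanced bootstrap."""
    rng = random.Random(seed)
    n = len(ages)
    observed_sd = statistics.stdev(ages)
    # Balanced design: every shell appears exactly n_boot times in the pool.
    pool = list(ages) * n_boot
    rng.shuffle(pool)
    boot_sds = [statistics.stdev(pool[i * n:(i + 1) * n]) for i in range(n_boot)]
    # ASSUMPTION: bias is removed by shifting the bootstrap distribution so
    # that its mean equals the SD of the original sample.
    shift = observed_sd - statistics.mean(boot_sds)
    corrected = sorted(sd + shift for sd in boot_sds)
    return {
        "sd": observed_sd,
        "bias": -shift,
        "ci95": (percentile(corrected, 2.5), percentile(corrected, 97.5)),
        "ci99": (percentile(corrected, 0.5), percentile(corrected, 99.5)),
    }


# Hypothetical sample of 18 shell ages (years), for illustration only.
hypothetical_ages = [35, 80, 120, 150, 210, 260, 300, 340, 420, 480,
                     530, 610, 700, 790, 860, 940, 1010, 1100]
print(balanced_bootstrap_sd(hypothetical_ages))
```

A sample whose lower 99% bound stays above zero would be read, as in the text, as showing significant time-averaging.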
When expressed as a range, time-averaging varies in our samples from 190 to 1060 years with a mean sample range of 661 years (Table 1). The standard deviation varies among samples from 50 to 294 years with a mean of 203 years. Thus, the average shell from a chenier sample differs by approximately 200 years from the mean sample age. The confidence intervals around the SD (Table 1) indicate that the SD is significantly larger than zero in all samples.
Scale of Analytical Time-Averaging. ---- Flessa and Kowalewski (1994) compiled radiocarbon dates from the literature to estimate time-averaging in nearshore and shelf environments. One of their datasets included estimates for 49 cheniers from all over the world (for data summary and literature sources see Flessa and Kowalewski 1994). Those estimates were all affected by pooling of samples: i.e., shells used to calculate each estimate were not all from a single sample but came from different sites or strata. Nevertheless, the pooling was limited, because the shells were typically collected from a single chenier or a single chenier series. Because our data are unaffected by sample pooling, we can compare them with those of Flessa and Kowalewski (1994: table 3) to test the hypothesis that pooling of samples increases levels of time-averaging in the data.
The result confirms the expectations (Fig. 3). Mean age range based on 49 localities (3,289 years) is almost five times higher than the value of 661 years obtained for our nine samples (Figs. 3A, 3B). Note that the difference may reflect unequal sample sizes: given the right-skewness of the distribution (Fig. 3A), the arithmetic mean will tend to decrease for small samples because such samples are less likely to include observations from the tail (e.g., Fig. 3B). Nevertheless, the one-tailed, two-sample bootstrap test indicates that the observed difference is statistically significant even when this sampling effect is accounted for (p = 0.0046) (Fig. 3C). Note here that we compared a literature dataset based on data for cheniers from all over the world with nine samples from one study area. It is, thus, possible that the observed difference reflects the fact that the Colorado cheniers are exceptionally little affected by time-averaging relative to average cheniers. However, the Colorado beach ridges are a classic example of cheniers (Kowalewski et al. 1994), and thus, analytical time-averaging seems a much more parsimonious explanation (we do admit: it is also much more exciting) than some unknown differences between the Colorado and all other cheniers.
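One way to account for the sample-size effect is sketched below. It is an assumption about how such a test could be set up, not a reconstruction of the procedure behind Fig. 3C: samples of nine are drawn with replacement from the literature ranges, and the p-value is the fraction of draws whose mean is at or below the observed mean. The function name and the placeholder ranges are hypothetical; the real inputs would be the 49 ranges from Flessa and Kowalewski (1994: table 3).

```python
import random


def small_sample_mean_test(reference_ranges, observed_mean, k,
                           n_iter=10_000, seed=0):
    """One-tailed resampling test.

    Draws k values with replacement from reference_ranges n_iter times and
    returns the fraction of draws whose mean is <= observed_mean.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        draw = [rng.choice(reference_ranges) for _ in range(k)]
        if sum(draw) / k <= observed_mean:
            hits += 1
    return hits / n_iter


# Placeholder, right-skewed age ranges (years); NOT the literature data.
placeholder_ranges = [400, 650, 900, 1200, 1500, 2100, 2800, 3600,
                      4500, 6000, 8200, 11000]
p_value = small_sample_mean_test(placeholder_ranges, observed_mean=661, k=9)
print(p_value)
```

Because each draw has the same size as the chenier dataset (nine samples), the tendency of small samples to miss the tail of a right-skewed distribution is built into the null distribution.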
The Age-Structure and Completeness of Time-Averaged Samples. ---- The age-frequency distributions offer an insight into the internal temporal structure of time-averaged samples (Figs. 4A-4I). In this study, all frequency distribution analyses have been done at a resolution of 50 years. This is the highest realistic resolution given the accuracy and precision of amino-acid dating and the size of our samples. For all samples, the age-distributions are right-skewed (skewness > 0, Table 1), i.e., older shells are increasingly less frequent. Nevertheless, all distributions appear continuous: most, or even all, age-classes between the oldest and youngest shell, contain at least one observation (Fig. 4).
The age-distributions offer insight into the completeness of the record encompassed within a time-averaged sample and can be analyzed in a fashion analogous to estimating paleontological or stratigraphic completeness (e.g., Sadler 1981; Allmon 1989). The temporal completeness of a sample can be estimated as the proportion of the time-intervals containing shells to all the time-intervals included between the oldest and youngest shell in the sample (see also Kowalewski 1996a: fig. 1). This definition is analogous to that for temporal paleontological completeness (Allmon 1989; Kowalewski 1996a). The completeness of samples varies from 41 to 100% with a mean of 63.6% (Table 1). This is remarkable completeness considering that, at the average sample size of approximately 18 and at a resolution set to 50 years (completeness is a scale-dependent phenomenon [Allmon 1989; McKinney 1991; Kowalewski 1996a]), gaps due to sampling are inevitable.
To explore sampling effects rigorously, we simulated incompleteness by random sampling of a uniform distribution (i.e., the distribution that simulates a 100%-complete and uniformly distributed record, and thus, provides the most conservative incompleteness estimates). We performed nine independent simulations. For each of the original nine samples, with sample size k and observed age-range r, we drew k observations from the uniform distribution with the range r. For each simulation, 10,000 random samples were generated and their completeness was calculated at a resolution of 50 years. The mean for the 10,000 random samples estimates the sample completeness expected for a 100%-complete uniform record, whereas the proportion of random samples less complete than the original sample estimates the probability of 100% completeness.
For eight out of nine samples, the observed incompleteness is statistically indistinguishable from that expected for a 100%-complete uniform record (Table 1) and the mean expected completeness for a 100%-complete record (67.3%), sampled to the same degree as our chenier samples, is very close to the observed mean (63.6%) (Table 1).
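The completeness measure and the uniform-record simulation can be sketched as follows. Several details are assumptions made for illustration: bins are anchored at the youngest shell, completeness of each simulated sample is measured over that sample's own youngest-to-oldest span, and the function names and example inputs are hypothetical.

```python
import random


def completeness(ages, bin_width=50):
    """Fraction of bins between the youngest and oldest shell that hold a shell.

    ASSUMPTION: bins of bin_width years are anchored at the youngest shell.
    """
    youngest = min(ages)
    occupied = {int((age - youngest) // bin_width) for age in ages}
    return len(occupied) / (max(occupied) + 1)


def simulate_uniform_record(k, age_range, observed, n_iter=10_000,
                            bin_width=50, seed=0):
    """Completeness expected when k shells sample a 100%-complete uniform record.

    Returns (mean simulated completeness,
             proportion of simulated samples less complete than `observed`).
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_iter):
        draw = [rng.uniform(0, age_range) for _ in range(k)]
        values.append(completeness(draw, bin_width))
    expected = sum(values) / n_iter
    p_less = sum(v < observed for v in values) / n_iter
    return expected, p_less


# Hypothetical sample: 18 shells spanning 650 years, observed completeness 0.6.
expected, p_less = simulate_uniform_record(k=18, age_range=650, observed=0.6)
print(expected, p_less)
```

An observed completeness that is not unusually low relative to the simulated values is what the text describes as statistically indistinguishable from a 100%-complete uniform record.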
Variation in Time-Averaging Among Samples. ---- Variation in time-averaging among samples can be analyzed at two different levels: among cheniers and within cheniers. Samples vary among cheniers, as is clear both from a visual comparison of the age-distributions (Fig. 4) and from a more rigorous analysis of the confidence intervals around the SD (Fig. 5, Table 1). Time-averaging is lower in samples from Chenier 1 than in six out of the seven samples from Cheniers 2-4. One sample from Chenier 4 (4-40) shows an intermediate level of time-averaging.
Samples are very similar within cheniers. For seven out of eight possible pairwise comparisons (1 for Chenier 1, 6 for Chenier 3, and 1 for Chenier 4), age-distributions appear very similar visually and are indistinguishable statistically. With the exception of two samples from Chenier 4, the confidence intervals around the SD overlap strongly.

Discussion

Even within single samples, and even when those samples were collected to minimize their stratigraphic and lateral span, substantial time-averaging does occur. In cheniers, mollusk shells are so extensively mixed temporally that even directly adjacent shells collected from the same sedimentary layer vary, on average, in age by approximately 200 years. Moreover, even at a small sample size of approximately 18, the age range between the oldest and youngest shell within a sample exceeds, on average, 600 years. This result is consistent with previous quantitative estimates of time-averaging, done in a variety of settings at coarser sampling resolution (e.g., Flessa et al. 1993; Flessa and Kowalewski 1994; Martin et al. 1996; Meldahl et al. 1997), and suggests that the paleontological and geochronological limitations caused by time-averaging cannot be removed by careful sampling. No matter how carefully collected, data from shelly deposits may not be suitable for studying processes that happen on time-scales shorter than hundreds to thousands of years (e.g., Fürsich and Aberhan 1990; Kidwell and Behrensmeyer 1993; Flessa et al. 1993; Kowalewski 1996a). Furthermore, single radiocarbon-dated shells, unless they are found in life position (such shells estimate a deposit’s minimum age), should not be used to estimate the age of a deposit (Goodfriend 1989).
Fürsich and Aberhan (1990) and Behrensmeyer and Hook (1992) pointed out that pooling of data from various samples, outcrops, localities, or regions can result in ‘analytical time-averaging’. Cheniers offer an empirical example which shows that even a very limited pooling of samples (i.e., confined stratigraphically and spatially to single cheniers or chenier series) can significantly increase time-averaging.
Meldahl et al. (1997) recently showed that time-averaging can vary substantially among different environments and subsidence settings. Here we show that time-averaging may vary even among shelly deposits that formed through essentially identical processes in the same setting. Such variation most likely reflects changes through time in the time-averaging-structure of the dead shells in the source area from which the shelly accumulations are being generated (the intertidal mudflat in the case of our cheniers). Because many paleontological patterns such as diversity, morphometric variability, or size-variation can be distorted by time-averaging (see Fürsich and Aberhan 1990; Kidwell and Bosence 1991; Kowalewski 1996a), cryptic variation in time-averaging among seemingly identical shell beds may cause variation, or even trends, that may be difficult to identify as artifacts of temporal mixing.
As has been shown previously (Flessa et al. 1993; Flessa and Kowalewski 1994; Meldahl et al. 1997), the age-distributions of samples are right-skewed, with older shells being increasingly scarce. This reflects the cumulative destruction of shells with time (Flessa and Kowalewski 1994; Kidwell and Flessa 1995; Meldahl et al. 1997). Nevertheless, at a resolution as fine as 50 years, chenier samples are characterized by ‘uniform time-averaging’, with all time-averaged time-intervals equally represented. This extreme case of ‘continuous time-averaging’ (sensu Fürsich and Aberhan 1990) has three interesting implications. First, as pointed out repeatedly (Walker and Bambach 1971; Staff et al. 1986; Fürsich and Aberhan 1990; Kidwell and Flessa 1995; Kowalewski 1996a), time-averaging may be advantageous to paleontologists because it can eliminate the noise introduced by short-term fluctuations. This study suggests that some shell beds undergo the best type of averaging we could ever have hoped for: uniform and continuous, with shelly mollusks from all time-averaged time-intervals equally represented in the samples. Thus, the samples can provide information about the relative abundance of shelly taxa weighted by the duration of their presence in the benthic ecosystems. Second, uniform, continuous time-averaging may generate a pattern similar to that of coordinated stasis (sensu Brett et al. 1996). This is because similar spectra of species generated by fluctuating environments are being repeatedly time-averaged into consecutive shell beds (see also Bambach and Bennington 1996). Indeed, two adjacent generations of cheniers have very similar taxonomic composition (Kowalewski et al. 1994). This reflects a long-term time-averaging of benthic associations from a similar range of environments and not some ecological phenomenon (e.g., ecological locking [Brett et al. 1996]). Third, there is also an interesting corollary for Quaternary studies here. When high-resolution dating is employed, some bioclastic accumulations can provide a 100%-complete paleontological record at a resolution of 50 years and thereby permit exceptional insights into the rapid environmental and climatic changes in the late Pleistocene and Holocene (Flessa et al. 1997).

Final Remarks

We would not argue that our results and their implications are valid for the entire Phanerozoic and for all types of shell beds. Many parameters controlling the formation of bioclastic deposits, and even bioclasts themselves, have changed dramatically throughout the Phanerozoic (Kidwell and Brenchley 1994, 1996; Kowalewski 1996a). Moreover, even coeval shell beds vary in time-averaging depending on a variety of factors (see Kidwell and Bosence 1991; Kowalewski 1996a, 1997; Meldahl et al. 1997), and spectacular examples of shell beds little affected by time-averaging exist (e.g., Boyajian and Thayer 1995). Nevertheless, we do believe that our results have implications reaching far beyond Holocene macrotidal lag deposits and that the time-averaging patterns identified here are valid for many, or even most, of the mollusk-dominated shell beds, especially for the Cenozoic fossil record. Four arguments defend the general validity of our results.
First, our estimates are consistent with previous studies done in other settings, for other types of deposits, and for other bioclast producers. Second, time-averaging is a function of the availability of old shells in the depositional system, and thus, any shell bed, regardless of its mode of formation, will be time-averaged when old shells are common in the area. In other words, it matters less how a given deposit is formed than what it is generated from. If a major storm hit the Colorado delta, the resulting deposit, even though formed in several hours rather than several decades, would be made of the same bioclasts that make up the cheniers. Because old shells are ubiquitous in modern depositional systems (Flessa and Kowalewski 1994), we can generally expect similar levels of time-averaging in most of the currently forming bioclastic deposits. The consistent estimates of time-averaging yielded by studies done in a variety of settings are, therefore, not so surprising. Third, our estimates are conservative because we excluded outliers, confined the study to one species, and used conservative analytical methods. In addition, and perhaps most importantly, the cheniers will likely undergo further reworking and smearing before getting incorporated into the fossil record. This means further temporal mixing. Thus, we can expect that the uniform and continuous nature of time-averaging will be even further enhanced. Finally, some of the results, especially those on analytical time-averaging and cryptic variation in time-averaging, illustrate some important phenomena that may be encountered when studying fossil shell beds, regardless of their age and mode of origin.
The most important conclusion of our study is a paleoecological one. Our results suggest that even a single sample carefully collected from one level in a single shell bed is still affected by long-term time-averaging. Even at the highest spatial resolution, paleoecological patterns entombed in shelly deposits reflect a long-term record of the shelly fauna averaged from the spectrum of environments that existed during some interval of time. The reasoning and models stemming from a strictly ecological-neontological approach may rarely be justifiable when studying shell beds.

Acknowledgments

Supported by NSF grants EAR-9405311 to K. W. Flessa and EAR-9405412 to G. A. Goodfriend. M. Kowalewski thanks the Alexander von Humboldt Foundation for financial support and W. Oschmann and J. Nebelsick from the University of Tübingen for hospitality. We thank J. Nebelsick for useful comments on the manuscript. We are indebted to P.E. Hare for the use of laboratory facilities for racemization analysis. This is publication 26 of the Centro de Estudios de Almejas Muertas (/ceam).

Literature cited
