Resampling species-wise abundance data
Linköping University, Faculty of Arts and Sciences. Linköping University, Department of Mathematics, Statistics.
2006 (English). In: The 17th Annual Conference of The International Environmetrics Society (TIES), Kalmar, Sweden, 2006. Conference paper (Other academic)
Abstract [en]

Monitoring the abundance of plant species in grasslands is time-consuming. Accordingly, sampling or inspection is usually sparse in both time and space. Typically, a grassland area is visited 1-2 times per decade, and each time 5-20 plots are inspected. For each plot (about one square meter) an inspection protocol, containing coverage data for up to 100 species, is established. The collected data can thus be characterized as high-dimensional and sparse. Moreover, it is not unusual that some of the monitored species are present in only a few of the investigated plots, i.e., the vectors of coverage data may contain numerous zeroes. The analysis of abundance data can be either multivariate or univariate. Canonical correlation analysis (CCA) and redundancy analysis (RDA) are widely used multivariate methods. Univariate analyses are usually applied to summary statistics, such as diversity indices or measures of evenness. In either case, the complexity of the data makes it difficult to use parametric models for inference about the whole grassland, and modest sample sizes prevent the use of asymptotic results. For this reason, nonparametric methods, such as permutation tests, are often used to assess trends in abundance data. However, the power of these tests may be low due to the small number of sampling occasions. Here, we propose a resampling technique that can be used to determine the distribution of arbitrary estimators or test statistics based on high-dimensional abundance data. The original idea of the bootstrap is to replace the true (but unknown) cumulative distribution function (cdf) with an empirical cumulative distribution function (edf) calculated from a sample of observations. When the collected data can be regarded as a simple random sample, the bootstrap principle provides a convenient method to determine the distributions of a large number of moment-related statistics (e.g. Singh, 1981).
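The bootstrap principle for a simple random sample can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the simulated coverage matrix and the choice of Shannon entropy of the mean coverage as the statistic are all assumptions made for illustration. Plots (rows) are drawn with replacement, which amounts to sampling from the edf, and the statistic is recomputed for each pseudo sample.

```python
import numpy as np

def shannon_entropy(coverage):
    """Shannon entropy of the relative abundances in a coverage vector."""
    total = coverage.sum()
    if total <= 0:
        return 0.0
    p = coverage[coverage > 0] / total
    return float(-(p * np.log(p)).sum())

def site_entropy(plots):
    """Entropy of the mean coverage per species over all plots (illustrative statistic)."""
    return shannon_entropy(plots.mean(axis=0))

def bootstrap_distribution(plots, statistic, n_boot=1000, seed=0):
    """Draw plots (rows) with replacement and recompute the statistic each time."""
    rng = np.random.default_rng(seed)
    n = plots.shape[0]
    out = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sampling from the edf
        out[b] = statistic(plots[idx])
    return out

# Hypothetical data: 12 plots, 30 species, many zeros (absent species).
rng = np.random.default_rng(1)
plots = rng.gamma(shape=0.5, scale=2.0, size=(12, 30))
plots[rng.random(plots.shape) < 0.6] = 0.0

boot = bootstrap_distribution(plots, site_entropy)
lo, hi = np.percentile(boot, [2.5, 97.5])  # simple percentile interval
```

With only 5-20 plots and up to 100 species, resampling whole plots in this way reuses a very small set of observed vectors, which is one reason to consider residual-based schemes instead.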
Also, it has been demonstrated that regression or time series data can be resampled by first extracting residuals (or innovations) and then forming pseudo data by resampling these residuals (Wu, 1989; Kreiss & Franke, 1992). We propose that high-dimensional abundance data be resampled by extracting residuals from a principal components factor analysis in which a small number of factors are retained. Furthermore, we handle point masses at zero (absent species) by using a truncated probit function to transform the original data prior to the principal components factor analysis, and to back-transform the pseudo data. The threshold and the number of factors retained are determined in such a way that the resampled data resemble the original observations in the most important respects. In particular, the number of observed species should not differ too much. The latter is achieved by using a subsampling procedure, in which the number of zeros (i.e. non-observed species) in a subsample and in pseudo data from that subsample are compared. Also, relative biases and coverage degrees of empirical confidence intervals are optimized. The performance of our procedure is illustrated by extensive simulations and a case study of temporal changes in Shannon entropy in a grassland in South West Sweden.
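One possible reading of the procedure outlined above is sketched below in Python. The abstract does not give the details, so everything here is an assumption: coverage is taken to be a proportion in [0, 1), the truncated probit is implemented as max(probit(p), c) with zeros mapped to the threshold c, the factor analysis is approximated by a rank-k singular value decomposition of the centered data, and whole residual vectors are resampled across plots. All function names and the simulated data are hypothetical.

```python
from statistics import NormalDist
import numpy as np

_norm = NormalDist()
probit = np.vectorize(_norm.inv_cdf, otypes=[float])  # inverse standard normal cdf
phi = np.vectorize(_norm.cdf, otypes=[float])         # standard normal cdf

def to_probit(cover, c):
    """Probit-transform positive coverages; zeros and small values are set to c."""
    z = np.full(cover.shape, c, dtype=float)
    pos = cover > 0
    z[pos] = probit(np.clip(cover[pos], 1e-9, 1 - 1e-9))
    return np.maximum(z, c)

def from_probit(z, c):
    """Back-transform: values at or below the threshold c become absences (0)."""
    return np.where(z > c, phi(z), 0.0)

def low_rank_fit(z, k):
    """Rank-k principal components fit of the centered data, plus residuals."""
    mean = z.mean(axis=0)
    u, s, vt = np.linalg.svd(z - mean, full_matrices=False)
    fitted = mean + (u[:, :k] * s[:k]) @ vt[:k]
    return fitted, z - fitted

def resample_abundance(cover, k, c, rng):
    """Form one pseudo data set by resampling residual rows (assumed scheme)."""
    z = to_probit(cover, c)
    fitted, resid = low_rank_fit(z, k)
    idx = rng.integers(0, resid.shape[0], size=resid.shape[0])
    return from_probit(fitted + resid[idx], c)

# Hypothetical data: 12 plots, 30 species, about 60 % absences.
rng = np.random.default_rng(0)
cover = rng.beta(1.0, 8.0, size=(12, 30))
cover[rng.random(cover.shape) < 0.6] = 0.0

pseudo = resample_abundance(cover, k=2, c=-2.5, rng=rng)
zeros_original = int((cover == 0).sum())
zeros_pseudo = int((pseudo == 0).sum())
```

Comparing `zeros_original` with `zeros_pseudo` over a grid of `k` and `c` values, computed on subsamples of the plots, would be one way to mimic the tuning step described in the abstract; the criterion actually used by the author may differ.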

URN: urn:nbn:se:liu:diva-37178 Local ID: 33869 OAI: diva2:258027
Available from: 2009-10-10 Created: 2009-10-10 Last updated: 2010-09-28

Author: Nordgaard, Anders
