By means of ab initio molecular dynamics (AIMD) simulations we carry out a detailed study of the electronic and atomic structure of Mo upon the thermal stabilization of its dynamically unstable face-centered cubic (fcc) phase. We calculate how the atomic positions, radial distribution function, and electronic density of states of fcc Mo evolve with temperature. The results are compared with those for the dynamically stable body-centered cubic (bcc) phase of Mo, as well as with bcc Zr, which is dynamically unstable at T = 0 K but (in contrast to fcc Mo) becomes thermodynamically stable at high temperature. In particular, we emphasize the difference between the local positions of atoms in the simulation boxes at a particular step of an AIMD simulation and the average positions around which the atoms vibrate, and show that the former are solely responsible for the electronic properties of the material. We observe that while the average atomic positions in fcc Mo correspond perfectly to the ideal structure at high temperature, the electronic structure of the metal calculated from AIMD differs substantially from the canonical shape of the density of states for the ideal fcc crystal. From a comparison of our results obtained for fcc Mo and bcc Zr, we advocate the use of electronic structure calculations, complemented with studies of radial distribution functions, as a sensitive test of the degree of temperature-induced stabilization of phases that are dynamically unstable at T = 0 K.
Humic matter has recently been shown to contain considerable quantities of naturally produced organohalogens. The present study investigated the possibility of a non-specific, enzymatically mediated halogenation of organic matter in soil. The results showed that, in the presence of chloride and hydrogen peroxide, the enzyme chloroperoxidase (CPO) from the fungus Caldariomyces fumago catalyzes chlorination of fulvic acid. At pH 2.5 - 6.0, the chlorine to fulvic acid ratio in the tested sample was elevated from 12 mg/g to approximately 40-50 mg/g. It was also shown that this reaction can take place at chloride and hydrogen peroxide concentrations found in the environment. An extract from spruce forest soil was shown to have a measurable chlorinating capacity. The activity of an extract of 0.5 kg soil corresponded to approximately 0.3 enzyme units, measured as CPO activity. Enzymatically mediated halogenation of humic substances may be one of the mechanisms explaining the widespread occurrence of adsorbable organic halogens (AOX) in soil and water.
Weather extremes often occur along fronts passing different sites with some time lag. Here, we show how such temporal patterns can be taken into account when exploring inter-site dependence of extremes. We incorporate time lags into existing models and into measures of extremal associations and their relation to the distance between the investigated sites. Furthermore, we define summarizing parameters that can be used to explore tail dependence for a whole network of stations in the presence of fixed or stochastic time lags. Analysis of hourly precipitation data from Sweden showed that our methods can prevent underestimation of the strength and spatial extent of tail dependencies.
Extreme precipitation events vary with regard to duration, and hence sub-daily data do not necessarily exhibit the same trends as daily data. Here, we present a framework for a comprehensive yet easily undertaken statistical analysis of long-term trends in daily and sub-daily extremes. A parametric peaks-over-threshold model is employed to estimate annual percentiles for data of different temporal resolution. Moreover, a trend-duration-frequency table is used to summarize how the statistical significance of trends in annual percentiles varies with the temporal resolution of the underlying data and the severity of the extremes. The proposed framework also includes nonparametric tests that can integrate information about nonlinear monotonic trends at a network of stations. To illustrate our methodology, we use climate model output data from Kalmar, Sweden, and observational data from Vancouver, Canada. In both these cases, the results show different trends for moderate and high extremes, and also a clear difference in the statistical evidence of trends for daily and sub-daily data.
Temporal trends in meteorological extremes are often examined by first reducing daily data to annual index values, such as the 95th or 99th percentiles. Here, we report how this idea can be elaborated to provide an efficient test for trends at a network of stations. The initial step is to make separate estimates of tail probabilities of precipitation amounts for each combination of station and year by fitting a generalised Pareto distribution (GPD) to data above a user-defined threshold. The resulting time series of annual percentile estimates are subsequently fed into a multivariate Mann-Kendall (MK) test for monotonic trends. We performed extensive simulations using artificially generated precipitation data and noted that the power of tests for temporal trends was substantially enhanced when GPD percentiles were used in place of ordinary empirical percentiles. Furthermore, we found that the trend detection was robust to misspecification of the extreme value distribution. An advantage of the MK test is that it can accommodate non-linear trends, and it can also take into account the dependencies between stations in a network. To illustrate our approach, we used long time series of precipitation data from a network of stations in The Netherlands.
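The first stage of this pipeline, converting threshold excesses into an annual percentile estimate via a fitted GPD, can be sketched in a few lines. The function name, the threshold value, and the use of `scipy.stats.genpareto` are illustrative assumptions, not the authors' implementation; the percentile formula combines the empirical exceedance rate with the fitted GPD tail.

```python
import numpy as np
from scipy.stats import genpareto

def annual_gpd_percentile(amounts, threshold, p=0.99):
    """Estimate the p-th percentile of precipitation amounts from a
    generalised Pareto distribution fitted to the excesses over a
    user-defined threshold (illustrative sketch, not the paper's code)."""
    amounts = np.asarray(amounts, dtype=float)
    excesses = amounts[amounts > threshold] - threshold
    # Empirical probability of exceeding the threshold u
    zeta = excesses.size / amounts.size
    # Maximum-likelihood GPD fit to the excesses (location fixed at 0)
    shape, _, scale = genpareto.fit(excesses, floc=0.0)
    # For x above u, P(X > x) = zeta * (1 - F(x - u)), so the p-th
    # percentile solves zeta * (1 - F(x - u)) = 1 - p
    return threshold + genpareto.ppf(1.0 - (1.0 - p) / zeta,
                                     shape, loc=0.0, scale=scale)
```

Such percentile estimates, computed separately for each station and year, would then form the annual series fed into the multivariate MK test.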
We develop new techniques to summarize and visualize spatial patterns of coincidence in weather events such as more or less heavy precipitation at a network of meteorological stations. The cosine similarity measure, which has a simple probabilistic interpretation for vectors of binary data, is generalized to characterize spatial dependencies of events that may reach different stations with a variable time lag. More specifically, we reduce such patterns into three parameters (dominant time lag, maximum cross-similarity, and window-maximum similarity) that can easily be computed for each pair of stations in a network. Furthermore, we visualize such three-parameter summaries by using colour-coded maps of dependencies to a given reference station and distance-decay plots for the entire network. Applications to hourly precipitation data from a network of 93 stations in Sweden illustrate how this method can be used to explore spatial patterns in the temporal synchrony of precipitation events.
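A minimal sketch of the three-parameter summary for one pair of stations follows. The exact definitions, in particular of the window-maximum similarity, are given in the paper; this version uses one plausible reading (events at `b` matched to events at `a` within a symmetric lag window), and the function name is illustrative.

```python
import numpy as np

def lagged_cosine_summary(a, b, max_lag=3):
    """For two binary event series (1 = event in that hour), return
    (dominant_lag, max_cross_similarity, window_max_similarity).
    A positive lag k means events at station b trail station a by k hours."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    sims = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            x, y = a[: len(a) - k], b[k:]
        else:
            x, y = a[-k:], b[: len(b) + k]
        denom = np.sqrt(x.sum() * y.sum())
        # Cosine similarity of 0/1 vectors: joint events / sqrt(n_a * n_b)
        sims[k] = float(x @ y) / denom if denom > 0 else 0.0
    dominant_lag = max(sims, key=sims.get)
    # Events at a matched by an event at b within +/- max_lag hours
    matched = sum(
        1 for t in range(len(a))
        if a[t] > 0 and b[max(0, t - max_lag): t + max_lag + 1].any()
    )
    norm = np.sqrt(a.sum() * b.sum())
    window_max = matched / norm if norm > 0 else 0.0
    return dominant_lag, sims[dominant_lag], window_max
```

Computing this triple for every pair of stations gives the raw material for the colour-coded maps and distance-decay plots described above.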
We present a new algorithm for monotonic regression in one or more explanatory variables. Formally, our method generalises the well-known PAV (pool-adjacent-violators) algorithm from fully to partially ordered data. The computational complexity of our algorithm is O(n^2). The goodness-of-fit to observed data is much closer to optimal than for simple averaging techniques.
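For reference, the fully ordered one-dimensional case that this generalisation starts from is the textbook PAV algorithm, which repeatedly pools adjacent blocks whose means violate monotonicity. A minimal sketch (the standard algorithm, not the partial-order method of the paper):

```python
def pav(y, w=None):
    """Classic pool-adjacent-violators algorithm: least-squares
    nondecreasing fit to a totally ordered (1-D) sequence y,
    with optional observation weights w."""
    n = len(y)
    w = [1.0] * n if w is None else list(w)
    # Each block stores [weighted mean, total weight, number of points]
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand block means back to one fitted value per observation
    fit = []
    for m, _, c in blocks:
        fit.extend([m] * c)
    return fit
```

Each pooled block ends up carrying the mean of its observations, which is exactly the structure the generalised algorithm maintains over a partial order.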
Monotonic regression is a nonparametric method for estimation of models in which the expected value of a response variable y increases or decreases in all coordinates of a vector of explanatory variables x = (x_{1}, …, x_{p}). Here, we examine statistical and computational aspects of our recently proposed generalization of the pool-adjacent-violators (PAV) algorithm from one to several explanatory variables. In particular, we show how the goodness-of-fit and accuracy of obtained solutions can be enhanced by presorting observed data with respect to their level in a Hasse diagram of the partial order of the observed x-vectors, and we also demonstrate how these calculations can be carried out to save computer memory and computational time. Monte Carlo simulations illustrate how rapidly the mean square difference between fitted and expected response values tends to zero, and how quickly the mean square residual approaches the true variance of the random error, as the number of observations increases up to 10^{4}.
Monotonic regression (MR) is a least distance problem with monotonicity constraints induced by a partially ordered data set of observations. In our recent publication [In Ser. {\sl Nonconvex Optimization and Its Applications}, Springer-Verlag, (2006) {\bf 83}, pp. 25-33], the Pool-Adjacent-Violators algorithm (PAV) was generalized from completely to partially ordered data sets (posets). The new algorithm, called GPAV, is characterized by very low computational complexity, which is of second order in the number of observations. It treats the observations in a consecutive order, and it can follow any arbitrarily chosen topological order of the poset of observations. The GPAV algorithm produces a sufficiently accurate solution to the MR problem, but the accuracy depends on the chosen topological order. Here we prove that there exists a topological order for which the resulting GPAV solution is optimal. Furthermore, we present results of extensive numerical experiments, from which we draw conclusions about the most and the least preferable topological orders.
In this paper, the monotonic regression problem (MR) is considered. We have recently generalized for MR the well-known Pool-Adjacent-Violators algorithm (PAV) from the case of completely to partially ordered data sets. The new algorithm, called GPAV, combines both high accuracy and low computational complexity, which grows quadratically with the problem size. The actual growth observed in practice is typically far lower than quadratic. The fitted values of the exact MR solution compose blocks of equal values. Its GPAV approximation also has a block structure. We present here a technique for refining the blocks produced by the GPAV algorithm to make the new blocks closer to those in the exact solution. This substantially improves the accuracy of the GPAV solution and does not deteriorate its computational complexity. The computational time for the new technique is approximately triple the time of running the GPAV algorithm. Its efficiency is demonstrated by the results of our numerical experiments.
The isotonic regression (IR) problem has numerous important applications in statistics, operations research, biology, image and signal processing and other areas. IR in the L2-norm is a minimization problem in which the objective function is the squared Euclidean distance from a given point to a convex set defined by monotonicity constraints of the form: the i-th component of the decision vector is less than or equal to its j-th component. Unfortunately, conventional optimization methods are unable to solve IR problems originating from large data sets. The existing IR algorithms, such as the minimum lower sets algorithm by Brunk, the min-max algorithm by Lee, the network flow algorithm by Maxwell & Muckstadt and the IBCR algorithm by Block et al., are able to find the exact solution to the IR problem for at most a few thousand variables. The IBCR algorithm, which proved to be the most efficient of them, is not robust enough. An alternative approach is to solve the IR problem approximately. Following this approach, Burdakov et al. developed an algorithm, called GPAV, whose block refinement extension, GPAVR, is able to solve IR problems with very high accuracy in a far shorter time than the exact algorithms. Apart from this, GPAVR is a very robust algorithm, and it allows us to solve IR problems with over a hundred thousand variables. In this talk, we introduce new exact IR algorithms, which can be viewed as active set methods. They use the approximate solution produced by the GPAVR algorithm as a starting point. We present results of our numerical experiments demonstrating the high efficiency of the new algorithms, especially for very large-scale problems, and their robustness. They are able to solve problems which all existing exact IR algorithms fail to solve.
In this talk we consider the isotonic regression (IR) problem, which can be formulated as follows. Given a vector $\bar{x} \in R^n$, find $x_* \in R^n$ which solves the problem: \begin{equation}\label{ir2} \begin{array}{cl} \mbox{min} & \|x-\bar{x}\|^2 \\ \mbox{s.t.} & Mx \ge 0. \end{array} \end{equation} The set of constraints $Mx \ge 0$ represents here the monotonicity relations of the form $x_i \le x_j$ for a given set of pairs of the components of $x$. The corresponding row of the matrix $M$ is composed mainly of zeros, except that its $i$th and $j$th elements are equal to $-1$ and $+1$, respectively. The most challenging applications of (\ref{ir2}) are characterized by very large values of $n$. We introduce new IR algorithms. Our numerical experiments demonstrate the high efficiency of our algorithms, especially for very large-scale problems, and their robustness. They are able to solve some problems which all existing IR algorithms fail to solve. We also outline our new algorithms for monotonicity-preserving interpolation of scattered multivariate data. In this talk we focus on the application of our IR algorithms in postprocessing of FE solutions. Non-monotonicity of the numerical solution is a typical drawback of conventional methods of approximation, such as finite elements (FE), finite volumes, and mixed finite elements. The problem of monotonicity is particularly important in cases of highly anisotropic diffusion tensors or distorted unstructured meshes. For instance, in nuclear waste transport simulation, the non-monotonicity results in negative concentrations, which may lead to failure of the concentration and chemistry calculations. Another drawback of the conventional methods is a possible violation of the discrete maximum principle, which establishes lower and upper bounds for the solution. We suggest here a least-change correction to the available FE solution $\bar{x} \in R^n$. 
This postprocessing procedure is aimed at recovering the monotonicity and some other important properties that may not be exhibited by $\bar{x}$. The mathematical formulation of the postprocessing problem is reduced to the following convex quadratic programming problem \begin{equation}\label{ls2} \begin{array}{cl} \mbox{min} & \|x-\bar{x}\|^2 \\ \mbox{s.t.} & Mx \ge 0, \quad l \le x \le u, \quad e^Tx = m, \end{array} \end{equation} where $e=(1,1, \ldots ,1)^T \in R^n$. The set of constraints $Mx \ge 0$ represents here the monotonicity relations between some of the adjacent mesh cells. The constraints $l \le x \le u$ originate from the discrete maximum principle. The last constraint formulates the conservativity requirement. The postprocessing based on (\ref{ls2}) is typically a large-scale problem. We introduce here algorithms for solving this problem. They are based on the observation that, in the presence of the monotonicity constraints only, problem (\ref{ls2}) is the classical monotonic regression problem, which can be solved efficiently by some of the available monotonic regression algorithms. This solution is then used to produce the optimal solution to problem (\ref{ls2}) in the presence of all the constraints. We present results of numerical experiments to illustrate the efficiency of our algorithms.
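For a very small mesh, the quadratic program above can be checked against a general-purpose solver. The sketch below uses SciPy's SLSQP method rather than the structure-exploiting algorithms the talk describes, and the function name and edge-list interface are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def postprocess_fe(xbar, edges, l, u, m):
    """Least-change correction of an FE solution xbar: minimize
    ||x - xbar||^2 subject to monotonicity x[i] <= x[j] for (i, j)
    in edges, box constraints l <= x <= u (discrete maximum
    principle), and the conservativity constraint sum(x) = m."""
    xbar = np.asarray(xbar, dtype=float)
    cons = [{"type": "ineq", "fun": lambda x, i=i, j=j: x[j] - x[i]}
            for i, j in edges]
    cons.append({"type": "eq", "fun": lambda x: x.sum() - m})
    res = minimize(lambda x: np.sum((x - xbar) ** 2), xbar,
                   method="SLSQP", bounds=[(l, u)] * len(xbar),
                   constraints=cons)
    return res.x
```

A generic QP solver scales poorly as $n$ grows, which is precisely why the proposed algorithms first solve the monotonic regression subproblem and only then account for the remaining constraints.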
We consider the problem of minimizing the distance from a given n-dimensional vector to a set defined by constraints of the form x_{i} ≤ x_{j}. Such constraints induce a partial order of the components x_{i}, which can be illustrated by an acyclic directed graph. This problem is also known as the isotonic regression (IR) problem. IR has important applications in statistics, operations research and signal processing, with most of them characterized by a very large value of n. For such large-scale problems, it is of great practical importance to develop algorithms whose complexity does not rise with n too rapidly. The existing optimization-based algorithms and statistical IR algorithms have either too high computational complexity or too low accuracy of the approximation to the optimal solution they generate. We introduce a new IR algorithm, which can be viewed as a generalization of the Pool-Adjacent-Violator (PAV) algorithm from completely to partially ordered data. Our algorithm combines both low computational complexity O(n^{2}) and high accuracy. This allows us to obtain sufficiently accurate solutions to IR problems with thousands of observations.
Large-Scale Nonlinear Optimization reviews and discusses recent advances in the development of methods and algorithms for nonlinear optimization and its applications, focusing on the large-dimensional case, the current forefront of much research.
The chapters of the book, authored by some of the most active and well-known researchers in nonlinear optimization, give an updated overview of the field from different and complementary standpoints, including theoretical analysis, algorithmic development, implementation issues and applications.
We examined under what circumstances the results of a large number of runs of the one-dimensional, physics-based SOIL/SOILN nitrate transport model can be combined into a reduced (or meta) model. We considered the total flow of nitrate from a given area and investigated when and how hidden linear structures can be extracted from the underlying model. The presence of such structures can justify the use of spatially aggregated inputs to compute spatially aggregated outputs. Extensive Monte-Carlo simulations showed that some linear structures emerged when the outputs for a long period of time were summed. Other linear structures appeared as relationships between two different components of the model outputs. However, different cropping systems respond differently to changes in anthropogenic or meteorological forcings. Therefore, we derived a reduced model of long-term leaching of nitrogen from the root zone in an agricultural area by combining results for each combination of soil type and cropping system. Reduced models can help make process-oriented models more transparent, and they are particularly suitable for incorporation into decision support systems.
Decision support systems (DSSs) for evaluation of different policy measures have two important functions: To assess how considered policy measures may influence the behavior of actors, and to predict the effects of a given set of actions generated from the anticipated behavior. So far, almost all attempts to construct DSSs for environmental management have focused on assessing the impact of a set of actions on the environment. Here, we describe the generic structure of a DSS that enables more complete evaluation of regional or national policies to reduce nitrogen inputs to water. In particular, we expound the principles for linking models of farm economic behavior to catchment-scale models of the transport and transformation of nitrogen in soil and water. First, we define system boundaries for nitrogen fluxes through the agricultural sector and the ambient environment to create a basis for model integration. Thereafter, we show how different modules operating on different temporal and spatial scales can be interlinked. Finally, we demonstrate how statistical emulators or meta-models can be derived to reduce the computational burden and increase the transparency of the DSS. In particular, we show when and how the temporal or spatial resolution of model inputs can be reduced without significantly influencing the estimates of annual nitrogen fluxes on a catchment scale.
Runoff from different catchment areas in southern Sweden was tested in a root bioassay based on solution cultures of cucumber seedlings. Water samples from agricultural catchment areas produced no signs at all or only weak signs of inhibited root growth, whereas several water samples from catchment areas dominated by mires or coniferous forests produced visible root injuries. The most severe root injuries (very short roots, discolouration, swelling of root tips and lack of root hairs) were caused by samples from a catchment area without local emissions and dominated by old stands of spruce. Fractionation by ultrafiltration showed that the phytotoxic effect of these samples could be attributed to organic matter with a nominal molecular weight exceeding 1000 or to substances associated with organic macromolecules. Experiments aimed at concentrating phytotoxic compounds from surface water indicated that the observed growth inhibition was caused by strongly hydrophilic substances. Previous reports on phytotoxic, organic substances of natural origin have emphasized interaction between plants growing close together. The presence of phytotoxic substances in runoff indicates that there is also a large-scale dispersion of such compounds.
Empirical data regarding the time scales of nutrient losses from soil to water and land to sea were reviewed. The appearance of strongly elevated concentrations of nitrogen and phosphorus in major European rivers was found to be primarily a post-war phenomenon. However, the relatively rapid water quality response to increased point source emissions and intensified agriculture does not imply that the reaction to decreased emissions will be equally rapid. Long-term fertilisation experiments have shown that important processes in the large-scale turnover of nitrogen operate on a time scale of decades up to at least a century, and in several major Eastern European rivers there is a remarkable lack of response to the dramatic decrease in the use of commercial fertilisers that started in the late 1980s. In Western Europe, studies of decreased phosphorus emissions have shown that riverine loads of this element can be rapidly reduced from high to moderate levels, whereas a further reduction, if achieved at all, may take decades. Together, the reviewed studies showed that the inertia of the systems that control the loss of nutrients from land to sea was underestimated when the present goal of a 50% reduction of the input of nutrients to the Baltic Sea and the North Sea was adopted. (C) 2000 Elsevier Science B.V.
Multiple time series of environmental quality data with similar, but not necessarily identical, trends call for multivariate methods for trend detection and adjustment for covariates. Here, we show how an additive model in which the multivariate trend function is specified in a nonparametric fashion (and the adjustment for covariates is based on a parametric expression) can be used to estimate how the human impact on an ecosystem varies with time and across components of the observed vector time series. More specifically, we demonstrate how a roughness penalty approach can be utilized to impose different types of smoothness on the function surface that describes trends in environmental quality as a function of time and vector component. Compared to other tools used for this purpose, such as Gaussian smoothers and thin plate splines, an advantage of our approach is that the smoothing pattern can easily be tailored to different types of relationships between the vector components. We give explicit roughness penalty expressions for data collected over several seasons or representing several classes on a linear or circular scale. In addition, we define a general separable smoothing method. A new resampling technique that preserves statistical dependencies over time and across vector components enables realistic calculations of confidence and prediction intervals.
There is a need to examine long-term changes in nitrogen leaching from arable soils. The purpose of this study was to analyse variations in specific leaching rates (kg ha^{-1} per year) and gross load (Mg per year) of N from arable land to watercourses in Sweden from a historical perspective. The start of the study was set to 1865 because information on crop distribution, yield and livestock has been compiled yearly since then. The SOIL/SOILN model was used to calculate nitrogen leaching. Calculations were done for cereals, grass and bare fallow for three different soil types in nine agricultural regions covering a range of climatic conditions. Results indicate that both specific leaching rates and gross load of nitrogen in the middle of the 19th century were approximately the same as they are today for the whole of south and central Sweden. Three main explanations for this were (1) large areas of bare fallow typical for the farming practice at the time, (2) enhanced mineralisation from newly cultivated land, and (3) low yield. From 1865, i.e. the start of the calculations, N leaching rates decreased and were at their lowest around 1930. During the same period, gross load was also at its lowest despite the fact that the acreage of arable land was at its most extensive. After 1930, average leaching increased by 60% and gross load by 30%, both reaching a peak in the mid-1970s to be followed by a declining trend. The greatest increase in leaching was in regions where the increase in animal density was largest, and these regions were also those where the natural conditions for leaching, such as mild winters and coarse-textured soils, were found. Extensive draining projects occurred during the period of investigation, in particular an intensive exploitation of lakes and wetlands. 
This caused a substantial drop in nitrogen retention, and the probable increase in net load to the sea might thus have been affected more by this decrease in retention than by the actual increase in gross load. (C) 2000 Elsevier Science B.V.
The International Council for the Exploration of the Sea (ICES) has long compiled extensive data on contaminants in biota. We investigated how trend assessment of mercury in muscle tissue from fish (flounder and Atlantic cod) might be facilitated by using nonparametric regression to normalise observed levels of this contaminant with respect to body length and weight. Specifically, we examined response surfaces and annual normalised means obtained by employing purely additive models (AM), thin plate splines (TPS), and monotonic regression (MR) to model mercury levels as functions of sampling year and one or two covariates. Our analysis showed that TPS and MR models can be more satisfactory than purely additive models, because the former techniques enable estimation of time-dependent relationships between the mercury concentration and the covariates. However, the major obstacle for trend assessment of the collected mercury data was a substantial interannual variation that was related to factors other than body length and weight. Nevertheless, several time series of flounder data that started in the 1970s and 1980s showed obvious downward trends, and these trends were particularly strong in the Elbe estuary. When the analysis was limited to data collected after 1990, an overall Mann-Kendall test for all sampling sites revealed a statistically significant downward trend for flounder, whereas it was not significant for cod.
Monotonic regression is a non-parametric method that is designed especially for applications in which the expected value of a response variable increases or decreases in one or more explanatory variables. Here, we show how the recently developed generalised pool-adjacent-violators (GPAV) algorithm can greatly facilitate the assessment of trends in time series of environmental quality data. In particular, we present new methods for simultaneous extraction of a monotonic trend and seasonal components, and for normalisation of environmental quality data that are influenced by random variation in weather conditions or other forms of natural variability. The general aim of normalisation is to clarify the human impact on the environment by suppressing irrelevant variation in the collected data. Our method is designed for applications that satisfy the following conditions: (i) the response variable under consideration is a monotonic function of one or more covariates; (ii) the anthropogenic temporal trend is either increasing or decreasing; (iii) the seasonal variation over a year can be defined by one increasing and one decreasing function. Theoretical descriptions of our methodology are accompanied by examples of trend assessments of water quality data and normalisation of the mercury concentration in cod muscle in relation to the length of the analysed fish.
The reunification of Germany led to dramatically reduced emissions of nitrogen (N) and phosphorus (P) to the environment. The aim of the present study was to examine how these exceptional decreases influenced the amounts of nutrients carried by the Elbe River to the North Sea. In particular, we attempted to extract anthropogenic signals from time series of riverine loads of nitrogen and phosphorus by developing a normalization technique that enabled removal of natural fluctuations caused by several weather-dependent variables. This analysis revealed several notable downward trends. The normalized loads of total-N and NO_{3}-N exhibited an almost linear trend, even though the nitrogen surplus in agriculture dropped dramatically in 1990 and then slowly increased. Furthermore, the decrease in total-P loads was found to be considerably smaller close to the mouth of the river than further upstream. Studying the predictive ability of different normalization models showed the following: (i) nutrient loads were influenced primarily by water discharge; (ii) models taking into account water temperature, load of suspended particulate matter, and salinity were superior for some combinations of sampling sites and nutrient species; (iii) semiparametric normalization models were almost invariably better than ordinary regression models.
A case study carried out at a municipal drinking water treatment plant in southern Sweden showed that the formation of short-chain fatty acids in slow sand filters can result in severe off-flavour problems. When an extract of the headspace of the surface layer of a sand filter was subjected to gas chromatographic analysis with sensory detection (GC sniffing), several strong, rancid odours were detected. Mass spectrometric analysis of the same extract, before and after methylation, showed that substantial amounts of butyric acid, valeric acid and isovaleric acid were present in the analysed sample. The off-flavour caused by these compounds was removed by repeated shock chlorination of the malfunctioning slow sand filter. Analysis of fatty acid esters may provide an early warning of the described off-flavour problem.
Riverine transport is the most important pathway for input of nutrients to the Gulf of Riga. The present study focused on updating existing estimates of the riverine nutrient contributions and on improving the available information on temporal and spatial variation in such input. The results show that the gulf received an average of 113,300 tons of nitrogen, 2050 tons of phosphorus and 64,900 tons of dissolved silica (DSi) annually during the time period 1977-1995. There was large interannual variation in loads; e.g., a factor of two difference was found between the two most extreme years (1984 and 1990), which was attributed mainly to natural variation in water discharge. The seasonal distribution of nutrient loads exhibited a distinct pattern for practically all studied constituents, especially nitrate. Loads were high during spring flood and relatively low during the low-flow summer period. Examination of the spatial variation of nutrient loads showed that the Daugava River alone accounted for approximately 60% of the total riverine load. The highest area-specific loads of nitrate and phosphate were observed in the agriculturally dominated Lielupe River, and the highest loads of organic nitrogen (org-N) and total phosphorus (tot-P) were found in the Parnu River. However, the values for all the studied rivers and constituents were rather low (phosphorus) or moderate (nitrogen and silica) compared to those reported for many other drainage areas of the Baltic Sea. This was true despite the inefficient sewage treatment and intensive agriculture in the studied basins in the 1970s and 1980s. (C) 1999 Elsevier Science B.V. All rights reserved.
Meteorological normalisation of time series of air quality data aims to extract anthropogenic signals by removing natural fluctuations in the collected data. We showed that the currently used procedures to select normalisation models can cause over-fitting to observed data and undesirable smoothing of anthropogenic signals. A simulation study revealed that the risk of such effects is particularly large when: (i) the observed data are serially correlated, (ii) the normalisation model is selected by leave-one-out cross-validation, and (iii) complex models, such as artificial neural networks, are fitted to data. When the size of the test sets used in the cross-validation was increased, and only moderately complex linear models were fitted to data, the over-fitting was less pronounced. An empirical study of the predictive ability of different normalisation models for tropospheric ozone in Finland confirmed the importance of using appropriate model selection strategies. Moderately complex regional models involving contemporaneous meteorological data from a network of stations were found to be superior to single-site models as well as more complex regional models involving both contemporaneous and time-lagged meteorological data from a network of stations.
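The contrast drawn above between leave-one-out cross-validation and cross-validation with larger test sets can be illustrated with a small sketch. The code below is a hypothetical simplification, not the study's actual procedure: it scores a candidate normalisation model by cross-validation with contiguous test blocks, which reduces the leakage of serial correlation between training and test data that single-observation test sets suffer from.

```python
import numpy as np

def blocked_cv_mse(X, y, fit, predict, n_blocks=5):
    """Cross-validation with contiguous test blocks.

    With serially correlated data, leave-one-out test points are highly
    correlated with their training neighbours, which flatters complex,
    over-fitted models; contiguous blocks weaken that leakage.
    Illustrative sketch only.
    """
    n = len(y)
    idx = np.arange(n)
    errors = []
    for test in np.array_split(idx, n_blocks):
        train = np.setdiff1d(idx, test)           # all indices outside the block
        model = fit(X[train], y[train])
        errors.append(np.mean((y[test] - predict(model, X[test])) ** 2))
    return float(np.mean(errors))

# A moderately complex linear model as the candidate normalisation model.
def fit_linear(X, y):
    return np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)[0]

def predict_linear(beta, X):
    return np.c_[np.ones(len(X)), X] @ beta
```

In practice the block count trades off bias and variance of the error estimate; the point here is only that the test sets are contiguous stretches rather than single left-out observations.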
Trend analyses of time series of environmental data are often carried out to assess the human impact on the environment under the influence of natural fluctuations in temperature, precipitation, and other factors that may affect the studied response variable. We examine the performance of partial Mann–Kendall (PMK) tests, i.e. trend tests in which the critical region is determined by the conditional distribution of one Mann–Kendall (MK) statistic for monotone trend, given a set of other MK statistics. In particular, we examine the impact of incorporating information regarding covariates in the Hirsch–Slack test for trends in serially correlated data collected over several seasons. Monte Carlo simulation of the performance of PMK tests demonstrates that the gain in power due to incorporation of relevant covariates can be large compared to the loss in power caused by irrelevant covariates. Furthermore, we have found that the asymptotic normality of the test statistics in such tests enables rapid and reliable determination of critical regions, unless the sample size is very small (n < 10) or the different MK statistics are very strongly correlated. A case study of water quality trends shows that PMK tests can detect and correct for rather complex relationships between river water quality and water discharge. The generic character of the PMK tests makes them particularly useful for scanning large sets of data for temporal trends.
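The building block of the PMK tests described above is the ordinary Mann–Kendall statistic and its asymptotically normal standardised form. A minimal sketch (no correction for ties or seasons, so not the full test used in the study):

```python
import math

def mann_kendall(x):
    """Mann-Kendall statistic S and its standardised form z.

    S counts concordant minus discordant pairs; under the null
    hypothesis of no trend, z is asymptotically standard normal.
    Tie and seasonal corrections are omitted for brevity.
    """
    n = len(x)
    s = sum(
        (x[j] > x[i]) - (x[j] < x[i])     # sign of each pairwise difference
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)    # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z
```

The partial (PMK) version conditions one such statistic on MK statistics for the covariates, but the conditional critical region still rests on the same asymptotic normality noted in the abstract.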
Power analysis is an integral part of statistical hypothesis testing, and, when neither exact power computations nor reasonable approximations are feasible, Monte Carlo simulations provide a viable alternative. However, generating data for such simulations is often an intricate task, especially when the hypothesis testing is based on non-normal multivariate data with complex dependencies. Here, we show how process-based deterministic models can be employed to generate data with adequate statistical dependencies for realistic power simulations. In particular, we used the Integrated Nitrogen in Catchments (INCA) model to produce bivariate time series of nitrogen concentration and water discharge data that included plausible temporal trends and realistic cross-correlations, seasonal patterns, and memory effects. The random variation in the generated data was achieved by running the INCA model with various sets of weather data that were obtained by block resampling from a given time series of observed air temperature and precipitation data. The assortment of temporal trends was created by altering the anthropogenic input of nitrogen to the catchment under consideration. Two tests for temporal trends in nitrogen concentration were compared: (i) a partial Mann–Kendall test in which water discharge was treated as a covariate; (ii) a two-stage procedure in which we first used a semi-parametric regression technique to remove the impact of natural fluctuations in water discharge, and we subsequently applied an ordinary Mann–Kendall test to the obtained residuals. Our simulations demonstrated that the two tests had comparable power, but also that they involved empirical significance levels that were much higher than the nominal levels, possibly due to substantial serial dependence in the data.
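The block-resampling step used to inject random variation into the weather driver can be sketched as follows. This is a generic illustration of the technique, not the coupling to the INCA model itself: contiguous blocks of an observed series are drawn with replacement and concatenated, so that short-range serial dependence within blocks is preserved.

```python
import numpy as np

def block_resample(series, block_len, n_out, rng):
    """Generate a synthetic series by block resampling.

    Draws contiguous blocks of length `block_len` from `series`
    with replacement and concatenates them, truncating the result
    to `n_out` values. Within-block autocorrelation is retained,
    which a plain i.i.d. bootstrap would destroy.
    """
    n_blocks = int(np.ceil(n_out / block_len))
    starts = rng.integers(0, len(series) - block_len + 1, size=n_blocks)
    blocks = [series[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n_out]
```

In a power simulation, each resampled weather series would drive one model run, yielding one replicate of the concentration–discharge data on which the trend tests are evaluated.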
The objective of meteorological normalisation of air quality measurements is to extract anthropogenic signals by removing meteorologically driven fluctuations in the collected data. We found that standard normalisation procedures involving regression of air quality on local meteorological data can be improved by incorporating information on four-day back trajectories of the sampled air masses. A case study of tropospheric ozone data revealed that the most efficient normalisation was achieved by including selected trajectory coordinates directly in multivariate regression models. Summarising the trajectories into clusters or sector values prior to normalisation led to a slight loss of information.
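The comparison above, between regressing on local meteorology alone and additionally including trajectory coordinates, can be sketched with ordinary least squares. The code and the synthetic data are purely illustrative assumptions, not the study's models or variables:

```python
import numpy as np

def fit_r2(X, y):
    """OLS fit with intercept; returns the coefficient of determination.

    A higher R^2 for the augmented predictor matrix indicates that the
    extra columns (here: trajectory coordinates) explain additional
    meteorologically driven variation. Illustrative sketch only.
    """
    A = np.c_[np.ones(len(X)), X]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(1.0 - resid.var() / y.var())
```

Usage would be, say, `fit_r2(local_met, ozone)` versus `fit_r2(np.c_[local_met, trajectory_coords], ozone)`, where the second call corresponds to including trajectory coordinates directly in the regression.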
Despite extensive efforts to ensure that sampling, installation, and maintenance of instruments are as efficient as possible when monitoring air pollution data, there is still an indisputable need for statistical post-processing (quality assessment). We examined data on tropospheric ozone and found that meteorological normalisation can reveal (i) errors that have not been eliminated by established procedures for quality assurance and control of collected data, as well as (ii) inaccuracies that may have a detrimental effect on the results of statistical tests for temporal trends. Moreover, we observed that the quality assessment of collected data could be further strengthened by combining meteorological normalisation with non-parametric smoothing techniques for seasonal adjustment and detection of sudden shifts in level. Closer examination of apparent trends in tropospheric ozone records from EMEP (European Monitoring and Evaluation Programme) sites in Finland showed that, even if potential raw data errors were taken into account, there was strong evidence of upward trends during winter and early spring.
All estimates of substance flows are more or less uncertain, which implies that the collected data can violate mass balance constraints that should be valid. In this article, we introduce multi-stage balancing algorithms that can accommodate prior information about mass balance constraints and uncertainty of the collected data. In particular, we formulate the balancing task as an optimization problem for a given set of prior information. If it is suspected that some flows have been overlooked, the balancing is achieved by minimizing the total increase in flows that is required to satisfy the given mass balance constraints. If the major problem consists of errors or uncertainty in the raw data, the sum of squares of all adjustments needed is minimized. We present a software prototype in which the balancing is integrated with a variety of tools for quality assessment of collected data, and use data from a previously published study of nitrogen flows in Sweden to illustrate the steps involved in the proposed algorithms.
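The "sum of squares of all adjustments" variant described above has a convenient closed form when the mass-balance constraints are linear: minimising the squared adjustment subject to A x = b gives a Lagrange-multiplier solution. The sketch below illustrates that variant only; it is not the published software prototype, and the overlooked-flows variant (minimising the total increase in flows) would instead require a linear program.

```python
import numpy as np

def balance_flows(x0, A, b):
    """Minimally adjust estimated flows so the mass balances hold.

    Solves: minimise ||x - x0||^2 subject to A @ x = b, where each
    row of A encodes one mass-balance constraint (e.g. inflow minus
    outflows equals zero). Closed-form Lagrange solution:
        x = x0 + A^T (A A^T)^{-1} (b - A x0)
    Assumes the constraint rows are linearly independent.
    """
    x0 = np.asarray(x0, dtype=float)
    lam = np.linalg.solve(A @ A.T, b - A @ x0)
    return x0 + A.T @ lam
```

For example, with flows (inflow, outflow 1, outflow 2) estimated as (10, 4, 5) and the single constraint inflow = outflow 1 + outflow 2, the imbalance of 1 unit is spread evenly over the three flows.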