Choose Your Procedure Wisely: Removal Of Outliers Is Inappropriate For Estimating Background Concentrations Of Trace Elements In Soil
Journal
Environmental Toxicology and Chemistry
Date Issued
2022-12-21
Author(s)
Alexander Neaman
Lalita V. Zakharikhina
Patricia Peñaloza
Elvira A. Dovletyarova
WoS ID
WOS:000921655400001
Abstract
Background concentrations of trace elements in soil, that is, the natural abundances of metals and metalloids in soil, have been the subject of much scientific discussion. (The International Union of Pure and Applied Chemistry no longer endorses the term “heavy metal,” so we will not use it. In addition, for the sake of brevity, from now on, the term “metal” will include metalloids, e.g., arsenic.) We have attempted to clarify the concept, particularly from the perspective of environmental toxicology and chemistry. In the context of soil contamination assessment, knowledge of background concentrations of metals allows scientists to distinguish between the natural abundances of metals in the environment and the concentrations of the same elements that occur as a result of anthropogenic impacts. Given the controversy surrounding the definition of “background concentration,” we decided to follow the International Organization for Standardization guidance 19258 (2018), which defines “background concentration” as “the concentration of an element or substance characteristic of a soil type in an area or region, arising from both natural sources and anthropogenic diffuse sources such as atmospheric deposition.” The protocol also defines “background value” as a statistical property of the background concentration, such as mean, median, range, or percentile. Elevated concentrations of potentially toxic elements in soils are not always due to anthropogenic sources, but may be related to the natural occurrence of the elements in the environment (McLaughlin & Smolders, 2001). Given that the lithology of the soil parent material determines the background metal concentrations in the soil (Novoselov et al., 2022), background concentrations established in one area are not applicable in areas that differ lithologically.
Therefore, Clarke numbers (elemental abundances in the Earth's crust) are of limited use for soil contamination assessment, and useful knowledge of background concentrations of metals in soil must be obtained separately in each watershed (Novoselov et al., 2022). It is known that geochemical data on element concentrations do not follow a normal or lognormal distribution, but rather a skewed distribution with statistical outliers (Reimann & Filzmoser, 2000). Given this fact, we disagree with the procedure in Matschullat et al. (2000), which is widely used to derive background concentrations of elements. In that procedure, statistical outliers are removed from the data set by excluding those values that fall outside the range of two or four standard deviations (hereafter termed 2σ- and 4σ-outlier removal). To demonstrate the shortcomings of that procedure, we used our database of total copper and zinc concentrations in 30 soil samples collected in a pristine area free of anthropogenic influence (Litvinenko & Zakharikhina, 2022). The Kolmogorov–Smirnov test showed that the data did not in fact follow the normal distribution, whereas the Grubbs test showed that the data did in fact contain statistical outliers (Table 1), consistent with Reimann and Filzmoser (2000). The presence of outliers in the data set can be explained by the sulfide mineralization shown on the geological maps of the area. After removing 2σ- and 4σ-outliers, the maximum values are significantly lower than in the original data set (Table 1). Following the procedure in Matschullat et al. (2000), these outlier-free data correspond to background values. Let us assume that these outlier-free data are used to determine the background concentrations of metals. Let us also assume that the latter background concentrations were applied to the original data set (with outliers) to determine the occurrence of soil contamination.
Taking these assumptions at face value would lead to the erroneous conclusion that some of the soils in the original data set were contaminated by anthropogenic influences. This conclusion would be all the more unreasonable given that the original data came from a pristine area in a national nature reserve where some of the sampling points could only be reached by helicopter. Furthermore, we would like to critically analyze the method of Galuszka (2007), one of the most widely used methods for estimating soil metal contamination in anthropogenically affected areas. Galuszka (2007) eliminates statistical outliers from the data set by iterative 2σ-outlier removal. Under this approach, outlier-free data would reflect background conditions, whereas elevated metal concentrations would signal anthropogenic contamination. However, this procedure is clearly flawed because geochemical data on element concentrations usually contain statistical outliers (Reimann & Filzmoser, 2000), as our own database from an area without anthropogenic activity shows (Table 1). Applying the Galuszka (2007) procedure to our data would therefore lead to the erroneous conclusion that some soils are contaminated with metals due to anthropogenic activity. Moreover, we would like to emphasize that removing outliers from the data has not transformed them into a normal distribution (Table 1). Meanwhile, it is important to note that the procedure for calculating statistical properties (e.g., percentiles) does not require producing a normal distribution of the data or any transformation of the data at all. Background values for metals in soil can be established simply by calculating a given statistical property using the original values of metal concentrations in soil from areas without anthropogenic activities. The difficulty is to find such areas in the upper river valleys of a given study site. 
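The contrast described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of Matschullat et al. (2000) or Galuszka (2007): the concentrations are synthetic and are not the data in Table 1, the helper names `iterative_sigma_trim` and `percentile` are hypothetical, and the 95th percentile is an arbitrary example of a "given statistical property."

```python
import random
import statistics


def iterative_sigma_trim(values, k=2.0):
    """Repeatedly drop values outside mean +/- k*SD until no value is dropped.

    Hypothetical helper illustrating iterative k-sigma outlier removal.
    """
    kept = list(values)
    while len(kept) > 2:
        mean = statistics.mean(kept)
        sd = statistics.stdev(kept)  # sample standard deviation
        inside = [v for v in kept if abs(v - mean) <= k * sd]
        if len(inside) == len(kept):
            break  # nothing removed in this pass, so stop
        kept = inside
    return kept


def percentile(values, p):
    """p-th percentile (integer p from 1 to 99) of the untransformed values."""
    return statistics.quantiles(values, n=100, method="inclusive")[p - 1]


# Synthetic, right-skewed concentrations in mg/kg (assumed for illustration):
# 28 lognormal values plus two naturally elevated values, standing in for
# samples from a mineralized but pristine area.
rng = random.Random(0)
conc = [rng.lognormvariate(3.0, 0.4) for _ in range(28)] + [150.0, 220.0]

# Approach criticized in the text: trim outliers, then treat the trimmed
# maximum as the upper limit of "background".
trimmed = iterative_sigma_trim(conc, k=2.0)
flagged = [v for v in conc if v > max(trimmed)]

# Approach recommended in the text: a percentile of the original values.
p95 = percentile(conc, 95)

print(f"samples kept after trimming: {len(trimmed)} of {len(conc)}")
print(f"samples exceeding the trimmed maximum: {len(flagged)}")
print(f"95th percentile of the original data: {p95:.1f} mg/kg")
```

In this sketch every value in `flagged` comes from the same synthetic "pristine" sample, so labeling those values as anthropogenic contamination would be an artifact of the trimming step. The percentile, by contrast, is computed from all values without any assumption of normality.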
Finally, our considerations on background concentrations of metals in soil can be extended to other media and other types of contaminants of interest to environmental toxicology and chemistry. For instance, there are currently no guidelines on how to derive background concentrations at contaminated sediment sites, leading to significant variability, uncertainty, and disagreement on the methodology for deriving representative background concentrations of chemicals of concern at these sites (Geiselbrecht et al., 2019). The argument of the present study aims to help in understanding why removal of outliers is an unsound computational procedure for estimating background concentrations of chemicals in environmental media. Removal of outliers can cause biased and inaccurate results when estimating background concentrations of contaminants. The conceptual flaw of the procedure is that it is based on the false assumption that there are no statistical outliers in the data on contaminant concentrations in the unpolluted environment.
A. Neaman and P. Peñaloza wrote the manuscript with the support of the Chilean National Fund for Scientific and Technological Development 1200048 project (awarded to A. Neaman). L.V. Zakharikhina's field and laboratory work was supported by Federal Grant 0492-2021-0016 (awarded to the Subtropical Scientific Center of the Russian Academy of Sciences). Elvira A. Dovletyarova drafted the manuscript with the support of the RUDN University Strategic Academic Leadership Program. The authors thank A. Tchourakov for proofreading the English and for his helpful comments.
Alexander Neaman: Conceptualization; Funding acquisition; Methodology; Writing – original draft and review & editing. Lalita V. Zakharikhina: Investigation; Resources; Validation. Claudia Navarro-Villarroel: Data curation; Formal analysis; Resources; Software; Validation. Patricia Peñaloza: Conceptualization; Methodology; Project administration; Validation; Writing – review & editing. Elvira A. Dovletyarova: Conceptualization; Funding acquisition; Methodology; Writing – review & editing.
This is a point of reference article. No data were generated.
OCDE Subjects
Quartile (Date Issued)
Q2
License
open access