Environmental context: Appropriate and robust models for developing water quality guidelines are critical for sound environmental management. Methods used to validate these models have been demonstrated to be appropriate for only a small portion of the data types the models use. This study found that models using certain data types would be more appropriately validated using alternative evaluation criteria. It serves as an important reference for developing and evaluating robust models.
Rationale: The performance of bioavailability-based toxicity models for metals is often assessed by whether they can predict toxicity data within a factor of 2 of the paired observed toxicity data. This method has only been verified for median effect values (EC50) for acute fish and daphnia data; however, toxicity models have been developed for a much broader range of effect levels (i.e. EC10/EC20) and species (e.g. microalgae). This study tested whether the factor-of-2 rule is appropriate for a wider range of organisms and effect concentrations than previously studied.
Methodology: Toxicity estimates from repeated tests conducted under the same conditions were collated to assess variation in results. This variation was compared with a range of 4 (a factor of 2 above and below the mean) and a range of 9 (a factor of 3 above and below the mean) to assess whether a factor-of-3 rule may be more appropriate for some species and effect levels.
Results and discussion: Overall, the factor-of-2 rule is broadly applicable to EC50 data for metal toxicity across a range of species. The EC10 datasets showed larger variability at low effect levels and supported the use of a factor-of-3 rule, while either the factor-of-2 or the factor-of-3 rule could be applied to microalgae. The level of performance evaluation chosen may depend on the application of the bioavailability model.
This study also found that while repeated toxicity test data is routinely generated, it is rarely published. Publication of such data would enable expansion of the present study to include inter-laboratory comparisons, an important consideration as most bioavailability models are based on data pooled from multiple sources.
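The comparison described in the Methodology can be illustrated with a minimal sketch. It assumes the "mean" is the arithmetic mean of the repeated estimates, which the abstract does not specify; the function names and the EC10 values are hypothetical and are not data from the study.

```python
from statistics import mean


def fraction_within_factor(values, factor):
    """Fraction of repeated estimates lying within `factor` above and
    below the mean (arithmetic mean is an assumption of this sketch)."""
    centre = mean(values)
    lower, upper = centre / factor, centre * factor
    inside = [v for v in values if lower <= v <= upper]
    return len(inside) / len(values)


def max_min_ratio(values):
    """Overall spread of the repeated estimates, expressed as max/min.
    A factor of 2 above and below the mean spans a ratio of 4;
    a factor of 3 above and below the mean spans a ratio of 9."""
    return max(values) / min(values)


# Hypothetical repeated EC10 estimates (ug/L) from one test condition.
# Illustrative numbers only, not taken from the study.
ec10 = [4.0, 6.0, 10.0, 20.0, 10.0]

print(fraction_within_factor(ec10, 2))  # share of tests inside the factor-of-2 band
print(fraction_within_factor(ec10, 3))  # share of tests inside the factor-of-3 band
print(max_min_ratio(ec10))              # compare against the ranges of 4 and 9
```

In this made-up dataset the spread exceeds a range of 4 but stays inside a range of 9, which is the kind of pattern that would favour a factor-of-3 criterion for that dataset.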
Funding
Gwilym Price was supported by an Australian Government Research Training Program Scholarship.