- NSF-PAR ID:
- Date Published:
- Journal Name: Frontiers in Ecology and Evolution
- Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Background: Understanding how study design and monitoring strategies shape inference within, and synthesis across, studies is critical across biological disciplines. Many biological and field studies are short-term and limited in scope. Monitoring studies are critical for informing public health about potential vectors of concern, such as Ixodes scapularis (black-legged ticks). Black-legged ticks are a taxon of ecological and human health concern because they are primary vectors of Borrelia burgdorferi, the bacterium that causes Lyme disease. However, variation in black-legged tick monitoring, and gaps in data, are currently considered major barriers to understanding population trends and, in turn, to predicting Lyme disease risk. To understand how variable methodology in black-legged tick studies may influence which population patterns researchers find, we conducted a data synthesis experiment. Materials and Methods: We searched for publicly available black-legged tick abundance datasets with at least 9 years of data, using tick-related keywords in internet search engines, literature databases, data repositories, and public health websites. Our analysis included 289 datasets from seven surveys at locations in the US, ranging in length from 9 to 24 years. We used a moving window analysis, a non-random resampling approach, to investigate the temporal stability of black-legged tick population trajectories across the US. We then used t-tests to assess differences in stability time across study parameters. Results: All of our sampled datasets required 4 or more years to reach stability. We also found that several study factors can affect both the likelihood that a study reaches stability and the likelihood that its data lead to misleading results if it does not. Specifically, datasets collected via dragging reached stability significantly faster than data collected via opportunistic sampling. Datasets that sampled larvae reached stability significantly later than those that sampled adults or nymphs. Additionally, datasets collected at the broadest spatial scale (county) reached stability fastest. Conclusion: We used 289 datasets from seven long-term black-legged tick studies to conduct a non-random data resampling experiment, revealing that sampling design does shape inferences about black-legged tick population trajectories and how many years it takes to find stable patterns. Specifically, our results show the importance of study length, sampling technique, life stage, and geographic scope in understanding black-legged tick populations in the absence of standardized surveillance methods. Current public health efforts based on existing black-legged tick datasets must take monitoring study parameters into account to better understand if and how to use monitoring data to inform decision-making. We also advocate that potential future forecasting initiatives consider these parameters when projecting future black-legged tick population trends.
Growth of macroscale limnological research has been accompanied by an increase in secondary datasets compiled from multiple sources. We examined patterns of data availability in LAGOS-NE, a dataset derived from 87 sources, to identify biases in the availability of lake water quality data and to consider how such biases might affect perceived patterns at a subcontinental scale. Of eight common water quality parameters, variables indicative of trophic state (Secchi, chlorophyll, and total P) were most abundant in terms of total observations, lakes sampled, and long-term records, whereas carbon variables (true color and dissolved organic carbon) were scarcest. Most data were collected during summer from larger (≥ 20 ha) lakes over 1–3 yr. Approximately 80% of the data for each variable derive from ~20% of sampled lakes. Long-term (≥ 20 yr) records were rare and spatially clustered. Data availability is linked to major management challenges (eutrophication and acid rain), citizen science, and a few programs that quantify C and N variables. Resampling exercises suggested that correcting for the surface area sampling bias did not substantially change the statistical distributions of the eight variables. Further, estimates of a lake's long-term median Secchi, chlorophyll, and total P based on average record lengths had high uncertainty, but modest increases in record length to > 5 yr yielded estimates with manageable error. Although the specific nature of sampling biases may vary among regions, we expect that they are widespread. Thus, large integrated datasets can and should be used to identify tendencies in how lakes are studied and to address these biases as part of broad-scale limnological investigations.
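The record-length resampling exercise can be illustrated with a short sketch. It assumes one lake's annual values (for example, yearly summer Secchi depth) held in a list, draws random contiguous runs of k years, and reports the mean absolute relative error of each run's median against the full-record median. The function name, the contiguous-run design, the error metric, and the example record are all assumptions for illustration; the study's own resampling procedure may differ.

```python
# Hypothetical sketch: how well does a short record estimate a lake's long-term median?
import random
from statistics import median


def median_error_by_record_length(record, lengths, n_draws=500, seed=1):
    """Mean absolute relative error of k-year medians versus the full-record median.

    record  -- annual values for one lake, in time order (assumed nonzero median)
    lengths -- record lengths k (in years) to test, each <= len(record)
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    target = median(record)
    errors = {}
    for k in lengths:
        total = 0.0
        for _ in range(n_draws):
            start = rng.randrange(len(record) - k + 1)  # random contiguous k-year run
            estimate = median(record[start:start + k])
            total += abs(estimate - target) / abs(target)
        errors[k] = total / n_draws
    return errors


if __name__ == "__main__":
    # Made-up 20-year Secchi record (m); not data from LAGOS-NE.
    secchi = [2.1, 3.4, 2.8, 1.9, 3.0, 2.5, 2.2, 3.6, 2.9, 2.4,
              3.1, 2.7, 2.0, 3.3, 2.6, 2.3, 3.2, 2.8, 2.5, 3.0]
    for k, err in median_error_by_record_length(secchi, [1, 3, 5, 10, 20]).items():
        print(f"{k:>2} yr: mean relative error {err:.3f}")
```

Running it on a real long-term record would show how quickly the error shrinks as k grows; the full-length run always has zero error by construction.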
Land degradation is a leading cause of biodiversity loss, and understanding its consequences for freshwater ecosystems remains a priority for improving the effectiveness of restoration practices and ecosystem assessments. Freshwater monitoring programs use macroinvertebrates to assess the biotic effects of degradation and management actions, often using the ratio of observed to expected taxa at a site (O/E) for this purpose. Despite the power of the O/E approach, it requires large amounts of data to generate an expectation, and it can be difficult to define a threshold value for degraded sites. An alternative assessment tool is phylogenetic diversity, which is widely used in academic biology but rarely applied in management, despite empirical correlations between phylogenetic diversity and management targets such as ecosystem structure and function. Here, we use macroinvertebrate data collected since 1998 from 1400 watersheds to evaluate the potential for phylogenetic metrics to inform evaluations of management practices. These watersheds were chosen because their low disturbance levels and high habitat heterogeneity have made them problematic to assess with O/E. Phylogenetic diversity detected degradation of assemblages and was sensitive enough to distinguish impacts in ways that can inform management actions. This is particularly notable given that the phylogenetic metrics, unlike O/E, did not require additional “baseline” data. Site disturbance and broader environmental drivers strongly predicted site phylogenetic structure, providing management objectives for increasing site quality. We call on others to consider using phylogenetic diversity to complement existing O/E schemes, particularly in systems where O/E is insufficient to prioritize management objectives.
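The contrast between O/E and phylogenetic diversity can be made concrete with two small functions. The O/E sketch assumes a RIVPACS-style model has already produced site-specific capture probabilities for each taxon and, following a common convention, counts only taxa with probability of at least 0.5; building that model from reference sites is the kind of "baseline" data the abstract refers to. The phylogenetic sketch computes Faith's phylogenetic diversity (the total branch length connecting a site's taxa to the root of a rooted tree), which needs only the observed taxa and a tree. The abstract does not say which phylogenetic metrics were used, so Faith's PD, the toy tree, and all names here are illustrative assumptions.

```python
# Illustrative sketch only: O/E versus Faith's phylogenetic diversity (PD).


def observed_over_expected(capture_probs, observed_taxa, p_min=0.5):
    """O/E for one site.

    capture_probs -- {taxon: modeled probability of capture at this site}; producing
                     these requires a model built from reference ("baseline") sites
    observed_taxa -- set of taxa actually collected at the site
    p_min         -- only taxa with probability >= p_min enter the index (assumed convention)
    """
    modeled = {t: p for t, p in capture_probs.items() if p >= p_min}
    expected = sum(modeled.values())
    observed = sum(1 for t in modeled if t in observed_taxa)
    return observed / expected


def faith_pd(tree, taxa):
    """Faith's PD: summed length of all branches linking `taxa` to the root.

    tree -- {node: (parent, branch_length)}; the root maps to (None, 0.0)
    """
    branches = set()  # each branch is identified by its child node
    for taxon in taxa:
        node = taxon
        while tree[node][0] is not None:
            branches.add(node)
            node = tree[node][0]
    return sum(tree[node][1] for node in branches)


if __name__ == "__main__":
    # Toy rooted tree: root -> clade A (t1, t2) and root -> t3. Branch lengths are made up.
    tree = {
        "root": (None, 0.0),
        "A": ("root", 1.0),
        "t1": ("A", 2.0),
        "t2": ("A", 3.0),
        "t3": ("root", 4.0),
    }
    print(faith_pd(tree, {"t1", "t2"}))  # 6.0: the shared branch to A is counted once
    print(observed_over_expected({"t1": 0.9, "t2": 0.6, "t3": 0.3}, {"t1", "t3"}))
```

Note that `faith_pd` never consults reference sites, whereas `observed_over_expected` is meaningless without modeled capture probabilities.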
Bipolar Disorder, a mood disorder with recurrent mania and depression, requires ongoing monitoring and specialty management. Current monitoring strategies are clinically based, relying on highly specialized medical professionals who are becoming increasingly scarce. Automatic speech-based monitoring via smartphones has the potential to augment clinical monitoring by providing inexpensive and unobtrusive measurements of a patient’s daily life. The success of such an approach depends on the ability to make effective use of “in-the-wild” data. However, most existing work on automatic mood detection uses datasets collected in clinical or laboratory settings. This study presents experiments in automatically detecting depression severity in individuals with Bipolar Disorder using data derived from clinical interviews and from personal conversations. We find that mood assessment is more accurate using data collected from clinical interactions, in part because of their highly structured nature. We demonstrate that although the features that are most effective in clinical interactions do not generalize well to personal conversational data, we can identify alternative features of personal conversational speech that are relevant for detecting mood symptom severity. Our results highlight the challenges unique to working with “in-the-wild” data, providing insight into the degree to which the predictive ability of speech features is preserved outside of a clinical interview.
Few studies have systematically investigated the effects of subsetting strategies on soil modelling or explored the potential of emergent methods from other fields not previously applied to pedometrics. This study considers smallholder agricultural villages in southern India that have been understudied in terms of chemometric modelling intended to support soil health, fertility and management. Therefore, the objective was to investigate the application of visible near-infrared spectroscopy and chemometrics to predict soil properties in this setting. In addition, this study evaluated the effects of methods of calibration subsetting and new parametric models on the prediction of soil properties. These novel methods were transferred from the genomics field to soil science. Three strategic subsetting methods were used to produce calibration subsets that consider the variation in the soil properties, the spectra and both together; this is in addition to standard random calibration subsetting. Partial least squares regression (PLSR) and two methods from genomics that impose variable reduction were used for modelling; the latter were sparse PLSR (SPLSR) and the heteroscedastic effects model (HEM). Soil samples were collected from two villages and analysed for texture, soil carbon and available macro- and micro-nutrients. The results showed that soil texture and carbon could be predicted moderately to strongly, whereas plant nutrient properties were predicted poorly to moderately. Random subsetting and subsetting by property distribution were more appropriate when spectra varied less overall, whereas subsetting that incorporates variation in spectra and properties improved results when spectral variation increased. The SPLSR and HEM models improved results over PLSR in some cases, or at least maintained prediction strength while using fewer predictors. Subsetting methods improved prediction results in 75% of cases. This study filled an important research gap by systematically studying local subsetting behaviour under different degrees of spectral and attribute variation.
- Explored new calibration subsetting methods and chemometric models in soil spectral modelling.
- Compared the methods and models for 17 soil properties in an understudied area of India.
- Random subsetting was not always optimal; subsetting matters and depends on data characteristics.
- Sparse models from genomics performed better than a standard method in 75% of cases.
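The calibration-subsetting strategies compared above can be sketched as follows. The abstract does not name the algorithms, so these are stand-ins chosen for illustration: random subsetting; a property-distribution subset that sorts samples by the measured soil property, splits them into equal-count strata, and takes the middle sample of each; and Kennard-Stone selection, a widely used spectrum-based method that repeatedly adds the sample farthest from those already chosen. Function names and inputs (plain lists of spectra and property values) are assumptions; a real workflow would typically use NumPy arrays and a chemometrics library.

```python
# Illustrative calibration-subsetting strategies (assumed stand-ins, not the study's code).
import math
import random


def random_subset(n_samples, n_cal, seed=0):
    """Standard random calibration subset: n_cal distinct sample indices."""
    return random.Random(seed).sample(range(n_samples), n_cal)


def property_stratified_subset(values, n_cal):
    """Cover the soil-property distribution: sort by value, split into n_cal
    equal-count strata, and take the middle sample of each stratum."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    picks = []
    for s in range(n_cal):
        lo = s * len(order) // n_cal
        hi = (s + 1) * len(order) // n_cal
        picks.append(order[(lo + hi) // 2])
    return picks


def kennard_stone(spectra, n_cal):
    """Cover spectral space: start from the two most distant spectra, then repeatedly
    add the sample whose nearest selected neighbour is farthest away (n_cal >= 2)."""
    n = len(spectra)
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: math.dist(spectra[pair[0]], spectra[pair[1]]),
    )
    selected = [first, second]
    remaining = set(range(n)) - set(selected)
    while len(selected) < n_cal and remaining:
        nxt = max(
            remaining,
            key=lambda r: min(math.dist(spectra[r], spectra[s]) for s in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


if __name__ == "__main__":
    clay = [5.0, 1.0, 3.0, 2.0, 4.0, 6.0]  # made-up property values
    print(property_stratified_subset(clay, 3))  # [3, 4, 5]
    print(kennard_stone([[0.0], [1.0], [10.0], [5.0]], 3))  # [0, 2, 3]
    print(random_subset(6, 3))
```

A subset that considers both properties and spectra, as the abstract describes, could combine the two ideas, for example by running Kennard-Stone within property strata; that combination is likewise only a hypothetical design.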