In recent years, the availability of airborne imaging spectroscopy (hyperspectral) data has expanded dramatically. The high spatial and spectral resolution of these data uniquely enables spatially explicit ecological studies including species mapping, assessment of drought mortality and foliar trait distributions. However, we have barely begun to unlock the potential of these data to use direct mapping of vegetation characteristics to infer subsurface properties of the critical zone. To assess their utility for Earth systems research, imaging spectroscopy data acquisitions require integration with large, coincident ground‐based datasets collected by experts in ecology and environmental and Earth science. Without coordinated, well‐planned field campaigns, potential knowledge leveraged from advanced airborne data collections could be lost. Despite the growing importance of this field, documented methods to couple such a wide variety of disciplines remain sparse. We coordinated the first National Ecological Observatory Network Airborne Observation Platform (AOP) survey performed outside of their core sites, which took place in the Upper East River watershed, Colorado. Extensive planning for sample tracking and organization allowed field and flight teams to update the ground‐based sampling strategy daily. This enabled collection of an extensive set of physical samples to support a wide range of ecological, microbiological, biogeochemical and hydrological … We present a framework for integrating airborne and field campaigns to obtain high‐quality data for foliar trait prediction and document an archive of coincident physical samples collected to support a systems approach to ecological research in the critical zone. This detailed methodological account provides an example of how a multi‐disciplinary and multi‐institutional team can coordinate to maximize knowledge gained from an airborne survey, an approach that could be extended to other studies.
The coordination of imaging spectroscopy surveys with appropriately timed and extensive field surveys, along with high‐quality processing of these data, presents a unique opportunity to reveal new insights into the structure and dynamics of the critical zone. To our knowledge, this level of co‐aligned sampling has never been undertaken in tandem with AOP surveys and subsequent studies utilizing this archive will shed considerable light on the breadth of applications for which imaging spectroscopy data can be leveraged.
The science of science has attracted growing research interest, partly due to the increasing availability of large-scale datasets capturing the inner workings of science. These datasets, and the numerous linkages among them, enable researchers to ask a range of fascinating questions about how science works and where innovation occurs. Yet as datasets grow, it becomes increasingly difficult to track available sources and linkages across datasets. Here we present SciSciNet, a large-scale open data lake for science of science research, covering over 134M scientific publications and millions of external linkages to funding and public uses. We offer detailed documentation of pre-processing steps and analytical choices in constructing the data lake. We further supplement the data lake by computing frequently used measures in the literature, illustrating how researchers may contribute collectively to enriching the data lake. Overall, this data lake serves as an initial but useful resource for the field, by lowering the barrier to entry, reducing duplication of efforts in data processing and measurements, improving the robustness and replicability of empirical claims, and broadening the diversity and representation of ideas in the field.
- Publication Date:
- NSF-PAR ID: 10418049
- Journal Name: Scientific Data
- Volume: 10
- Issue: 1
- ISSN: 2052-4463
- Publisher: Nature Publishing Group
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract For wildlife inhabiting snowy environments, snow properties such as onset date, depth, strength, and distribution can influence many aspects of ecology, including movement, community dynamics, energy expenditure, and forage accessibility. As a result, snow plays a considerable role in individual fitness and ultimately population dynamics, and its evaluation is, therefore, important for comprehensive understanding of ecosystem processes in regions experiencing snow. Such understanding, and particularly study of how wildlife–snow relationships may be changing, grows more urgent as winter processes become less predictable and often more extreme under global climate change. However, studying and monitoring wildlife–snow relationships continue to be challenging because characterizing snow, an inherently complex and constantly changing environmental feature, and identifying, accessing, and applying relevant snow information at appropriate spatial and temporal scales, often require a detailed understanding of physical snow science and technologies that typically lie outside the expertise of wildlife researchers and managers. We argue that thoroughly assessing the role of snow in wildlife ecology requires substantive collaboration between researchers with expertise in each of these two fields, leveraging the discipline‐specific knowledge brought by both wildlife and snow professionals. To facilitate this collaboration and encourage more effective exploration of wildlife–snow questions, we provide a five‐step …
-
Summary The explosion of IoT devices and sensors in recent years has led to a demand for efficiently storing, processing and analyzing time‐series data. Geoscience researchers use time‐series data stores such as Hydroserver, Virtual Observatory and Ecological Informatics System (VOEIS), and Cloud‐Hosted Real‐time Data Service (CHORDS). Many of these tools require a great deal of infrastructure to deploy and expertise to manage and scale. The Tapis framework, an NSF‐funded project, provides science‐as‐a‐service APIs that allow researchers to achieve faster scientific results by eliminating the need to set up a complex infrastructure stack. The University of Hawai'i (UH) and Texas Advanced Computing Center (TACC) have collaborated to develop an open source Tapis Streams API that builds on the concepts of the CHORDS time‐series data service to support research. This new hosted service allows storing, processing, annotating, archiving, and querying time‐series data in the Tapis multi‐user and multi‐tenant collaborative platform. The Streams API provides a hosted, production‐level middleware service that enables new data‐driven event workflow capabilities that may be leveraged by researchers and Tapis‐powered science gateways for handling spatially indexed time‐series datasets.
-
Summary We are in the midst of a scientific data explosion in which the rate of data growth is rapidly increasing. While large‐scale research projects have developed sophisticated data distribution networks to share their data with researchers globally, there is no such support for the many millions of research projects generating data of interest to much smaller audiences (as exemplified by the long tail scientist). In data‐oriented research, every aspect of the research process is influenced by data access. However, sharing and accessing data efficiently as well as lowering access barriers are difficult. In the absence of dedicated large‐scale storage, many have noted that there is an enormous storage capacity available via connected peers, none more so than the storage resources of many research groups. With widespread usage of the content delivery network model for disseminating web content, we believe a similar model can be applied to distributing, sharing, and accessing long tail research data in an e‐Science context. We describe the vision and architecture of a social content delivery network – a model that leverages the social networks of researchers to automatically share and replicate data on peers' resources based upon shared interests and trust. Using this model, we …
-
Abstract Machine learning (ML) provides a powerful framework for the analysis of high‐dimensional datasets by modelling complex relationships, often encountered in modern data with many variables, cases and potentially non‐linear effects. The impact of ML methods on research and practical applications in the educational sciences is still limited, but is growing steadily as larger and more complex datasets become available through massive open online courses (MOOCs) and large‐scale investigations. The educational sciences are at a crucial pivot point, because of the anticipated impact ML methods hold for the field. To provide educational researchers with an elaborate introduction to the topic, we provide an instructional summary of the opportunities and challenges of ML for the educational sciences, show how a look at related disciplines can help in learning from their experiences, and argue for a philosophical shift in model evaluation. We demonstrate how the overall quality of data analysis in educational research can benefit from these methods and show how ML can play a decisive role in the validation of empirical models. Specifically, we (1) provide an overview of the types of data suitable for ML and (2) give practical advice for the application of ML methods. In each section, we provide analytical …
Context and implications Rationale for this study In 2020, the worldwide SARS‐COV‐2 pandemic forced the educational sciences to perform a rapid paradigm shift with classrooms going online around the world—a hardly novel but now strongly catalysed development. In the context of data‐driven education, this paper demonstrates that the widespread adoption of machine learning techniques is central for the educational sciences and shows how these methods will become crucial tools in the collection and analysis of data and in concrete educational applications. Helping to leverage the opportunities and to avoid the common pitfalls of machine learning, this paper provides educators with the theoretical, conceptual and practical essentials.
Why the new findings matter The process of teaching and learning is complex, multifaceted and dynamic. This paper contributes a seminal resource to highlight the digitisation of the educational sciences by demonstrating how new machine learning methods can be effectively and reliably used in research, education and practical application.
Implications for educational researchers and policy makers The progressing digitisation of societies around the globe and the impact of the SARS‐COV‐2 pandemic have highlighted the vulnerabilities and shortcomings of educational systems. These developments have shown the necessity to provide effective educational processes that can support sometimes overwhelmed teachers to digitally impart knowledge on the plan of many governments and policy makers. Educational scientists, corporate partners and stakeholders can make use of machine learning techniques to develop advanced, scalable educational processes that account for individual needs of learners and that can complement and support existing learning infrastructure. The proper use of machine learning methods can contribute essential applications to the educational sciences, such as (semi‐)automated assessments, algorithmic‐grading, personalised feedback and adaptive learning approaches. However, these promises are strongly tied to an at least basic understanding of the concepts of machine learning and a degree of data literacy, which has to become the standard in education and the educational sciences.
Demonstrating both the promises and the challenges that are inherent to the collection and the analysis of large educational data with machine learning, this paper covers the essential topics that their application requires and provides easy‐to‐follow resources and code to facilitate the process of adoption.