Abstract: The year 2022 marks the ten-year anniversary of the White House's Big Data Research and Development Initiative. While this initiative, and the others it spawned, helped to advance the many facets of data-intensive research and discovery, obstacles and challenges still exist. If left unaddressed, these obstacles will persist and, at a minimum, limit what can be achieved by harnessing the many new ways to collect, analyze, and share data and the insights that can be drawn from them. The opportunities and challenges related to Big Data in agriculture touch on all aspects of the general research data lifecycle, from instruments used to gather data, to advanced digital platforms used to store, analyze, and share data, to the innovative insights gained from advanced computational methods. The eight papers included in this special issue were chosen in part because they highlight both the challenges and the opportunities that arise at all stages of the data lifecycle common across agricultural research and development. These papers grew out of several workshops made possible by the support of the Midwest Regional Big Data Hub, which is sponsored by the National Science Foundation.
Data integration and predictive modeling methods for multi-omics datasets
Translating data into knowledge and actionable insights is the Holy Grail for many scientific fields, including biology. Unprecedentedly massive and heterogeneous data have created as many challenges in storage, processing, and analysis as the opportunities and promises they hold. Here, we provide an overview of these opportunities and challenges in multi-omics predictive analytics.
- PAR ID: 10091778
- Date Published:
- Journal Name: Molecular Omics
- Volume: 14
- Issue: 1
- ISSN: 2515-4184
- Page Range / eLocation ID: 8 to 25
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Data is becoming increasingly personal. Individuals regularly interact with a wide variety of structured data, from SQLite databases on phones, to HR spreadsheets, to personal sensors, to open government data appearing in news articles. Although these workloads are important, many of the classical challenges associated with scale and Big Data do not apply. This panel brings together experts in a variety of fields to explore the new opportunities and challenges presented by "Small Data".
- Darmont, J.; Novikov, B.; Wrembel, R. (Ed.) Bitcoin [12] is a successful and interesting example of a global scale peer-to-peer cryptocurrency that integrates many techniques and protocols from cryptography, distributed systems, and databases. The main underlying data structure is blockchain, a scalable fully replicated structure that is shared among all participants and guarantees a consistent view of all user transactions by all participants in the system. In a blockchain, nodes agree on their shared states across a large network of untrusted participants. Although originally devised for cryptocurrencies, recent systems exploit its many unique features such as transparency, provenance, fault tolerance, and authenticity to support a wide range of distributed applications. Bitcoin and other cryptocurrencies use permissionless blockchains. In a permissionless blockchain, the network is public, and anyone can participate without a specific identity. Many other distributed applications, such as supply chain management and healthcare, are deployed on permissioned blockchains consisting of a set of known, identified nodes that still might not fully trust each other. This paper illustrates some of the main challenges and opportunities from a database perspective in the many novel and interesting application domains of blockchains. These opportunities are illustrated using various examples from recent research in both permissionless and permissioned blockchains. Two main themes unite the various examples: (1) the important role of distribution and consensus in managing large scale systems and (2) the need to tolerate malicious failures. The advent of cloud computing and large data centers shifted large scale data management infrastructures from centralized databases to distributed systems. One of the main challenges in designing distributed systems is the need for fault-tolerance. Cloud-based systems typically assume trusted infrastructures, since data centers are owned by the enterprises managing the data, and hence the design typically only assumes and tolerates crash failures. The advent of blockchain and the underlying premise that copies of the blockchain are distributed among untrusted entities has shifted the focus of fault-tolerance from tolerating crash failures to tolerating malicious failures. These interesting and challenging settings pose great opportunities for database researchers.
- Engineering is fundamentally about design, yet many undergraduate programs offer limited opportunities for students to learn to design. This design case reports on a grant-funded effort to revolutionize how chemical engineering is taught. Prior to this effort, our chemical engineering program was like many, offering core courses primarily taught through lectures and problem sets. While some faculty referenced examples, students had few opportunities to construct and apply what they were learning. Spearheaded by a team that included the department chair, a learning scientist, a teaching-intensive faculty member, and faculty heavily engaged with the undergraduate program, we developed and implemented design challenges in core chemical engineering courses. We began by co-designing with students and faculty, initially focusing on the first two chemical engineering courses students take. We then developed templates and strategies that supported other faculty-student teams to expand the approach into more courses. Across seven years of data collection and iterative refinements, we developed a framework that offers guidance as we continue to support new faculty in threading design challenges through core content-focused courses. We share insights from our process that supported us in navigating through challenging questions and concerns.
- In January and April 2021 we held the Workshop on Overcoming Measurement Barriers to Internet Research (WOMBIR) with the goal of understanding challenges in network and security data set collection and sharing. Most workshop attendees provided white papers describing their perspectives, and many participated in short talks and discussion in two virtual workshops over five days. That discussion produced consensus around several points. First, many aspects of the Internet are characterized by decreasing visibility of important network properties, which is in tension with the Internet's role as critical infrastructure. We discussed three specific research areas that illustrate this tension: security, Internet access, and mobile networking. We discussed visibility challenges at all layers of the networking stack, and the challenge of gathering data and validating inferences. Important data sets require longitudinal (long-term, ongoing) data collection and sharing, support for which is more challenging for Internet research than for other fields. We discussed why a combination of technical and policy methods is necessary to safeguard privacy when using or sharing measurement data. Workshop participants proposed several opportunities to accelerate progress, some of which require coordination across government, industry, and academia.

