Summary: Traditional relational database systems handle data by dividing their memory into sections such as a buffer cache and working memory, assigning a memory budget to each section to efficiently manage a limited amount of overall memory. They also assign memory budgets to memory‐intensive operators such as sorts and joins and control the allocation of memory to these operators; each memory‐intensive operator attempts to maximize its memory usage to reduce disk I/O cost. Implementing such memory‐intensive operators requires a careful design and application of appropriate algorithms that properly utilize memory. Today's Big Data management systems need the ability to handle large amounts of data similarly, as it is unrealistic to assume that truly big data will fit into memory. In this article, we share our memory management experiences in Apache AsterixDB, an open‐source Big Data management software platform that scales out horizontally on shared‐nothing commodity computing clusters. We describe the implementation of AsterixDB's memory‐intensive operators and their designs related to memory management. We also discuss memory management at the global (cluster) level. We conducted an experimental study using several synthetic and real datasets to explore the impact of this work. We believe that future Big Data management system builders can benefit from these experiences.
Developing collection management tools to create more robust and reliable linguistic data
Lack of adequate descriptive metadata remains a major barrier to accessing and reusing language documentation. A collection management tool could facilitate management of linguistic data from the point of creation to the archive deposit, greatly reducing the archiving backlog and ensuring more robust and reliable data.
- Award ID(s): 1648984
- PAR ID: 10025963
- Date Published:
- Journal Name: Workshop on Computational Methods for Endangered Languages
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Automated experimentation methods are unlocking a new data-rich research paradigm in materials science that promises to accelerate the pace of materials discovery. However, if our data management practices do not keep pace with progress in automation, this revolution threatens to drown us in unusable data. In this perspective, we highlight the need to update data management practices to track, organize, process, and share data collected from laboratories with deeply integrated automation equipment. We argue that a holistic approach to data management that integrates multiple scales (experiment, group and community scales) is needed. We propose a vision for what this integrated data future could look like and compare existing work against this vision to find gaps in currently available data management tools. To realize this vision, we believe that development of standard protocols for communicating with equipment and data sharing, the development of new open-source software tools for managing data in research groups, and leadership and direction from funding agencies and other organizations are needed.
- Institutions have made significant investments to support public access to research data requirements, yet have little comparative data about these services, infrastructure, and costs. To address this need, the research team undertook a mixed-methods approach to understand the institution-wide expenses for research data management and sharing and began to draft an expense model for data management and sharing. This model is further useful for institutions that provide research data management and sharing.
- The potential of smart cities in remediating environmental problems in general, and waste management in particular, is an important question that needs to be investigated in academic research. Built on an integrative review of the literature, this study offers insights into the potential of smart cities and connected communities in facilitating waste management efforts. Shortcomings of existing waste management practices are highlighted and a conceptual framework for a centralized waste management system is proposed, where three interconnected elements are discussed: (1) an infrastructure for proper collection of product lifecycle data to facilitate full visibility throughout the entire lifespan of a product, (2) a set of new business models that rely on product lifecycle data to prevent waste generation, and (3) an intelligent sensor-based infrastructure for proper upstream waste separation and on-time collection. The proposed framework highlights the value of product lifecycle data in reducing waste and enhancing waste recovery and the need for connecting waste management practices to the whole product lifecycle. An example of the use of tracking and data sharing technologies for investigating waste management issues is discussed. Finally, the success factors for implementing the proposed framework and some thoughts on future research directions are discussed.
- The COVID-19 pandemic has resulted in more than 440 million confirmed cases globally and almost 6 million reported deaths as of March 2022. Consequently, the world experienced grave repercussions to citizens’ lives, health, wellness, and the economy. In responding to such a disastrous global event, countermeasures are often implemented to slow down and limit the virus’s rapid spread. Meanwhile, disaster recovery, mitigation, and preparation measures have been taken to manage the impacts and losses of the ongoing and future pandemics. Data-driven techniques have been successfully applied to many domains and critical applications in recent years. Due to the highly interdisciplinary nature of pandemic management, researchers have proposed and developed data-driven techniques across various domains. However, a systematic and comprehensive survey of data-driven techniques for pandemic management is still missing. In this article, we review existing data analysis and visualization techniques and their applications for COVID-19 and future pandemic management with respect to four phases (namely, Response, Recovery, Mitigation, and Preparation) in disaster management. Data sources utilized in these studies and specific data acquisition and integration techniques for COVID-19 are also summarized. Furthermore, open issues and future directions for data-driven pandemic management are discussed.

