skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: The Beckman Report on Database Research
Every few years a group of database researchers meets to discuss the state of database research, its impact on practice, and important new directions. This report summarizes the discussion and conclusions of the eighth such meeting, held October 14- 15, 2013 in Irvine, California. It observes that Big Data has now become a defining challenge of our time, and that the database research community is uniquely positioned to address it, with enormous opportunities to make transformative impact. To do so, the report recommends significantly more attention to five research areas: scalable big/fast data infrastructures; coping with diversity in the data management landscape; end-to-end processing and understanding of data; cloud services; and managing the diverse roles of people in the data life cycle.  more » « less
Award ID(s):
1730628
PAR ID:
10219486
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; more » ; ; ; ; ; ; ; ; ; ; « less
Date Published:
Journal Name:
ACM SIGMOD Record
Volume:
43
Issue:
3
ISSN:
0163-5808
Page Range / eLocation ID:
61 to 70
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Building large software systems is always a chal- lenging venture, but it is especially so in academia. This paper describes the experiences that the author and his (mostly UC- based) partners in software crime have had that culminated in the Big Data Management System now available as Apache AsterixDB. It covers a mix of the history and technical content of the nearly ten-year-old project, starting with its inception during the MapReduce craze. It describes the phases that the effort has gone through and some of the lessons learned along the way. The paper also covers some personal reflections and opinions about the challenges of systems-building, as well as writing about it, in our current academic culture. Included is the case for doing this sort of work at all – discussing the pitfalls of doing “systems” research in the absence of an actual system, and why the gain outweighs the pain of building and sharing database software in academia. As of late 2018, Apache AsterixDB is also having a commercial impact as the storage and parallel query engine underlying a new offering called Couchbase Analytics. The last part of the paper explains how we are attempting to balance the uses of AsterixDB as (i) a generally available open source Apache software platform, (ii) an end-to-end research testbed for universities, and (iii) the technology powering a commercial NoSQL product. 
    more » « less
  2. The American Society of Civil Engineers (ASCE) Report Card for America’s Infrastructure gave bridges a C+ (mediocre) grade in 2017. Approximately, 1 in 5 rural bridges are in critical condition, which presents serious challenges to public safety and economic growth. Fortunately, during a series of workshops on this topic organized by the authors, it has become clear that Big Data could provide a timely solution to these critical problems. In this work in progress paper, we describe a conceptual framework for developing SMart big data pipelines for Aging Rural bridge Transportation Infrastructure (SMARTI). Our framework and associated research questions are organized around four ingredients: • Next-Generation Health Monitoring: Sensors; Unmanned Aerial Vehicle/System (UAV/UAS); wireless networks • Data Management: Data security and quality; intellectual property; standards and shared best practices; curation • Decision Support Systems: Analysis and modeling; data analytics; decision making; visualization, • Socio-Technological Impact: Policy; societal, economic and environmental impact; disaster and crisis management. 
    more » « less
  3. Abstract The big graph database provides strong modeling capabilities and efficient querying for complex applications. Subgraph isomorphism which finds exact matches of a query graph in the database efficiently, is a challenging problem. Current subgraph isomorphism approaches mostly are based on the pruning strategy proposed by Ullmann. These techniques have two significant drawbacks- first, they are unable to efficiently handle complex queries, and second, their implementations need the large indexes that require large memory resources. In this paper, we describe a new subgraph isomorphism approach, the HyGraph algorithm, that is efficient both in querying and with memory requirements for index creation. We compare the HyGraph algorithm with two popular existing approaches, GraphQL and Cypher using complexity measures and experimentally using three big graph data sets—(1) a country-level population database, (2) a simulated bank database, and (3) a publicly available World Cup big graph database. It is shown that the HyGraph solution performs significantly better (or equally) than competing algorithms for the query operations on these big databases, making it an excellent candidate for subgraph isomorphism queries in real scenarios. 
    more » « less
  4. null (Ed.)
    Background Human movement is one of the forces that drive the spatial spread of infectious diseases. To date, reducing and tracking human movement during the COVID-19 pandemic has proven effective in limiting the spread of the virus. Existing methods for monitoring and modeling the spatial spread of infectious diseases rely on various data sources as proxies of human movement, such as airline travel data, mobile phone data, and banknote tracking. However, intrinsic limitations of these data sources prevent us from systematic monitoring and analyses of human movement on different spatial scales (from local to global). Objective Big data from social media such as geotagged tweets have been widely used in human mobility studies, yet more research is needed to validate the capabilities and limitations of using such data for studying human movement at different geographic scales (eg, from local to global) in the context of global infectious disease transmission. This study aims to develop a novel data-driven public health approach using big data from Twitter coupled with other human mobility data sources and artificial intelligence to monitor and analyze human movement at different spatial scales (from global to regional to local). Methods We will first develop a database with optimized spatiotemporal indexing to store and manage the multisource data sets collected in this project. This database will be connected to our in-house Hadoop computing cluster for efficient big data computing and analytics. We will then develop innovative data models, predictive models, and computing algorithms to effectively extract and analyze human movement patterns using geotagged big data from Twitter and other human mobility data sources, with the goal of enhancing situational awareness and risk prediction in public health emergency response and disease surveillance systems. Results This project was funded as of May 2020. We have started the data collection, processing, and analysis for the project. Conclusions Research findings can help government officials, public health managers, emergency responders, and researchers answer critical questions during the pandemic regarding the current and future infectious risk of a state, county, or community and the effectiveness of social/physical distancing practices in curtailing the spread of the virus. International Registered Report Identifier (IRRID) DERR1-10.2196/24432 
    more » « less
  5. Summary Traditional relational database systems handle data by dividing their memory into sections such as a buffer cache and working memory, assigning a memory budget to each section to efficiently manage a limited amount of overall memory. They also assign memory budgets to memory‐intensive operators such as sorts and joins and control the allocation of memory to these operators; each memory‐intensive operator attempts to maximize its memory usage to reduce disk I/O cost. Implementing such memory‐intensive operators requires a careful design and application of appropriate algorithms that properly utilize memory. Today's Big Data management systems need the ability to handle large amounts of data similarly, as it is unrealistic to assume that truly big data will fit into memory. In this article, we share our memory management experiences in Apache AsterixDB, an open‐source Big Data management software platform that scales out horizontally on shared‐nothing commodity computing clusters. We describe the implementation of AsterixDB's memory‐intensive operators and their designs related to memory management. We also discuss memory management at the global (cluster) level. We conducted an experimental study using several synthetic and real datasets to explore the impact of this work. We believe that future Big Data management system builders can benefit from these experiences. 
    more » « less