Title: From Policy Sources to Research Information Management Sources: Incorporating Multiple Bibliographic Data Sources into Dashboard Analytics
As bibliographic data sources become more diverse and unique, so do their complexities. As a result, analyzing such data can be challenging and daunting, especially in response to a single research analytics service request. In this session, the presenter will demonstrate examples of building dashboards to analyze a college's research outputs using data from a research information management (RIM) system, a policy database, and a bibliographic database. Attendees will learn about the advantages and shortcomings of using RIM systems to perform analytics; they will also learn how to export data from more traditional systems, such as policy and bibliographic databases, and import it into other tools for dashboard building and analytics.
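As a concrete illustration of the workflow described above, exports from a RIM system, a policy database, and a bibliographic database can be combined into a single dashboard-ready table with general-purpose tools. Below is a minimal sketch in Python with pandas; the file names and columns (doi, department, policy_citations, times_cited) are hypothetical placeholders, not the presenter's actual data model.

# Sketch: combine exports from a RIM system, a policy database, and a
# bibliographic database into one dashboard-ready table. File names and
# column names (doi, department, policy_citations, times_cited) are
# hypothetical placeholders for whatever the real exports contain.
import pandas as pd

rim = pd.read_csv("rim_export.csv")               # e.g., doi, department, author
policy = pd.read_csv("policy_export.csv")         # e.g., doi, policy_citations
biblio = pd.read_csv("bibliographic_export.csv")  # e.g., doi, times_cited, year

# Normalize the join key so the same DOI matches across systems.
for df in (rim, policy, biblio):
    df["doi"] = df["doi"].str.strip().str.lower()

# Left-join onto the RIM records, which define the college's output set.
merged = (
    rim.merge(policy, on="doi", how="left")
       .merge(biblio, on="doi", how="left")
       .fillna({"policy_citations": 0, "times_cited": 0})
)

# A per-department rollup that a dashboard tool can plot directly.
summary = merged.groupby("department")[["policy_citations", "times_cited"]].sum()
print(summary)

The resulting summary table could then be loaded into whatever dashboard tool is in use (for example, Tableau, Power BI, or a notebook) for visualization.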
Award ID(s):
2324388
PAR ID:
10566926
Author(s) / Creator(s):
Publisher / Repository:
University of Kentucky Libraries
Date Published:
Subject(s) / Keyword(s):
FOS: Computer and information sciences
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Background: Human movement is one of the forces that drive the spatial spread of infectious diseases. To date, reducing and tracking human movement during the COVID-19 pandemic has proven effective in limiting the spread of the virus. Existing methods for monitoring and modeling the spatial spread of infectious diseases rely on various data sources as proxies of human movement, such as airline travel data, mobile phone data, and banknote tracking. However, intrinsic limitations of these data sources prevent us from systematic monitoring and analyses of human movement on different spatial scales (from local to global).
    Objective: Big data from social media such as geotagged tweets have been widely used in human mobility studies, yet more research is needed to validate the capabilities and limitations of using such data for studying human movement at different geographic scales (eg, from local to global) in the context of global infectious disease transmission. This study aims to develop a novel data-driven public health approach using big data from Twitter coupled with other human mobility data sources and artificial intelligence to monitor and analyze human movement at different spatial scales (from global to regional to local).
    Methods: We will first develop a database with optimized spatiotemporal indexing to store and manage the multisource data sets collected in this project. This database will be connected to our in-house Hadoop computing cluster for efficient big data computing and analytics. We will then develop innovative data models, predictive models, and computing algorithms to effectively extract and analyze human movement patterns using geotagged big data from Twitter and other human mobility data sources, with the goal of enhancing situational awareness and risk prediction in public health emergency response and disease surveillance systems.
    Results: This project was funded as of May 2020. We have started the data collection, processing, and analysis for the project.
    Conclusions: Research findings can help government officials, public health managers, emergency responders, and researchers answer critical questions during the pandemic regarding the current and future infectious risk of a state, county, or community and the effectiveness of social/physical distancing practices in curtailing the spread of the virus.
    International Registered Report Identifier (IRRID): DERR1-10.2196/24432
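As a rough illustration of the spatiotemporal indexing mentioned in the Methods above, one common approach is to assign each geotagged record a sortable key made of a spatial cell and a time bucket, so that analyses at coarser scales only need a coarser cell size. The sketch below uses a plain lat/lon grid and hourly buckets; production systems typically use geohash, H3, or quadkey cells instead, and the record fields (lat, lon, ts) are invented for the example.

# Sketch: a simple spatiotemporal key for geotagged records. The record
# fields (lat, lon, ts) are hypothetical; real systems usually use geohash,
# H3, or quadkey cells rather than a plain lat/lon grid, but the idea is the
# same: one sortable key per (space cell, time bucket).
from datetime import datetime

def spatiotemporal_key(lat: float, lon: float, ts: datetime,
                       cell_deg: float = 0.1) -> str:
    """Bucket a point into a lat/lon grid cell and an hourly time slot."""
    row = int((lat + 90.0) / cell_deg)    # grid row index
    col = int((lon + 180.0) / cell_deg)   # grid column index
    hour = ts.strftime("%Y-%m-%dT%H")     # hourly time bucket
    return f"{hour}/{row:05d}-{col:05d}"

tweet = {"lat": 38.0406, "lon": -84.5037, "ts": datetime(2020, 5, 1, 14, 30)}
print(spatiotemporal_key(**tweet))                 # local scale, 0.1-degree cells
print(spatiotemporal_key(**tweet, cell_deg=1.0))   # regional scale, 1-degree cells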
  2. Recent advances in Graph Neural Networks (GNNs) have changed the landscape of modern graph analytics. The complexity of GNN training and the scalability challenges have also sparked interest from the systems community, with efforts to build systems that provide higher efficiency and schemes to reduce costs. However, we observe that many such systems basically reinvent the wheel of much work done in the database world on scalable graph analytics engines. Further, they often tightly couple the scalability treatments of graph data processing with that of GNN training, resulting in entangled complex problems and systems that often do not scale well on one of those axes. In this paper, we ask a fundamental question: How far can we push existing systems for scalable graph analytics and deep learning (DL) instead of building custom GNN systems? Are compromises inevitable on scalability and/or runtimes? We propose Lotan, the first scalable and optimized data system for full-batch GNN training with decoupled scaling that bridges the hitherto siloed worlds of graph analytics systems and DL systems. Lotan offers a series of technical innovations, including re-imagining GNN training as query plan-like dataflows, execution plan rewriting, optimized data movement between systems, a GNN-centric graph partitioning scheme, and the first known GNN model batching scheme. We prototyped Lotan on top of GraphX and PyTorch. An empirical evaluation using several real-world benchmark GNN workloads reveals a promising nuanced picture: Lotan significantly surpasses the scalability of state-of-the-art custom GNN systems, while often matching or being only slightly behind on time-to-accuracy metrics in some cases. We also show the impact of our system optimizations. Overall, our work shows that the GNN world can indeed benefit from building on top of scalable graph analytics engines. Lotan's new level of scalability can also empower new ML-oriented research on ever-larger graphs and GNNs.
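The "query plan-like dataflow" framing in the abstract above can be shown in miniature: the sparse neighbor-aggregation step of a GNN layer is just a join of the edge list with the node-feature table followed by a group-by, which is exactly what scalable graph and relational engines already do well, while the dense transform stays in the DL framework. The toy sketch below uses pandas and NumPy rather than Lotan's actual GraphX/PyTorch stack, with an invented three-node graph.

# Toy sketch: one GNN layer's sparse aggregation expressed as join + group-by
# (the part a data system can scale), followed by a dense transform (the part
# a DL framework handles). An illustration only, not the Lotan implementation.
import numpy as np
import pandas as pd

# Tiny example graph: directed edges src -> dst, and 2-d node features.
edges = pd.DataFrame({"src": [0, 1, 2, 2], "dst": [1, 2, 0, 1]})
feats = pd.DataFrame(np.arange(6, dtype=float).reshape(3, 2),
                     columns=["f0", "f1"])
feats.index.name = "node"

# "Data system" side: join neighbor features onto edges, then sum per destination.
agg = (edges.merge(feats, left_on="src", right_index=True)
            .groupby("dst")[["f0", "f1"]].sum()
            .reindex(feats.index, fill_value=0.0))

# "DL" side: dense transform plus nonlinearity on the aggregated features.
W = np.random.default_rng(0).normal(size=(2, 2))
h_next = np.maximum(agg.to_numpy() @ W, 0.0)      # ReLU(aggregated @ W)
print(h_next)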
  3.
    Structured Query Language (SQL), the standard language for relational database management systems, is an essential skill for software developers, data scientists, and professionals who need to interact with databases. SQL is highly structured and presents diverse ways for learners to acquire this skill. However, despite the significance of SQL to other related fields, little research has been done to understand how students learn SQL as they work on homework assignments. In this paper, we analyze students' SQL submissions to homework problems of the Database Systems course offered at the University of Illinois at Urbana-Champaign. For each student, we compute the Levenshtein Edit Distances between every submission and their final submission to understand how students reached their final solution and how they overcame any obstacles in their learning process. Our system visualizes the edit distances between students' submissions to a SQL problem, enabling instructors to identify interesting learning patterns and approaches. These findings will help instructors target their instruction in difficult SQL areas for the future and help students learn SQL more effectively. 
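The analysis in the SQL-learning study above centers on the Levenshtein edit distance between each submission and the student's final submission. A minimal sketch of that computation follows, using invented example submissions rather than actual course data.

# Sketch: Levenshtein edit distance between each submission and the final one,
# as in the SQL-learning analysis above. The submissions are invented examples.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

submissions = [
    "SELECT name FROM students",
    "SELECT name FROM students WHERE gpa > 3",
    "SELECT name FROM students WHERE gpa > 3.5 ORDER BY name",
]
final = submissions[-1]
# Distance of each attempt to the final answer; a shrinking sequence suggests
# steady convergence, while spikes can flag points where a student got stuck.
print([levenshtein(s, final) for s in submissions])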
  4. This Grant for Rapid Response Research (RAPID) project will collect and analyze perishable data on historical buildings. The Tumwata Village (formerly known as Blue Heron Paper Mill Site) located by the Willamette Falls in Oregon City, Oregon, has a very intriguing history and was recently purchased by the Confederated Tribes of Grand Ronde with the intent to restore the falls to their natural state and preserve some of the oldest structures. The site presents a unique opportunity to perform rapid investigations to collect and analyze perishable data on these historical buildings and develop new knowledge in the area of building assessments in corrosive environments. This industrial site contains a wide range of structure types (steel frames, concrete frames, timber frames, masonry walls and massive concrete walls) that were built over a period of 150 years and that employ many construction details that are common in older structures. The data collected and the results of the research will be applicable to many buildings in coastal communities throughout the country. Lidar data sets collected from these buildings will support the development of new methods to analyze and synthesize large data sets as well as integrate visual observations and material testing to quantify structural deterioration damages. The challenge in developing artificial intelligence (AI) technologies to find and quantify damage in structural systems using lidar data is the need to train the methods on existing data sets that show a wide range of damage states. The data to be collected from this site will provide an extensive training data set relevant to structural components common to older buildings. Development of such AI technologies for fast identification and quantification of damage would be transformative for the natural hazards research community and would expand the ability to learn from archived lidar datasets. The collected dataset will be available to researchers to serve as high quality training data in algorithm development. 
  5. Sustainable provision of food, energy and clean water requires understanding of the interdependencies among systems as well as the motivations and incentives of farmers and rural policy makers. Agriculture lies at the heart of interactions among food, energy and water systems. It is an increasingly energy intensive enterprise, but is also a growing source of energy. Agriculture places large demands on water supplies while poor practices can degrade water quality. Each of these interactions creates opportunities for modeling driven by sensor-based and qualitative data collection to improve the effectiveness of system operation and control in the short term as well as investments and planning for the long term. The large volume and complexity of the data collected creates challenges for decision support and stakeholder communication. The DataFEWSion National Research Traineeship program aims to build a community of researchers that explores, develops and implements effective data-driven decision-making to efficiently produce food, transform primary energy sources into energy carriers, and enhance water quality. The initial cohort includes PhD students in agricultural and biosystems, chemical, and industrial engineering as well as statistics and crop production and physiology. The project aims to prepare trainees for multiple career paths such as research scientist, bioeconomy entrepreneur, agribusiness leader, policy maker, agriculture analytics specialist, and professor. The traineeship has four key components. First, trainees will complete a new graduate certificate to build competencies in fundamental understanding of interactions among food production, water quality and bioenergy; data acquisition, visualization, and analytics; complex systems modeling for decision support; and the economics, policy and sociology of the FEW nexus. Second, they will conduct interdisciplinary research on (a) technologies and practices to increase agriculture’s contributions to energy supply while reducing its negative impacts on water quality and human health; (b) data science to increase crop productivity within the constraints of sustainable intensification; or (c) decision sciences to manage tradeoffs and promote best practices among diverse stakeholders. Third, they will participate in a new graduate learning community to consist of a two-year series of workshops that focus in alternate years on the context of the Midwest agricultural FEW nexus and professional development; and fourth, they will have small-group experiences to promote collaboration and peer review. Each trainee will create and curate a portfolio that combines artifacts from coursework and research with reflections on the broader impacts of their work. Trainee recruitment emphasizes women and underrepresented groups. 