skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: CS5604 (Information Retrieval) Fall 2020 Front-end (FE) Team Project
With the demand and abundance of information increasing over the last two decades, generations of computer scientists are trying to improve the whole process of information searching, retrieval, and storage. With the diversification of the information sources, users' demand for various requirements of the data has also changed drastically both in terms of usability and performance. Due to the growth of the source material and requirements, correctly sorting, filtering, and storing has given rise to many new challenges in the field. With the help of all four other teams on this project, we are developing an information retrieval, analysis, and storage system to retrieve data from Virginia Tech's Electronic Thesis and Dissertation (ETD), Twitter, and Web Page archives. We seek to provide an appropriate data research and management tool to the users to access specific data. The system will also give certain users the authority to manage and add more data to the system. This project's deliverable will be combined with four others to produce a system usable by Virginia Tech's library system to manage, maintain, and analyze these archives. This report attempts to introduce the system components and design decisions regarding how it has been planned and implemented. Our team has developed a front end web interface that is able to search, retrieve, and manage three important content collection types: ETDs, tweets, and web pages. The interface incorporates a simple hierarchical user permission system, providing different levels of access to its users. In order to facilitate the workflow with other teams, we have containerized this system and made it available on the Virginia Tech cloud server. The system also makes use of a dynamic workflow system using a KnowledgeGraph and Apache Airflow, providing high levels of functional extensibility to the system. This allows curators and researchers to use containerised services for crawling, pre-processing, parsing, and indexing their custom corpora and collections that are available to them in the system.  more » « less
Award ID(s):
1638207 1835660
PAR ID:
10210442
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Information storage and retrieval
ISSN:
0020-0271
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Collecting, storing, and providing access to Internet of Things (IoT) data are fundamental tasks to many smart city projects. However, developing and integrating IoT systems is still a significant barrier to entry. In this work, we share insights on the development of cloud data storage and visualization tools for IoT smart city applications using flood warning as an example application. The developed system incorporates scalable, autonomous, and inexpensive features that allow users to monitor real-time environmental conditions, and to create threshold-based alert notifications. Built in Amazon Web Services (AWS), the system leverages serverless technology for sensor data backup, a relational database for data management, and a graphical user interface (GUI) for data visualizations and alerts. A RESTful API allows for easy integration with web-based development environments, such as Jupyter notebooks, for advanced data analysis. The system can ingest data from LoRaWAN sensors deployed using The Things Network (TTN). A cost analysis can support users’ planning and decision-making when deploying the system for different use cases. A proof-of-concept demonstration of the system was built with river and weather sensors deployed in a flood prone suburban watershed in the city of Charlottesville, Virginia. 
    more » « less
  2. null (Ed.)
    We present the design and development of an open-source web application called Water Data Explorer (WDE), designed to retrieve water resources observation and model data from data catalogs that follow the WaterOneFlow and WaterML Service-Oriented Architecture standards. WDE is a fully customizable web application built using the Tethys Platform development environment. As it is open source, it can be deployed on the web servers of international government agencies, non-governmental organizations, research teams, and others. Water Data Explorer provides uniform access to international data catalogs, such as the Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI) Hydrologic Information System (HIS) and the World Meteorological Organization (WMO) Hydrological Observing System (WHOS), as well as to local data catalogs that support the WaterOneFlow and WaterML standards. WDE supports data discovery, visualization, downloading, and basic data interpolation. It can be customized for different regions by modifying the user interface (i.e., localization), as well as by including pre-defined data catalogs and data sources. Access to WDE functionality is provided by a new open-source Python package called “Pywaterml” which provides programmable access to WDE methods to discover, visualize, download, and interpolate data. We present two case studies that access the CUAHSI HIS and WHOS catalogs and demonstrate regional customization, data discovery from WaterOneFlow web services, data visualization of time series observations, and data downloading. 
    more » « less
  3. A novel coding design is proposed to enhance information retrieval in a wireless network of users with partial access to the data, in the sense of observation, measurement, computation, or storage. Information exchange in the network is assisted by a multi-antenna base station (BS), with no direct access to the data. Accordingly, the missing parts of data are exchanged among users through an uplink (UL) step followed by a downlink (DL) step. In this paper, new coding strategies, inspired by coded caching (CC) techniques, are devised to enhance both UL and DL steps. In the UL step, users transmit encoded and properly combined parts of their accessible data to the BS. Then, during the DL step, the BS carries out the required processing on its received signals and forwards a proper combination of the resulting signal terms back to the users, enabling each user to retrieve the desired information. Using the devised coded data retrieval strategy, the data exchange in both UL and DL steps requires the same communication delay, measured by normalized delivery time (NDT). Furthermore, the NDT of the UL/DL step is shown to coincide with the optimal NDT of the original DL multi-input single-output CC scheme, in which the BS is connected to a centralized data library. 
    more » « less
  4. A novel coding design is proposed to enhance information retrieval in a wireless network of users with partial access to the data, in the sense of observation, measurement, computation, or storage. Information exchange in the network is assisted by a multi-antenna base station (BS), with no direct access to the data. Accordingly, the missing parts of data are exchanged among users through an uplink (UL) step followed by a downlink (DL) step. In this paper, new coding strategies, inspired by coded caching (CC) techniques, are devised to enhance both UL and DL steps. In the UL step, users transmit encoded and properly combined parts of their accessible data to the BS. Then, during the DL step, the BS carries out the required processing on its received signals and forwards a proper combination of the resulting signal terms back to the users, enabling each user to retrieve the desired information. Using the devised coded data retrieval strategy, the data exchange in both UL and DL steps requires the same communication delay, measured by normalized delivery time (NDT). Furthermore, the NDT of the UL/DL step is shown to coincide with the optimal NDT of the original DL multi-input single-output CC scheme, in which the BS is connected to a centralized data library. 
    more » « less
  5. Abstract. As cloud-based web services get more and more capable, available, and powerful (CAP), data science and engineering is pulled toward the frontline because DATA means almost anything-as-a-service (XaaS) via Digital Archiving and Transformed Analytics. In general, a web service (via a website) serves customers with web documents in HTML, JSON, XML, and multimedia via interactive (request) and responsive (reply) ways for specific domain problem solving over the Internet. In particular, a web service is deeply involved with UI & UX (user interface and user experience) plus considerate regulations on QoS (Quality of Service) as well, which refers to both information synthesis and security, namely availability and reliability for providential web services. This paper, based on the novel wiseCIO as a Platform-as-a-Service (PaaS), presents digital archiving 3 and transformed analytics (DATA) via machine learning, one of the most practical aspects of artificial intelligence. Machine learning is the science of data analysis that automates analytical model building and online analytical processing (OLAP) that enables computers to act without being explicitly programmed through CTMP. Computational thinking combined with manageable processing is 4 thoroughly discussed and utilized for FAST solutions in a feasible, analytical, scalable and testable approach. DATA is central to information synthesis and analytics (ISA), and digitized archives plays a key role in transformed analytics on intelligence for business, education and entertainment (iBEE). Case studies as applicable examples are discussed over broad fields where archival digitization is required for analytical transformation via machine learning, such as scalable ARM (archival repository for manageable accessibility), visual BUS (biological understanding from STEM), schooling DIGIA (digital intelligence governing instruction and administering), viewable HARP (historical archives & religious preachings), vivid MATH (mathematical apps in teaching and hands-on exercise), and SHARE (studies via hands-on assignment, revision and evaluation). As a result, wiseCIO promotes DATA service by providing ubiquitous web services of analytical processing via universal interface and user-centric experience in favor of logical organization of web content and relational information groupings that are vital steps in the ability of an archivist or librarian to recommend and retrieve information for a researcher. More important, wiseCIO also plays a key role as a content management system and delivery platform with capacity of hosting 10,000+ traditional web pages with great ease. 
    more » « less