skip to main content


Search for: All records

Award ID contains: 1757787

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Two general approaches are common for evaluating automatically generated labels in topic modeling: direct human assessment; or performance metrics that can be calculated without, but still correlate with, human assessment. However, both approaches implicitly assume that the quality of a topic label is single-dimensional. In contrast, this paper provides evidence that human assessments about the quality of topic labels consist of multiple latent dimensions. This evidence comes from human assessments of four simple labeling techniques. For each label, study participants responded to several items asking them to assess each label according to a variety of different criteria. Exploratory factor analysis shows that these human assessments of labeling quality have a two-factor latent structure. Subsequent analysis demonstrates that this multi-item, two-factor assessment can reveal nuances that would be missed using either a single-item human assessment of perceived label quality or established performance metrics. The paper concludes by sug- gesting future directions for the development of human-centered approaches to evaluating NLP and ML systems more broadly. 
    more » « less
  2. Spamming reviews are prevalent in review systems to manipulate seller reputation and mislead customers. Spam detectors based on graph neural networks (GNN) exploit representation learning and graph patterns to achieve state-of-the-art detection accuracy. The detection can influence a large number of real-world entities and it is ethical to treat different groups of entities as equally as possible. However, due to skewed distributions of the graphs, GNN can fail to meet diverse fairness criteria designed for different parties. We formulate linear systems of the input features and the adjacency matrix of the review graphs for the certification of multiple fairness criteria. When the criteria are competing, we relax the certification and design a multi-objective optimization (MOO) algorithm to explore multiple efficient trade-offs, so that no objective can be improved without harming another objective. We prove that the algorithm converges to a Pareto efficient solution using duality and the implicit function theorem. Since there can be exponentially many trade-offs of the criteria, we propose a data-driven stochastic search algorithm to approximate Pareto fronts consisting of multiple efficient trade-offs. Experimentally, we show that the algorithms converge to solutions that dominate baselines based on fairness regularization and adversarial training. 
    more » « less
  3. null (Ed.)
    This paper introduces MEDS, a modular and elastic framework that simplifies the development of high-performance concurrent data structures that support linearizable primitive (i.e., add, remove, contains) and bulk (e.g., range query) operations. 
    more » « less
  4. null (Ed.)
    Large collections of datasets are being published on the Web at an increasing rate. This poses a problem to researchers and data journalists who must sift through these large quantities of data to find datasets that meet their needs. Our solution to this problem is cell-centric indexing, a novel approach which considers the individual cell of a dataset to be the fundamental unit of search, indexing the corresponding metadata to each individual cell. This facilitates a new style of user interface that allows users to explore the collection via histograms that show the distributions of various terms organized by how they are used in the dataset. 
    more » « less
  5. null (Ed.)
    This paper studies the current status and future directions of RTOS (Real-Time Operating Systems) for time-sensitive CPS (Cyber-Physical Systems). GPOS (General Purpose Operating Systems) existed before RTOS but did not meet performance requirements for time sensitive CPS. Many GPOS have put forward adaptations to meet the requirements of real-time performance, and this paper compares RTOS and GPOS and shows their pros and cons for CPS applications. Furthermore, comparisons among select RTOS such as VxWorks, RTLinux, and FreeRTOS have been conducted in terms of scheduling, kernel, and priority inversion. Various tools for WCET (Worst-Case Execution Time) estimation are discussed. This paper also presents a CPS use case of RTOS, i.e. JetOS for avionics, and future advancements in RTOS such as multi-core RTOS, new RTOS architecture and RTOS security for CPS. 
    more » « less
  6. There are many applications where positive instances are rare but important to identify. For example, in NLP, positive sentences for a given relation are rare in a large corpus. Positive data are more informative for learning in these applications, but before one labels a certain amount of data, it is unknown where to find the rare positives. Since random sampling can lead to significant waste in labeling effort, previous ”active search” methods use a single bandit model to learn about the data distribution (exploration) while sampling from the regions potentially containing more positives (exploitation). Many bandit models are possible and a sub-optimal model reduces labeling efficiency, but the optimal model is unknown before any data are labeled. We propose Meta-AS (Meta Active Search) that uses a meta-bandit to evaluate a set of base bandits and aims to label positive examples efficiently, comparing to the optimal base bandit with hindsight. The meta-bandit estimates the mean and variance of the performance of the base bandits, and selects a base bandit to propose what data to label next for exploration or exploitation. The feedback in the labels updates both the base bandits and the meta-bandit for the next round. Meta-AS can accommodate a diverse set of base bandits to explore assumptions about the dataset, without over-committing to a single model before labeling starts. Experiments on five datasets for relation extraction demonstrate that Meta-AS labels positives more efficiently than the base bandits and other bandit selection strategies. 
    more » « less
  7. Osborn, Joseph C. (Ed.)
    Integrated meaningful play is the idea that player’s choices should have a long-term effect on the game. In this paper we present I-score (for integrated), a scoring function for scoring integrative game play as a function of the game’s storylines. The I-scores are in the range [0,1]. In games with I-scores close to one, player’s early choices determine the game’s ending; choices made later in the game do not change the final ending of the game. In contrast, games with I-scores close to zero, players’s choices can change the ending until the very end. Games with scores closer to 0.5 provide a more balanced player choice whereby the game’s ending still can be changed despite early decisions, but not so much that the ending could be changed at any point. 
    more » « less
  8. In recent years, robotic technologies, e.g. drones or autonomous cars have been applied to the agricultural sectors to improve the efficiency of typical agricultural operations. Some agricultural tasks that are ideal for robotic automation are yield estimation and robotic harvesting. For these applications, an accurate and reliable image-based detection system is critically important. In this work, we present a low-cost strawberry detection system based on convolutional neural networks. Ablation studies are presented to validate the choice of hyper- parameters, framework, and network structure. Additional modifications to both the training data and network structure that improve precision and execution speed, e.g., input compression, image tiling, color masking, and network compression, are discussed. Finally, we present a final network implementation on a Raspberry Pi 3B that demonstrates a detection speed of 1.63 frames per second and an average precision of 0.842. 
    more » « less
  9. Reliable methods for tumor detection and brain abnormalities are crucial to help find diseases at early stages. Having accurate software that uses machine learning to identify abnormalities of the brain may prevent a disease progression if used on an MRI of the patient. In this paper, we develop a neural network to detect and highlight brain tumors present in MRI’s. Our model is designed to be more compact than typical CNNs to minimize prediction times while still maintaining prediction accuracy. The model uses a dice coefficient for the loss function as well as accuracy metric. We adopt Adadelta as the optimizer as it is more robust and eliminates the requirement of manually tuned learning rates. Our model reduces the prediction time with fewer layers and convolution filters, while allowing rapid convergence to a stable solution. In addition, the models hyper-parameters are being fine-tuned in an iterative process to ideally achieves better segmentation accuracy. Experimental results show that our model improves the performance compared with the state-of-the-art methods. 
    more » « less