skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A Comparison of Axiomatic Distance-Based Collective Intelligence Methods for Wireless Sensor Network State Estimation in the Presence of Information Injection
Wireless sensor networks are a cost-effective means of data collection, especially in areas which may not have significant infrastructure. There are significant challenges associated with the reliability of measurements, in particular due to their distributed nature. As such, it is important to develop methods that can extract reliable state estimation results in the presence of errors. This work proposes and compares methods based on collective intelligence ideas, namely consensus ranking and rating models, which are founded on axiomatic distances and intuitive social choice properties. The efficacy of these methods to assess a transmitted signal's strength with varying quantity and quality of incompleteness in the network's readings is tested.  more » « less
Award ID(s):
1850355
PAR ID:
10250540
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2020 IEEE 6th World Forum on Internet of Things (WF-IoT)
Page Range / eLocation ID:
1 to 6
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Despite significant advances in reconstructing genome-scale metabolic networks, the understanding of cellular metabolism remains incomplete for many organisms. A promising approach for elucidating cellular metabolism is analysing the full scope of enzyme promiscuity, which exploits the capacity of enzymes to bind to non-annotated substrates and generate novel reactions. To guide time-consuming costly experimentation, different computational methods have been proposed for exploring enzyme promiscuity. One relevant algorithm is PROXIMAL, which strongly relies on KEGG to define generic reaction rules and link specific molecular substructures with associated chemical transformations. Here, we present a completely new pipeline, PROXIMAL2, which overcomes the dependency on KEGG data. In addition, PROXIMAL2 introduces two relevant improvements with respect to the former version: i) correct treatment of multi-step reactions and ii) tracking of electric charges in the transformations. We compare PROXIMAL and PROXIMAL2 in recovering annotated products from substrates in KEGG reactions, finding a highly significant improvement in the level of accuracy. We then applied PROXIMAL2 to predict degradation reactions of phenolic compounds in the human gut microbiota. The results were compared to RetroPath RL, a different and relevant enzyme promiscuity method. We found a significant overlap between these two methods but also complementary results, which open new research directions into this relevant question in nutrition. 
    more » « less
  2. Synopsis The increased use of imaging technology in biological research has drastically altered morphological studies in recent decades and allowed for the preservation of important collection specimens alongside detailed visualization of bony and soft-tissue structures. Despite the benefits associated with these newer imaging techniques, there remains a need for more “traditional” methods of morphological examination in many comparative studies. In this paper, we describe the costs and benefits of the various methods of visualizing, examining, and comparing morphological structures. There are significant differences not only in the costs associated with these different methods (monetary, time, equipment, and software), but also in the degree to which specimens are destroyed. We argue not for any one particular method over another in morphological studies, but instead suggest a combination of methods is useful not only for breadth of visualization, but also for the financial and time constraints often imposed on early-career research scientists. 
    more » « less
  3. We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for $$n$$ data points requires $$\Omega(n^2)$$ similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the remainder of the complete pairwise similarity matrix. Significant work focuses on the efficient approximation of positive semidefinite (PSD) similarity matrices, which arise e.g., in kernel methods. However, much less is understood about indefinite (non-PSD) similarity matrices, which often arise in NLP. Motivated by the observation that many of these matrices are still somewhat close to PSD, we introduce a generalization of the popular \emph{Nystrom method} to the indefinite setting. Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix, producing a rank-$$s$$ approximation with just $O(ns)$ similarity computations. We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices arising in NLP tasks. We demonstrate high accuracy of the approximated similarity matrices in tasks of document classification, sentence similarity, and cross-document coreference. 
    more » « less
  4. In modern machine learning, users often have to collaborate to learn distributions that generate the data. Communication can be a significant bottleneck. Prior work has studied homogeneous users—i.e., whose data follow the same discrete distribution—and has provided optimal communication-efficient methods. How- ever, these methods rely heavily on homogeneity, and are less applicable in the common case when users’ discrete distributions are heterogeneous. Here we consider a natural and tractable model of heterogeneity, where users’ discrete distributions only vary sparsely, on a small number of entries. We propose a novel two-stage method named SHIFT: First, the users collaborate by communicating with the server to learn a central distribution; relying on methods from robust statistics. Then, the learned central distribution is fine-tuned to estimate the indi- vidual distributions of users. We show that our method is minimax optimal in our model of heterogeneity and under communication constraints. Further, we provide experimental results using both synthetic data and n-gram frequency estimation in the text domain, which corroborate its efficiency. 
    more » « less
  5. Kubatko, Laura (Ed.)
    Abstract Many recent phylogenetic methods have focused on accurately inferring species trees when there is gene tree discordance due to incomplete lineage sorting (ILS). For almost all of these methods, and for phylogenetic methods in general, the data for each locus are assumed to consist of orthologous, single-copy sequences. Loci that are present in more than a single copy in any of the studied genomes are excluded from the data. These steps greatly reduce the number of loci available for analysis. The question we seek to answer in this study is: what happens if one runs such species tree inference methods on data where paralogy is present, in addition to or without ILS being present? Through simulation studies and analyses of two large biological data sets, we show that running such methods on data with paralogs can still provide accurate results. We use multiple different methods, some of which are based directly on the multispecies coalescent model, and some of which have been proven to be statistically consistent under it. We also treat the paralogous loci in multiple ways: from explicitly denoting them as paralogs, to randomly selecting one copy per species. In all cases, the inferred species trees are as accurate as equivalent analyses using single-copy orthologs. Our results have significant implications for the use of ILS-aware phylogenomic analyses, demonstrating that they do not have to be restricted to single-copy loci. This will greatly increase the amount of data that can be used for phylogenetic inference.[Gene duplication and loss; incomplete lineage sorting; multispecies coalescent; orthology; paralogy.] 
    more » « less