Nowadays, most mobile devices are equipped with advanced sensors, enabling the measurement of information about the surrounding environment or social settings. The ubiquity of mobile devices makes them an ideal platform for massive data collection, which has motivated the emergence of the mobile crowdsensing paradigm. However, due to the inherently noisy nature of the sensing process and the limited capability of low-cost commodity sensors, crowdsensed information tends to be less reliable than results obtained through dedicated sensing hardware, and multiple crowdsensing sources may conflict with one another. It is therefore important to resolve conflicts in the collected data and discover the underlying truth. Traditional truth discovery approaches estimate the reliability of data sources and predict the truth based on that reliability. However, recent data poisoning attacks, in which attackers aim to maximize the utility loss, greatly degrade the performance of existing truth discovery algorithms. In this paper, we investigate data poisoning attacks on truth discovery and propose a robust approach against such attacks through additional source estimation and source filtering before data aggregation. Based on real-world data, we simulate our approach and evaluate its performance under data poisoning attacks, demonstrating its robustness.
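To make the general idea concrete, here is a minimal sketch of reliability-weighted truth discovery with a source-filtering step before aggregation. The function names, the weighting rule, and the `keep_ratio` threshold are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch: iterative truth discovery that estimates source
# reliability, filters out the least reliable sources, then aggregates.
import numpy as np

def robust_truth_discovery(claims, n_iters=20, keep_ratio=0.8):
    """claims: (n_sources, n_objects) array of numeric observations."""
    truths = np.median(claims, axis=0)          # robust initial estimate
    weights = np.ones(claims.shape[0])
    for _ in range(n_iters):
        # Source reliability: inversely related to distance from current truths.
        errors = np.sum((claims - truths) ** 2, axis=1)
        weights = -np.log((errors + 1e-12) / (errors.sum() + 1e-12))
        # Filtering: keep only the most reliable sources before aggregating.
        cutoff = np.quantile(weights, 1.0 - keep_ratio)
        w = weights * (weights >= cutoff)
        truths = (w[:, None] * claims).sum(axis=0) / w.sum()
    return truths, weights

claims = np.array([[10.1, 20.2], [9.9, 19.8], [50.0, 90.0]])  # third source poisoned
truths, weights = robust_truth_discovery(claims)
print(truths)  # stays close to the honest sources' values
```

The filtering step is what distinguishes this sketch from plain weighted aggregation: a poisoned source whose claims sit far from the consensus receives a near-zero weight and is dropped before it can shift the estimate.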
Towards Scalable and Dynamic Social Sensing Using A Distributed Computing Framework
With the rapid growth of online social media and ubiquitous Internet connectivity, social sensing has emerged as a new crowdsourcing application paradigm of collecting observations (often called claims) about the physical environment from humans or from devices on their behalf. A fundamental problem in social sensing applications lies in effectively ascertaining the correctness of claims and the reliability of data sources without knowing either of them a priori, which is referred to as truth discovery. While significant progress has been made on the truth discovery problem, some important challenges have not been well addressed yet. First, existing truth discovery solutions do not fully solve the dynamic truth discovery problem, where the ground truth of claims changes over time. Second, many current solutions do not scale to large social sensing events because of the centralized nature of their truth discovery algorithms. Third, the heterogeneity and unpredictability of social sensing data traffic pose additional challenges to resource allocation and system responsiveness. In this paper, we developed a Scalable Streaming Truth Discovery (SSTD) solution to address the above challenges. In particular, we first developed a dynamic truth discovery scheme based on Hidden Markov Models (HMMs) to effectively infer the evolving truth of reported claims. We further developed a distributed framework to implement the dynamic truth discovery scheme using Work Queue in the HTCondor system. We also integrated the SSTD scheme with an optimal workload allocation mechanism to dynamically allocate resources (e.g., cores, memory) to truth discovery tasks based on their computation requirements. We evaluated SSTD through real-world social sensing applications using Twitter data feeds. The evaluation results on three real-world data traces (Boston Bombing, Paris Shooting, and College Football) show that the SSTD scheme is scalable and outperforms state-of-the-art truth discovery methods in terms of both effectiveness and efficiency.
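The HMM idea behind dynamic truth discovery can be illustrated with a small forward-filtering example: the hidden state is the (possibly changing) truth of a claim, and noisy source reports are emissions. All parameters below are made up for the example; the paper's actual model and its parameter estimation are more involved.

```python
# Toy forward filter: track P(truth_t | reports so far) as reports arrive.
import numpy as np

A = np.array([[0.9, 0.1],   # P(next truth | current truth); states: 0=false, 1=true
              [0.1, 0.9]])
B = np.array([[0.7, 0.3],   # P(report | truth); rows = truth, cols = reported value
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])   # prior over the initial truth

def forward_filter(reports):
    """reports: sequence of 0/1 observations for one claim."""
    belief = pi * B[:, reports[0]]
    belief /= belief.sum()
    beliefs = [belief]
    for r in reports[1:]:
        belief = (A.T @ belief) * B[:, r]   # predict with A, then correct with B
        belief /= belief.sum()
        beliefs.append(belief)
    return np.array(beliefs)

print(forward_filter([1, 1, 0, 0, 0]))  # posterior drifts as reports flip
```

Because the transition matrix allows the hidden truth to change, the posterior tracks an evolving ground truth instead of locking onto a single static value, which is the key difference from static truth discovery.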
- Award ID(s): 1642409
- PAR ID: 10047185
- Date Published:
- Journal Name: IEEE International Conference on Distributed Computing Systems
- Page Range / eLocation ID: 966 to 976
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
A massive amount of data generated today on platforms such as social networks, telecommunication networks, and the Internet in general can be represented as graph streams. Activity in a network's underlying graph generates a sequence of edges in the form of a stream; for example, a social network may generate a graph stream based on the interactions (edges) between different users (nodes) over time. While many graph mining algorithms have already been developed for analyzing relatively small graphs, graphs that approach the size of real-world networks stress the limitations of such methods due to their dynamic nature and the substantial number of nodes and connections involved. In this paper we present GraphZip, a scalable method for mining interesting patterns in graph streams. GraphZip is inspired by the Lempel-Ziv (LZ) class of compression algorithms, and uses a novel dictionary-based compression approach to discover maximally-compressing patterns in a graph stream. We experimentally show that GraphZip is able to retrieve complex and insightful patterns from large real-world graphs and from artificially generated graphs with ground-truth patterns. Additionally, our results demonstrate that GraphZip is both highly efficient and highly effective compared with existing state-of-the-art methods for mining graph streams.
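As a rough intuition for dictionary-based pattern growth on an edge stream, here is a toy LZ78-flavored sketch (not the GraphZip algorithm itself): known edge sequences are extended one edge at a time, so frequently repeated sequences accumulate as ever-longer dictionary entries.

```python
# Toy sketch: grow a dictionary of repeated edge sequences from a stream.
def mine_stream(edge_stream):
    dictionary = {(): 0}           # pattern (tuple of edges) -> id
    phrase = ()
    for edge in edge_stream:
        candidate = phrase + (edge,)
        if candidate in dictionary:
            phrase = candidate     # keep growing a known pattern
        else:
            dictionary[candidate] = len(dictionary)  # record a new, longer pattern
            phrase = ()
    return dictionary

stream = [("a", "b"), ("b", "c")] * 3
patterns = mine_stream(stream)
print([p for p in patterns if len(p) > 1])  # repeated multi-edge patterns emerge
```

GraphZip proper works on subgraphs rather than flat edge sequences, but the compression principle is the same: patterns that recur often are the ones worth storing, because they compress the stream the most.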
-
Algorithmic fairness in the context of personalized recommendation presents significantly different challenges from those commonly encountered in classification tasks. Researchers studying classification have generally considered fairness to be a matter of achieving equality of outcomes (or of some other metric) between a protected and an unprotected group, and have built algorithmic interventions on this basis. We argue that fairness in real-world application settings in general, and especially in the context of personalized recommendation, is much more complex and multi-faceted, requiring a more general approach. To address the fundamental problem of fairness in the presence of multiple stakeholders with different definitions of fairness, we propose the Social Choice for Recommendation Under Fairness – Dynamic (SCRUF-D) architecture, which formalizes multistakeholder fairness in recommender systems as a two-stage social choice problem. In particular, we express recommendation fairness as a combination of an allocation problem and an aggregation problem, which together integrate fairness concerns with personalized recommendation, and we derive new recommendation techniques based on this formulation. We demonstrate the ability of our framework to dynamically incorporate multiple fairness concerns using both real-world and synthetic datasets.
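A schematic sketch of the two-stage view: an allocation step picks which fairness concern "speaks" for a given recommendation opportunity, and an aggregation step blends its preferences with the personalized scores. The agents, weights, lottery rule, and blending parameter below are invented for illustration and are not the paper's mechanisms.

```python
# Toy two-stage pipeline: allocate a fairness agent, then aggregate scores.
import random

fairness_agents = {
    # agent -> (accumulated unfairness so far, per-item score bonus)
    "provider_parity": (0.7, {"item_a": 0.2, "item_b": 0.0}),
    "exposure_equity": (0.3, {"item_a": 0.0, "item_b": 0.3}),
}

def allocate(agents):
    """Allocation: lottery weighted by each agent's accumulated unfairness."""
    names = list(agents)
    weights = [agents[n][0] for n in names]
    return random.choices(names, weights=weights, k=1)[0]

def aggregate(personalized, agent_bonus, alpha=0.8):
    """Aggregation: convex blend of personalized scores and fairness bonuses."""
    return {i: alpha * s + (1 - alpha) * agent_bonus.get(i, 0.0)
            for i, s in personalized.items()}

personalized = {"item_a": 0.9, "item_b": 0.85}
agent = allocate(fairness_agents)
scores = aggregate(personalized, fairness_agents[agent][1])
print(agent, max(scores, key=scores.get))
```

Separating the two stages is what lets the system be dynamic: agents that have been underserved gain allocation weight over time without any change to the aggregation rule.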
-
With the rapid advancement of edge computing and network function virtualization, it is promising to provide flexible and low-latency network services at the edge. However, due to the vulnerability of edge services and the volatility of edge computing system states, i.e., service request rates, failure rates, and resource prices, it is challenging to minimize the online service cost while providing an availability guarantee. This paper considers the problem of online virtual network function backup under availability constraints (OVBAC) for cost minimization in edge environments. We formulate the problem based on the characteristics of the volatile system states derived from real-world data and show the hardness of the formulated problem. We use an online backup deployment scheme named Drift-Plus-Penalty (DPP) with provable near-optimal performance for the OVBAC problem. In particular, DPP needs to solve an integer programming problem at the beginning of each time slot, and we propose a dynamic programming-based algorithm that optimally solves this problem in pseudo-polynomial time. Extensive real-world data-driven simulations demonstrate that DPP significantly outperforms popular baselines used in practice.
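For intuition, here is a generic drift-plus-penalty sketch, not the paper's exact formulation: a virtual queue tracks the accumulated availability deficit, and each slot the backup count is chosen to minimize a weighted sum of cost and queue-scaled constraint violation. The availability model, prices, and parameters are all placeholders.

```python
# Toy drift-plus-penalty step for choosing a per-slot backup count.
def dpp_step(Q, price, failure_rate, target_avail, V=10.0, max_backups=5):
    best_k, best_obj = 0, float("inf")
    for k in range(max_backups + 1):
        avail = 1.0 - failure_rate ** (k + 1)        # primary plus k independent backups
        obj = V * price * k + Q * (target_avail - avail)
        if obj < best_obj:
            best_k, best_obj = k, obj
    avail = 1.0 - failure_rate ** (best_k + 1)
    Q = max(Q + target_avail - avail, 0.0)           # virtual queue absorbs the deficit
    return best_k, Q

Q = 0.0
for t, (price, fr) in enumerate([(1.0, 0.1), (3.0, 0.2), (0.5, 0.3)]):
    k, Q = dpp_step(Q, price, fr, target_avail=0.999)
    print(f"slot {t}: deploy {k} backups, deficit queue {Q:.4f}")
```

The parameter V trades cost against constraint satisfaction: a larger V chases cheaper slots harder, while a growing queue Q eventually forces more backups. The paper's per-slot subproblem is an integer program solved by dynamic programming; the brute-force loop above stands in for that step.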
-
We study the problem of detecting abnormal inactivity within a single-occupied household based on smart meter readings. Such abnormal events include immobilizing medical conditions or the sudden death of an elderly or disabled occupant who lives alone, the delayed discovery of which poses realistic social concerns as the population ages. Two novel unsupervised learning algorithms are developed and compared: one based on nested dynamic time warping (DTW) distances and the other based on Mahalanobis distance with problem-specific features. Both algorithms are able to cold-start from limited historical data and perform well without extended parameter tuning. In addition, the algorithms have small footprints in terms of data usage and computational need, and are thus suitable for implementation on smart meter hardware. The proposed methods have been thoroughly validated against real data sets with simulated target scenarios and have exhibited satisfactory performance. An implementation scheme on smart meter hardware is also discussed.
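A minimal sketch of the Mahalanobis-distance variant: daily feature vectors derived from meter readings are scored against the historical mean and covariance, and days whose distance exceeds a threshold are flagged. The features and the synthetic data below are placeholders, not the paper's problem-specific feature set.

```python
# Toy Mahalanobis-distance detector over daily consumption features.
import numpy as np

def fit(history):
    """history: (n_days, n_features), e.g. total / peak / overnight usage."""
    mu = history.mean(axis=0)
    cov = np.cov(history, rowvar=False) + 1e-6 * np.eye(history.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(x, mu, cov_inv):
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

history = np.random.default_rng(0).normal([12.0, 2.5, 0.8], 0.5, size=(60, 3))
mu, cov_inv = fit(history)
quiet_day = np.array([1.0, 0.2, 0.1])       # abnormally low activity
print(mahalanobis(quiet_day, mu, cov_inv))  # large distance -> flag inactivity
```

This kind of detector needs only a mean vector and a small covariance matrix, which is consistent with the abstract's point that the methods are compact enough to run on smart meter hardware.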