skip to main content


Title: NetProtect: Network Perturbations to Protect Nodes against Entry-Point Attack
In many network applications, it may be desirable to conceal certain target nodes from detection by a data collector, who is using a crawling algorithm to explore a network. For example, in a computer network, the network administrator may wish to protect those computers (target nodes) with sensitive information from discovery by a hacker who has exploited vulnerable machines and entered the network. These networks are often protected by hiding the machines (nodes) from external access, and allow only fixed entry points into the system (protection against external attacks). However, in this protection scheme, once one of the entry points is breached, the safety of all internal machines is jeopardized (i.e., the external attack turns into an internal attack). In this paper, we view this problem from the perspective of the data protector. We propose the Node Protection Problem: given a network with known entry points, which edges should be removed/added so as to protect as many target nodes from the data collector as possible? A trivial way to solve this problem would be to simply disconnect either the entry points or the target nodes – but that would make the network non-functional. Accordingly, we impose certain constraints: for each node, only (1 − r) fraction of its edges can be removed, and the resulting network must not be disconnected. We propose two novel scoring mechanisms - the Frequent Path Score and the Shortest Path Score. Using these scores, we propose NetProtect, an algorithm that selects edges to be removed or added so as to best impede the progress of the data collector. We show experimentally that NetProtect outperforms baseline node protection algorithms across several real-world networks. In some datasets, With 1% of the edges removed by NetProtect, we found that the data collector requires up to 6 (4) times the budget compared to the next best baseline in order to discover 5 (50) nodes.  more » « less
Award ID(s):
1908048
NSF-PAR ID:
10295611
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ACM Web Science Conference
Page Range / eLocation ID:
93 to 101
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Context-free language reachability (CFL-reachability) is a fundamental framework for program analysis. A large variety of static analyses can be formulated as CFL-reachability problems, which determines whether specific source-sink pairs in an edge-labeled graph are connected by a reachable path, i.e., a path whose edge labels form a string accepted by the given CFL. Computing CFL-reachability is expensive. The fastest algorithm exhibits a slightly subcubic time complexity with respect to the input graph size. Improving the scalability of CFL-reachability is of practical interest, but reducing the time complexity is inherently difficult. In this paper, we focus on improving the scalability of CFL-reachability from a more practical perspective---reducing the input graph size. Our idea arises from the existence of trivial edges, i.e., edges that do not affect any reachable path in CFL-reachability. We observe that two nodes joined by trivial edges can be folded---by merging the two nodes with all the edges joining them removed---without affecting the CFL-reachability result. By studying the characteristic of the recursive state machines (RSMs), an alternative form of CFLs, we propose an approach to identify foldable node pairs without the need to verify the underlying reachable paths (which is equivalent to solving the CFL-reachability problem). In particular, given a CFL-reachability problem instance with an input graph G and an RSM, based on the correspondence between paths in G and state transitions in RSM, we propose a graph folding principle, which can determine whether two adjacent nodes are foldable by examining only their incoming and outgoing edges. On top of the graph folding principle, we propose an efficient graph folding algorithm GF. The time complexity of GF is linear with respect to the number of nodes in the input graph. Our evaluations on two clients (alias analysis and value-flow analysis) show that GF significantly accelerates RSM/CFL-reachability by reducing the input graph size. On average, for value-flow analysis, GF reduces 60.96% of nodes and 42.67% of edges of the input graphs, obtaining a speedup of 4.65× and a memory usage reduction of 57.35%. For alias analysis, GF reduces 38.93% of nodes and 35.61% of edges of the input graphs, obtaining a speedup of 3.21× and a memory usage reduction of 65.19%. 
    more » « less
  2. Knowledge graph is ubiquitous and plays an important role in many real-world applications, including recommender systems, question answering, fact-checking, and so on. However, most of the knowledge graphs are incomplete which can hamper their practical usage. Fortunately, knowledge graph completion (KGC) can mitigate this problem by inferring missing edges in the knowledge graph according to the existing information. In this paper, we propose a novel KGC method named ABM (Attention-Based Message passing) which focuses on predicting the relation between any two entities in a knowledge graph. The proposed ABM consists of three integral parts, including (1) context embedding, (2) structure embedding, and (3) path embedding. In the context embedding, the proposed ABM generalizes the existing message passing neural network to update the node embedding and the edge embedding to assimilate the knowledge of nodes' neighbors, which captures the relative role information of the edge that we want to predict. In the structure embedding, the proposed method overcomes the shortcomings of the existing GNN method (i.e., most methods ignore the structural similarity between nodes.) by assigning different attention weights to different nodes while doing the aggregation. Path embedding generates paths between any two entities and treats these paths as sequences. Then, the sequence can be used as the input of the Transformer to update the embedding of the knowledge graph to gather the global role of the missing edges. By utilizing these three mutually complementary strategies, the proposed ABM is able to capture both the local and global information which in turn leads to a superb performance. Experiment results show that ABM outperforms baseline methods on a wide range of datasets. 
    more » « less
  3. Abstract STUDY QUESTION

    Can we derive adequate models to predict the probability of conception among couples actively trying to conceive?

    SUMMARY ANSWER

    Leveraging data collected from female participants in a North American preconception cohort study, we developed models to predict pregnancy with performance of ∼70% in the area under the receiver operating characteristic curve (AUC).

    WHAT IS KNOWN ALREADY

    Earlier work has focused primarily on identifying individual risk factors for infertility. Several predictive models have been developed in subfertile populations, with relatively low discrimination (AUC: 59–64%).

    STUDY DESIGN, SIZE, DURATION

    Study participants were female, aged 21–45 years, residents of the USA or Canada, not using fertility treatment, and actively trying to conceive at enrollment (2013–2019). Participants completed a baseline questionnaire at enrollment and follow-up questionnaires every 2 months for up to 12 months or until conception. We used data from 4133 participants with no more than one menstrual cycle of pregnancy attempt at study entry.

    PARTICIPANTS/MATERIALS, SETTING, METHODS

    On the baseline questionnaire, participants reported data on sociodemographic factors, lifestyle and behavioral factors, diet quality, medical history and selected male partner characteristics. A total of 163 predictors were considered in this study. We implemented regularized logistic regression, support vector machines, neural networks and gradient boosted decision trees to derive models predicting the probability of pregnancy: (i) within fewer than 12 menstrual cycles of pregnancy attempt time (Model I), and (ii) within 6 menstrual cycles of pregnancy attempt time (Model II). Cox models were used to predict the probability of pregnancy within each menstrual cycle for up to 12 cycles of follow-up (Model III). We assessed model performance using the AUC and the weighted-F1 score for Models I and II, and the concordance index for Model III.

    MAIN RESULTS AND THE ROLE OF CHANCE

    Model I and II AUCs were 70% and 66%, respectively, in parsimonious models, and the concordance index for Model III was 63%. The predictors that were positively associated with pregnancy in all models were: having previously breastfed an infant and using multivitamins or folic acid supplements. The predictors that were inversely associated with pregnancy in all models were: female age, female BMI and history of infertility. Among nulligravid women with no history of infertility, the most important predictors were: female age, female BMI, male BMI, use of a fertility app, attempt time at study entry and perceived stress.

    LIMITATIONS, REASONS FOR CAUTION

    Reliance on self-reported predictor data could have introduced misclassification, which would likely be non-differential with respect to the pregnancy outcome given the prospective design. In addition, we cannot be certain that all relevant predictor variables were considered. Finally, though we validated the models using split-sample replication techniques, we did not conduct an external validation study.

    WIDER IMPLICATIONS OF THE FINDINGS

    Given a wide range of predictor data, machine learning algorithms can be leveraged to analyze epidemiologic data and predict the probability of conception with discrimination that exceeds earlier work.

    STUDY FUNDING/COMPETING INTEREST(S)

    The research was partially supported by the U.S. National Science Foundation (under grants DMS-1664644, CNS-1645681 and IIS-1914792) and the National Institutes for Health (under grants R01 GM135930 and UL54 TR004130). In the last 3 years, L.A.W. has received in-kind donations for primary data collection in PRESTO from FertilityFriend.com, Kindara.com, Sandstone Diagnostics and Swiss Precision Diagnostics. L.A.W. also serves as a fibroid consultant to AbbVie, Inc. The other authors declare no competing interests.

    TRIAL REGISTRATION NUMBER

    N/A.

     
    more » « less
  4. Mobile devices typically rely on entry-point and other one-time authentication mechanisms such as a password, PIN, fingerprint, iris, or face. But these authentication types are prone to a wide attack vector and worse 1 INTRODUCTION Currently smartphones are predominantly protected a patterned password is prone to smudge attacks, and fingerprint scanning is prone to spoof attacks. Other forms of attacks include video capture and shoulder surfing. Given the increasingly important roles smartphones play in e-commerce and other operations where security is crucial, there lies a strong need of continuous authentication mechanisms to complement and enhance one-time authentication such that even if the authentication at the point of login gets compromised, the device is still unobtrusively protected by additional security measures in a continuous fashion. The research community has investigated several continuous authentication mechanisms based on unique human behavioral traits, including typing, swiping, and gait. To this end, we focus on investigating physiological traits. While interacting with hand-held devices, individuals strive to achieve stability and precision. This is because a certain degree of stability is required in order to manipulate and interact successfully with smartphones, while precision is needed for tasks such as touching or tapping a small target on the touch screen (Sitov´a et al., 2015). As a result, to achieve stability and precision, individuals tend to develop their own postural preferences, such as holding a phone with one or both hands, supporting hands on the sides of upper torso and interacting, keeping the phone on the table and typing with the preferred finger, setting the phone on knees while sitting crosslegged and typing, supporting both elbows on chair handles and typing. On the other hand, physiological traits, such as hand-size, grip strength, muscles, age, 424 Ray, A., Hou, D., Schuckers, S. and Barbir, A. Continuous Authentication based on Hand Micro-movement during Smartphone Form Filling by Seated Human Subjects. DOI: 10.5220/0010225804240431 In Proceedings of the 7th International Conference on Information Systems Security and Privacy (ICISSP 2021), pages 424-431 ISBN: 978-989-758-491-6 Copyrightc 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved still, once compromised, fail to protect the user’s account and data. In contrast, continuous authentication, based on traits of human behavior, can offer additional security measures in the device to authenticate against unauthorized users, even after the entry-point and one-time authentication has been compromised. To this end, we have collected a new data-set of multiple behavioral biometric modalities (49 users) when a user fills out an account recovery form in sitting using an Android app. These include motion events (acceleration and angular velocity), touch and swipe events, keystrokes, and pattern tracing. In this paper, we focus on authentication based on motion events by evaluating a set of score level fusion techniques to authenticate users based on the acceleration and angular velocity data. The best EERs of 2.4% and 6.9% for intra- and inter-session respectively, are achieved by fusing acceleration and angular velocity using Nandakumar et al.’s likelihood ratio (LR) based score fusion. 
    more » « less
  5. Investigating the nature of system intrusions in large distributed systems remains a notoriously difficult challenge. While monitoring tools (e.g., Firewalls, IDS) provide preliminary alerts through easy-to-use administrative interfaces, attack reconstruction still requires that administrators sift through gigabytes of system audit logs stored locally on hundreds of machines. At present, two fundamental obstacles prevent synergy between system-layer auditing and modern cluster monitoring tools: 1) the sheer volume of audit data generated in a data center is prohibitively costly to transmit to a central node, and 2) system- layer auditing poses a “needle-in-a-haystack” problem, such that hundreds of employee hours may be required to diagnose a single intrusion. This paper presents Winnower, a scalable system for audit-based cluster monitoring that addresses these challenges. Our key insight is that, for tasks that are replicated across nodes in a distributed application, a model can be defined over audit logs to succinctly summarize the behavior of many nodes, thus eliminating the need to transmit redundant audit records to a central monitoring node. Specifically, Winnower parses audit records into provenance graphs that describe the actions of individual nodes, then performs grammatical inference over individual graphs using a novel adaptation of Deterministic Finite Automata (DFA) Learning to produce a behavioral model of many nodes at once. This provenance model can be efficiently transmitted to a central node and used to identify anomalous events in the cluster. We have implemented Winnower for Docker Swarm container clusters and evaluate our system against real-world applications and attacks. We show that Winnower dramatically reduces storage and network overhead associated with aggregating system audit logs, by as much as 98%, without sacrificing the important information needed for attack investigation. Winnower thus represents a significant step forward for security monitoring in distributed systems. 
    more » « less