Knowledge about the geographic locations of Internet routers and servers is highly valuable for research on various aspects of Internet structure, performance, economics, and security. Whereas geolocation databases are commercially available and targeted mostly at end hosts, RIPE offers an open IPmap platform, including its single-radius engine, for geolocation of core Internet infrastructure. This paper introduces the research community to the IPmap single-radius engine and evaluates the effectiveness of this method against the commercial geolocation databases NetAcuity and GeoLite2. Access to ground truth constitutes a major challenge in conducting such evaluation studies. The paper collects IP addresses for its study from three sources: virtual machines from the Ring of the Netherlands Network Operators' Group, M-Lab Pods operated by Google, and CAIDA's Ark monitors. The ground-truth dataset is further diversified through the addition of IP addresses that are a small latency away from Ark monitors. The evaluation considers accuracy, coverage, and consistency of geolocation, as well as the effectiveness of the single-radius method for different types of autonomous systems. The paper manually analyzes a problematic case in which single-radius mistakenly geolocates the IP address of a Budapest-based router to Vienna. Finally, the paper provides recommendations to both users and developers of the single-radius method and discusses limitations of the reported evaluation. The main conclusion is that the IPmap single-radius engine geolocates core Internet infrastructure more accurately than the considered commercial databases and that Internet researchers can greatly benefit from using the IPmap platform for their geolocation needs.
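The single-radius idea rests on a physical constraint: packets cannot travel faster than light in fiber, so the minimum RTT measured from a vantage point bounds how far away the target can be. The abstract does not spell out the engine's exact algorithm, so the following is only a minimal sketch of that latency-to-distance reasoning; the 200 km/ms propagation figure, the anchor location, and the city list are illustrative assumptions, not values from the paper.

```python
import math

def max_distance_km(min_rtt_ms, km_per_ms=200.0):
    # Assumed propagation speed: light in fiber covers roughly 200 km
    # per millisecond, so half the RTT bounds the one-way distance.
    return (min_rtt_ms / 2.0) * km_per_ms

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in degrees.
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def candidate_cities(anchor_latlon, min_rtt_ms, cities):
    # Keep only the cities inside the RTT-implied disc around the anchor.
    radius = max_distance_km(min_rtt_ms)
    return [name for name, (lat, lon) in cities.items()
            if haversine_km(*anchor_latlon, lat, lon) <= radius]
```

With a 2.5 ms minimum RTT measured from a Vienna anchor, both Vienna and Budapest (about 215 km apart) fall inside the feasible disc; this is consistent with the kind of ambiguity behind the Budapest-to-Vienna misgeolocation the paper analyzes.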
                            Connecting the Hosts: Street-Level IP Geolocation with Graph Neural Networks
Pinpointing the geographic location of an IP address is important for a range of location-aware applications, from targeted advertising to fraud prevention. Traditional measurement-based methods and recent learning-based methods mostly either focus on efficient use of network topology or mine publicly available sources for clues about the target IP. Motivated by the limitations of existing work, we propose a novel framework named GraphGeo, which provides a complete processing methodology for street-level IP geolocation based on graph neural networks. It incorporates knowledge about IP hosts and several kinds of neighborhood relationships into a graph to infer spatial topology for high-quality geolocation prediction. We explicitly consider and alleviate the negative impact of uncertainty caused by network jitter and congestion, which are pervasive in complicated network environments. Extensive evaluations across three large-scale real-world datasets demonstrate that GraphGeo significantly reduces geolocation errors compared to state-of-the-art methods. Moreover, the proposed framework has been deployed on a web platform as an online service for six months.
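GraphGeo's core ingredient is a graph neural network over IP hosts. The abstract does not describe the architecture, but the elementary operation any GNN layer builds on is aggregation of features over graph neighbours. The toy sketch below, with an invented adjacency matrix and landmark coordinates, shows one round of mean aggregation propagating known landmark locations toward a target host; it is a didactic illustration, not GraphGeo's actual model.

```python
import numpy as np

def neighbor_mean(adj, feats):
    # One round of mean aggregation over graph neighbours: the basic
    # message-passing step that GNN layers build on.
    return (adj @ feats) / adj.sum(axis=1, keepdims=True)

# Toy graph: three landmark hosts with known coordinates and one
# target host (row 3) linked to the first two landmarks.
adj = np.array([[0., 1., 0., 1.],
                [1., 0., 1., 1.],
                [0., 1., 0., 0.],
                [1., 1., 0., 0.]])
coords = np.array([[48.85, 2.35],
                   [48.86, 2.34],
                   [48.84, 2.37],
                   [0.0,  0.0]])   # unknown target, initialised to zero
est = neighbor_mean(adj, coords)   # est[3] is the target's first guess
```

A real GNN would interleave such aggregation with learned transformations and repeat it over several layers, but the neighbour averaging above is the structural idea.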
- Award ID(s): 2030249
- PAR ID: 10403312
- Date Published:
- Journal Name: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
- Page Range / eLocation ID: 4121 to 4131
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Circuit obfuscation is a recently proposed defense mechanism to protect the intellectual property (IP) of digital integrated circuits (ICs) from reverse engineering. There are effective schemes, such as satisfiability (SAT)-checking based attacks, that can potentially decrypt obfuscated circuits, a process called deobfuscation. Deobfuscation runtime can range from days to years, depending on the layouts of the obfuscated ICs. Hence, accurately pre-estimating the deobfuscation runtime within a reasonable amount of time is crucial for IC designers to optimize their defense. This is challenging, however, due to (1) the complexity of graph-structured circuits; (2) the varying-size topology of obfuscated circuits; and (3) the efficiency required of the estimation method. To address these challenges, this study proposes a framework that predicts deobfuscation runtime using graph deep learning techniques. A conjunctive normal form (CNF) bipartite graph is utilized to characterize the complexity of this SAT problem by analyzing the SAT attack method. Multi-order information of the graph matrix is designed to identify the essential features and reduce the computational cost. To overcome the difficulty in capturing the dynamic size of the CNF graph, an energy-based kernel is proposed to aggregate dynamic features into an identical vector space. We then designed a framework, Deep Survival Analysis with Graph (DSAG), which integrates energy-based layers and predicts runtime inspired by censored regression in survival analysis. By integrating uncensored and censored data, the proposed model significantly improves on standard regression. DSAG is an end-to-end framework that can automatically extract the determinant features for deobfuscation runtime. Extensive experiments on benchmarks demonstrate its effectiveness and efficiency.
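The CNF bipartite graph mentioned above has a standard form: one side of the graph is the clauses of the SAT instance, the other side the variables, with signed edges distinguishing positive and negated literals. A minimal construction follows; the list-of-lists signed-incidence encoding is an illustrative choice, not necessarily the paper's representation.

```python
def cnf_bipartite(clauses, n_vars):
    # Rows correspond to clauses, columns to variables. An entry is +1
    # for a positive literal, -1 for a negated literal, 0 if absent.
    g = [[0] * n_vars for _ in clauses]
    for i, clause in enumerate(clauses):
        for lit in clause:
            g[i][abs(lit) - 1] = 1 if lit > 0 else -1
    return g

# (x1 OR NOT x2) AND (x2 OR x3), in DIMACS-style signed literals
graph = cnf_bipartite([[1, -2], [2, 3]], 3)
```

A graph learner can then treat this signed incidence structure as the input graph whose size varies with the obfuscated circuit.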
- 
            Graphs are powerful representations for relations among objects, which have attracted plenty of attention in both academia and industry. A fundamental challenge for graph learning is how to train an effective Graph Neural Network (GNN) encoder without labels, which are expensive and time-consuming to obtain. Contrastive Learning (CL) is one of the most popular paradigms to address this challenge, which trains GNNs by discriminating positive and negative node pairs. Despite the success of recent CL methods, there are still two under-explored problems. Firstly, how to reduce the semantic error introduced by random topology-based data augmentations. Traditional CL defines positive and negative node pairs via node-level topological proximity, which is based solely on the graph topology regardless of the semantic information of node attributes, and thus some semantically similar nodes can be wrongly treated as negative pairs. Secondly, how to effectively model the multiplexity of real-world graphs, where nodes are connected by various relations and each relation could form a homogeneous graph layer. To solve these problems, we propose a novel multiplex heterogeneous graph prototypical contrastive learning (X-GOAL) framework to extract node embeddings. X-GOAL is comprised of two components: the GOAL framework, which learns node embeddings for each homogeneous graph layer, and an alignment regularization, which jointly models different layers by aligning layer-specific node embeddings. Specifically, the GOAL framework captures the node-level information by a succinct graph transformation technique, and captures the cluster-level information by pulling nodes within the same semantic cluster closer in the embedding space. The alignment regularization aligns embeddings across layers at both the node level and the cluster level. We evaluate the proposed X-GOAL on a variety of real-world datasets and downstream tasks to demonstrate the effectiveness of the X-GOAL framework.
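Contrastive objectives of the kind X-GOAL builds on score a positive pair against negative pairs in similarity space. The exact loss used in the paper is not given in this summary; below is a generic InfoNCE-style sketch in which the temperature value and the use of cosine similarity are illustrative assumptions.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.5):
    # Contrastive objective: pull the positive embedding toward the
    # anchor and push the negative embeddings away, after cosine
    # normalisation and temperature scaling.
    def sim(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp(sim(anchor, positive) / tau)
    neg = sum(np.exp(sim(anchor, n) / tau) for n in negatives)
    return float(-np.log(pos / (pos + neg)))
```

The loss is small when the positive is aligned with the anchor and the negatives are not, which is exactly the discrimination of positive and negative node pairs described above.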
- 
            People increasingly share personal information, including their photos and photo collections, on social media. This information, however, can compromise individual privacy, particularly as social media platforms use it to infer detailed models of user behavior, including tracking their location. We consider the specific issue of location privacy as potentially revealed by posting photo collections, which facilitate accurate geolocation with the help of deep learning methods even in the absence of geotags. One means to limit associated inadvertent geolocation privacy disclosure is by carefully pruning select photos from photo collections before these are posted publicly. We study this problem formally as a combinatorial optimization problem in the context of geolocation prediction facilitated by deep learning. We first demonstrate the complexity both by showing that a natural greedy algorithm can be arbitrarily bad and by proving that the problem is NP-Hard. We then exhibit an important tractable special case, as well as a more general approach based on mixed-integer linear programming. Through extensive experiments on real photo collections, we demonstrate that our approaches are indeed highly effective at preserving geolocation privacy.
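The natural greedy baseline that the abstract says can be arbitrarily bad is easy to state: repeatedly drop the photo whose removal most reduces an estimated geolocation risk for the remaining collection. A sketch follows; the set-valued `risk` function is a hypothetical stand-in for the deep-learning geolocation confidence the paper works with.

```python
def greedy_prune(photos, risk, budget):
    # risk(kept) -> estimated geolocation risk of posting that subset.
    # Greedily drop, one at a time, the photo whose removal lowers the
    # remaining collection's risk the most.
    kept = set(photos)
    for _ in range(budget):
        best = min(kept, key=lambda p: risk(kept - {p}))
        kept.remove(best)
    return kept
```

With an additive risk function the greedy rule simply discards the riskiest photos, but the paper's point is that for realistic, non-additive risk such local choices can be arbitrarily far from the optimum, motivating the mixed-integer programming formulation.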
- 
            A comprehensive understanding of the topology of the electric power transmission network (EPTN) is essential for reliable and robust control of power systems. While existing research primarily relies on domain-specific methods, it lacks data-driven approaches that have proven effective in modeling the topology of complex systems. To address this gap, this paper explores the potential of data-driven methods for more accurate and adaptive solutions to uncover the true underlying topology of EPTNs. First, this paper examines Gaussian Graphical Models (GGM) to create an EPTN network graph (i.e., an undirected simple graph). Second, to further refine and validate this estimated network graph, a physics-based, domain-specific refinement algorithm is proposed to prune false edges and construct the corresponding electric power flow network graph (i.e., a directed multi-graph). The proposed method is tested using a synchrophasor dataset collected from a two-area, four-machine power system simulated on the real-time digital simulator (RTDS) platform. Experimental results show that both the network and flow graphs can be reconstructed across various operating conditions and topologies with limited failure cases.
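In a Gaussian Graphical Model, a nonzero off-diagonal entry of the precision (inverse-covariance) matrix indicates conditional dependence between two variables, i.e. a candidate edge. The sketch below estimates edges by thresholding partial correlations; the threshold value and the plain matrix inverse (rather than a regularized estimator such as the graphical lasso the paper may use) are simplifying assumptions.

```python
import numpy as np

def estimate_edges(samples, thresh=0.2):
    # samples: (n_observations, n_nodes) array of measurements.
    # Nonzero off-diagonal precision entries mark conditional
    # dependence; convert them to partial correlations so a single
    # scale-free threshold applies.
    prec = np.linalg.inv(np.cov(samples, rowvar=False))
    d = np.sqrt(np.diag(prec))
    partial = -prec / np.outer(d, d)
    n = partial.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(partial[i, j]) > thresh}

# Simulated three-node chain 0 -> 1 -> 2: nodes 0 and 2 are correlated
# but conditionally independent given node 1, so edge (0, 2) is spurious.
rng = np.random.default_rng(0)
x0 = rng.normal(size=4000)
x1 = x0 + 0.5 * rng.normal(size=4000)
x2 = x1 + 0.5 * rng.normal(size=4000)
edges = estimate_edges(np.stack([x0, x1, x2], axis=1))
```

On this chain the indirect pair (0, 2) is excluded while the true edges survive, which is the behavior the GGM step relies on before the physics-based refinement prunes remaining false edges.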