NSF PAR Search | NSF Public Access Repository


VIKI: Systematic Cross-Platform Profile Inference of Tech Users

https://doi.org/10.1145/3717867.3717890

Treves, Ben; De Cristofaro, Emiliano; Dong, Yue; Faloutsos, Michalis (May 2025, ACM)

Free, publicly-accessible full text available May 19, 2026
Disambiguating usernames across platforms: the GeekMAN approach

https://doi.org/10.1007/s13278-024-01321-x

Masud, Md Rayhanul; Treves, Ben; Faloutsos, Michalis (December 2024, Social Network Analysis and Mining)

Full Text Available
MetaSim: A Search Engine for Finding Similar GitHub Repositories

https://doi.org/10.1109/ICSME58944.2024.00093

Masud, Md Rayhanul; Rokon, Md Omar Faruk; Zhang, Qian; Faloutsos, Michalis (October 2024, IEEE)

Full Text Available
Who is Creating Malware Repositories on GitHub and Why?

https://doi.org/10.1145/3589335.3651582

Tania, Nishat Ara; Masud, Md Rayhanul; Rokon, Md Omar Faruk; Zhang, Qian; Faloutsos, Michalis (May 2024, ACM)

Full Text Available
C2Store: C2 Server Profiles at Your Fingertips

https://doi.org/10.1145/3629132

Jain, Vivek; Alam, S M Maksudul; Krishnamurthy, Srikanth V; Faloutsos, Michalis (November 2023, Proceedings of the ACM on Networking)

How can we build a definitive capability for tracking C2 servers? A large-scale, continuously updated capability would be essential for understanding the spatiotemporal behaviors of C2 servers and, ultimately, for helping contain botnet activities. Unfortunately, existing information from threat intelligence feeds and previous works is often limited to a specific set of botnet families or short-term data collections. Responding to this need, we present C2Store, an initiative to provide the most comprehensive information on C2 servers. Our work makes the following contributions: (a) we develop techniques to collect, verify, and combine C2 server addresses from five types of sources, including uncommon platforms, such as GitHub and Twitter; (b) we create an open-access annotated database of 335,967 C2 servers across 133 malware families, which supports semantically rich and smart queries; (c) we identify surprising behaviors of C2 servers with respect to their spatiotemporal patterns. First, we successfully mine Twitter and GitHub and identify C2 servers with a precision of 97% and 94%, respectively. Furthermore, we find that the threat feeds identify only 24% of the servers in our database, with Twitter and GitHub providing 32%. A surprising observation is the identification of 250 IP addresses, each of which hosts more than 5 C2 servers for different botnet families at the same time. Overall, we envision C2Store as an ongoing effort that will facilitate research by providing timely, historical, and comprehensive C2 server information by critically combining multiple sources of information.
Full Text Available
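
The C2Store abstract above reports IP addresses that each host more than 5 C2 servers for different botnet families. A minimal sketch of that kind of query is shown below. It assumes a simplified record shape of (ip, family) pairs taken from one point in time; the function name, the record layout, and the sample values are hypothetical and are not taken from the C2Store database or its query interface.

```python
from collections import defaultdict


def shared_hosting_ips(records, min_families=6):
    """Return {ip: sorted family names} for IPs linked to many families.

    `records` is an iterable of (ip, family) pairs, assumed to describe
    C2 servers observed at the same time (hypothetical layout).
    "More than 5" families corresponds to min_families=6.
    """
    families_by_ip = defaultdict(set)
    for ip, family in records:
        families_by_ip[ip].add(family)
    return {
        ip: sorted(families)
        for ip, families in families_by_ip.items()
        if len(families) >= min_families
    }


# Hypothetical sample data using documentation-reserved IP ranges.
sample = [("192.0.2.1", f"family_{i}") for i in range(6)]
sample += [("198.51.100.7", "family_0"), ("198.51.100.7", "family_1")]
flagged = shared_hosting_ips(sample)
```

Here only 192.0.2.1 is flagged, since it is associated with six distinct families while 198.51.100.7 is associated with two. A fuller version would also need to check that the observations overlap in time, which this sketch leaves out.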
Unveiling A Hidden Risk: Exposing Educational but Malicious Repositories in GitHub

Masud, Md Rayhanul; Faloutsos, Michalis (August 2023, ACM SIGKDD poster session)

Full Text Available
HyperMan: detecting misbehavior in online forums based on hyperlink posting behavior

https://doi.org/10.1007/s13278-022-00943-3

Islam, Risul; Treves, Ben; Rokon, Md Omar; Faloutsos, Michalis (December 2022, Social Network Analysis and Mining)

Full Text Available
URLytics: Profiling Forum Users from their Posted URLs

https://doi.org/10.1109/ASONAM55673.2022.10068682

Treves, Ben; Masud, Md Rayhanul; Faloutsos, Michalis (November 2022, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining)

Full Text Available
PIMan: A Comprehensive Approach for Establishing Plausible Influence among Software Repositories

https://doi.org/10.1109/ASONAM55673.2022.10068629

Rokon, Md Omar; Islam, Risul; Masud, Md Rayhanul; Faloutsos, Michalis (November 2022, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2022)

Full Text Available
Repo2Vec: A Comprehensive Embedding Approach for Determining Repository Similarity

https://doi.org/10.1109/ICSME52107.2021.00038

Rokon, Md Omar; Yan, Pei; Islam, Risul; Faloutsos, Michalis (September 2021, IEEE International Conference on Software Maintenance and Evolution (ICSME) 2021)

How can we identify similar repositories and clusters among a large online archive, such as GitHub? Determining repository similarity is an essential building block in studying the dynamics and the evolution of such software ecosystems. The key challenge is to determine the right representation for the diverse repository features in a way that (a) captures all aspects of the available information, and (b) is readily usable by ML algorithms. We propose Repo2Vec, a comprehensive embedding approach to represent a repository as a distributed vector by combining features from three types of information sources. As our key novelty, we consider three types of information: (a) metadata, (b) the structure of the repository, and (c) the source code. We also introduce a series of embedding approaches to represent and combine these information types into a single embedding. We evaluate our method with two real datasets from GitHub for a combined 1013 repositories. First, we show that our method outperforms previous methods in terms of precision (93% vs 78%), with nearly twice as many Strongly Similar repositories and 30% fewer False Positives. Second, we show how Repo2Vec provides a solid basis for: (a) distinguishing between malware and benign repositories, and (b) identifying a meaningful hierarchical clustering. For example, we achieve 98% precision and 96% recall in distinguishing malware and benign repositories. Overall, our work is a fundamental building block for enabling many repository analysis functions such as repository categorization by target platform or intention, detecting code reuse and clones, and identifying lineage and evolution.
Full Text Available
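
The Repo2Vec abstract above describes combining metadata, structure, and source-code information into a single repository embedding. The sketch below illustrates one simple way such a combination could work: L2-normalize each per-source vector, concatenate them, and compare repositories by cosine similarity. This is an assumption made for illustration only; the abstract does not state how the three embeddings are combined, and the function names and sample vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def combine(metadata_vec, structure_vec, source_vec):
    """Concatenate the three normalized vectors into one embedding.

    Normalizing first keeps any one source from dominating merely
    because its raw values are larger (an illustrative design choice).
    """
    combined = []
    for part in (metadata_vec, structure_vec, source_vec):
        combined.extend(l2_normalize(part))
    return combined


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical two-dimensional vectors for each information source.
repo_a = combine([1.0, 0.0], [0.0, 2.0], [3.0, 0.0])
repo_b = combine([1.0, 0.0], [0.0, 2.0], [3.0, 0.0])
repo_c = combine([0.0, 1.0], [1.0, 0.0], [0.0, 1.0])
same = cosine_similarity(repo_a, repo_b)
different = cosine_similarity(repo_a, repo_c)
```

With these inputs, repo_a and repo_b yield a similarity of approximately 1, while repo_a and repo_c share no direction in any source and yield 0.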
