Title: Adapting Meta Knowledge with Heterogeneous Information Network for COVID-19 Themed Malicious Repository Detection
As malware-driven cyberattacks have proliferated during the pandemic, an automatic system for detecting COVID-19 themed malware on social coding platforms is urgently needed. Existing methods rely mainly on file content analysis and ignore the structured information among entities on these platforms. They also usually require sufficient training data, which impairs their performance in the limited-data cases that are common in practice. To address these challenges, we develop Meta-AHIN, a novel model for COVID-19 themed malicious repository detection on GitHub. In Meta-AHIN, we first construct an attributed heterogeneous information network (AHIN) to model the code content and social coding properties in GitHub; we then use an attention-based graph convolutional neural network (AGCN) to learn repository embeddings and present a meta-learning framework for model optimization. To exploit unlabeled information in the AHIN and to account for the task influence of different types of repositories, we further incorporate a node attribute-based self-supervised module into the AGCN and a task-aware attention weight into the meta-learning framework. Extensive experiments on data collected from GitHub demonstrate that Meta-AHIN outperforms state-of-the-art methods.
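To make the two core ideas in the abstract more concrete (an attributed heterogeneous information network, and attention-weighted aggregation over a node's neighbors), here is a minimal Python sketch. It is not the authors' Meta-AHIN implementation. The class `AHIN`, the function `attention_aggregate`, the node and relation names, and the use of parameter-free dot-product attention are all assumptions made for illustration. It also assumes every node type shares the same attribute dimension, and it omits the meta-learning and self-supervised components entirely.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AHIN:
    """Toy attributed heterogeneous information network (hypothetical, illustrative only)."""
    node_type: dict = field(default_factory=dict)  # node id -> type name
    node_attr: dict = field(default_factory=dict)  # node id -> attribute vector
    edges: dict = field(default_factory=lambda: defaultdict(list))  # node id -> [(relation, neighbor)]

    def add_node(self, node_id, ntype, attr):
        self.node_type[node_id] = ntype
        self.node_attr[node_id] = list(attr)

    def add_edge(self, src, relation, dst):
        # Store both directions so either endpoint can reach the other.
        self.edges[src].append((relation, dst))
        self.edges[dst].append((relation, src))

    def neighbors(self, node_id, ntype=None):
        # Optionally keep only neighbors of one node type.
        return [n for _, n in self.edges[node_id]
                if ntype is None or self.node_type[n] == ntype]


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(graph, node_id):
    """One aggregation step: softmax over dot-product scores, then a weighted sum.

    Assumption: a real attention layer would use learned parameters; here the
    score is the plain dot product between attribute vectors.
    """
    target = graph.node_attr[node_id]
    members = [node_id] + graph.neighbors(node_id)  # include a self-loop
    scores = [sum(a * b for a, b in zip(target, graph.node_attr[m])) for m in members]
    weights = softmax(scores)
    return [sum(w * graph.node_attr[m][i] for w, m in zip(weights, members))
            for i in range(len(target))]


# Hypothetical toy graph: one user linked to two repositories.
g = AHIN()
g.add_node("repo_a", "repository", [1.0, 0.0])
g.add_node("repo_b", "repository", [0.0, 1.0])
g.add_node("user_x", "user", [1.0, 1.0])
g.add_edge("user_x", "owns", "repo_a")
g.add_edge("user_x", "stars", "repo_b")

embedding = attention_aggregate(g, "repo_a")
print(embedding)
```

In this toy graph, `repo_a` aggregates over itself and `user_x`; both receive the same dot-product score, so the result is the average of their attribute vectors. Keeping node types and relation names explicit is what distinguishes a heterogeneous network from a plain graph, and it allows type-filtered lookups such as `g.neighbors("user_x", "repository")`.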
Award ID(s):
2203261 2217239
PAR ID:
10319625
Journal Name:
30th International Joint Conference on Artificial Intelligence (IJCAI)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The COVID-19 pandemic was a catalyst for many trends in daily life worldwide. While cybercrime rose overall during this time, relatively little research has been done on malicious COVID-19 themed AndroidOS applications. With more reports of users falling victim to such applications, there is a need to study malware detection for pandemic-themed mobile apps. In this project, we extracted the permission requests from 1959 APK files in a dataset containing benign and malicious COVID-19 themed apps. We then created and compared eight unique models across four classifiers (support vector machine, neural network, decision tree, and categorical naive Bayes) to determine their ability to identify potentially malicious APK files based on the permissions they request. The classifiers were trained using the Synthetic Minority Oversampling Technique (SMOTE) to balance the dataset, since malware samples were scarce compared to non-malware APKs. Finally, we evaluated the models using K-fold cross-validation and found the decision tree classifier to perform best.
  2. The COVID-19 pandemic has significantly impacted most countries in the world. Analyzing COVID-19 data from these countries together is a prominent challenge. Under the sponsorship of NSF REU, this paper describes our experience with a ten-week project that guided an REU scholar in developing a physics-guided graph attention network to predict the global COVID-19 pandemic. We mainly present the preparation, implementation, and dissemination of the project. The COVID-19 situation in one country can differ dramatically from that in others, which suggests that COVID-19 pandemic data are generated by different mechanisms and therefore follow different probability distributions in different countries. Learning more than one hundred underlying probability distributions from large-scale COVID-19 data is beyond a single machine learning model. To address this challenge, we proposed two team-learning frameworks for predicting COVID-19 pandemic trends: a peer learning framework and a layered ensemble learning framework. The framework assigns an adaptive physics-guided graph attention network (GAT) to each learning agent. All learning agents are organized in a hierarchical architecture, which enables them to collaborate in a peer-to-peer and cross-layer manner. This layered architecture spreads the burden of large-scale data processing across the machine learning models of all units. Experiments were run to verify the effectiveness of our approaches, and the results indicate that the proposed ensemble outperforms baseline methods. Besides being documented on GitHub, this work has resulted in two journal papers.
  3. As modern social coding platforms such as GitHub and Stack Overflow become increasingly popular, their potential security risks increase as well (e.g., risky or malicious code can be easily embedded and distributed). To enhance social coding security, in this paper we propose to automate cross-platform user identification between GitHub and Stack Overflow to combat attackers who attempt to poison the modern software programming ecosystem. An important insight of this work is to leverage social coding properties in addition to user attributes for cross-platform user identification. To depict users in GitHub and Stack Overflow (with their attribute information), projects, questions, and answers, as well as the rich semantic relations among them, we first introduce an attributed heterogeneous information network (AHIN) for modeling. Then, we propose a novel AHIN representation learning model, AHIN2Vec, to efficiently learn node (i.e., user) representations in the AHIN for cross-platform user identification. Comprehensive experiments on data collected from GitHub and Stack Overflow validate the effectiveness of our system iDev, which integrates the proposed method, in comparison with other baselines.
  4. We show that malicious COVID-19 content, including racism, disinformation, and misinformation, exploits the multiverse of online hate to spread quickly beyond the control of any individual social media platform. We provide a first mapping of the online hate network across six major social media platforms. We demonstrate how malicious content can travel across this network in ways that subvert platform moderation efforts. Machine learning topic analysis shows quantitatively how online hate communities are sharpening COVID-19 as a weapon, with topics evolving rapidly and content becoming increasingly coherent. Based on mathematical modeling, we provide predictions of how changes to content moderation policies can slow the spread of malicious content.
  5. People are increasingly exposed to science and political information from social media. One consequence is that these sites play host to “alternative influencers,” who spread misinformation. However, content posted by alternative influencers on different social media platforms is unlikely to be homogeneous. Our study uses computational methods to investigate how two dimensions of social media platforms, which we refer to as audience and channel, influence emotion and topics in content posted by alternative influencers on different platforms. Using COVID-19 as an example, we find that alternative influencers’ content contained more anger and fear words on Facebook and Twitter than on YouTube. We also find that these actors discussed substantively different topics in their COVID-19 content on YouTube than on Twitter and Facebook. With these findings, we discuss how the audience and channel of different social media platforms affect alternative influencers’ ability to spread misinformation online.