Creators/Authors contains: "He, Yuntian"

  1. Free, publicly-accessible full text available October 1, 2024
  2. Free, publicly-accessible full text available August 1, 2024
  3. As machine learning becomes more widely adopted across domains, it is critical that researchers and ML engineers consider the inherent biases in the data that a model may perpetuate. Recent studies have shown that Graph Neural Network (GNN) models also absorb such biases when the input graph is biased, potentially to the disadvantage of underserved and underrepresented communities. In this work, we aim to mitigate the bias learned by GNNs by jointly optimizing two loss functions: one for link prediction and one for demographic parity. We further implement three techniques inspired by graph modification approaches: the Global Fairness Optimization (GFO), Constrained Fairness Optimization (CFO), and Fair Edge Weighting (FEW) models. These techniques mimic the effects of changing the underlying graph structure within the GNN and offer greater interpretability than more integrated neural network methods. Our proposed models emulate microscopic or macroscopic edits to the input graph while training GNNs, and they learn node embeddings that are both accurate and fair in the context of link recommendation. We demonstrate the effectiveness of our approach on four real-world datasets and show that we can improve recommendation fairness by several factors at negligible cost to link prediction accuracy.
  4. In recent years, we have seen the success of network representation learning (NRL) methods in diverse domains ranging from computational chemistry to drug discovery and from social network analysis to bioinformatics. However, each such NRL method is typically prototyped in a programming environment familiar to its developer. Moreover, such methods rarely scale to large networks or graphs. These restrictions are problematic for domain scientists or end-users who want to apply a particular NRL method of interest to large graphs from their specific domain. In this work, we present a novel system, WebMILE, to democratize this process. WebMILE can scale an unsupervised network embedding method written in the user's preferred programming language to large graphs. It provides an easy-to-use Graphical User Interface (GUI) for the end-user. The user provides the necessary input (embedding method file, graph, and required package information) through a simple GUI, and WebMILE executes the given network embedding method on the given input graph. WebMILE leverages a pioneering multi-level method, MILE (or DistMILE if the user has access to a cluster), that can scale a network embedding method to large graphs. Language agnosticism is achieved through a simple Docker interface. In this demonstration, we will showcase how a domain scientist or end-user can use WebMILE to rapidly prototype and learn node embeddings of a large graph in a flexible and efficient manner, ensuring the twin goals of high productivity and high performance.
  5. Darknet market forums are frequently used to exchange illegal goods and services between parties who use encryption to conceal their identities. These markets are hosted on the Tor network, which provides additional anonymization against IP and location tracking and makes it challenging to link malicious users who operate multiple accounts (sybils). Additionally, users migrate to new forums when one is closed, further increasing the difficulty of linking users across forums. We develop a novel stylometry-based multitask learning approach for natural language and model user interactions using graph embeddings; together, these construct low-dimensional representations of short episodes of user activity for authorship attribution. We provide a comprehensive evaluation of our methods across four darknet forums, demonstrating their efficacy over the state of the art with a lift of up to 2.5X on Mean Retrieval Rank and 2X on Recall@10.
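Item 3 describes jointly optimizing a link-prediction loss and a demographic-parity loss. The abstract does not give the exact formulas, so the following is only a minimal sketch under stated assumptions: link prediction is scored with binary cross-entropy on edge logits, demographic parity is measured as the gap between the mean predicted link probability for intra-group and inter-group node pairs, and the two terms are combined with a hypothetical weight `lam`. All function names are illustrative. A real GNN would compute the same quantities inside an autodiff framework so the combined loss can be backpropagated; plain Python is used here only to show the arithmetic.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bce_loss(scores, labels):
    """Mean binary cross-entropy between sigmoid(scores) and 0/1 edge labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(s)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(scores)


def demographic_parity_gap(scores, same_group):
    """|mean P(link) over intra-group pairs - mean P(link) over inter-group pairs|.

    Assumption: parity is defined over whether both endpoints share a
    sensitive attribute value; the paper may define it differently.
    """
    intra = [sigmoid(s) for s, g in zip(scores, same_group) if g]
    inter = [sigmoid(s) for s, g in zip(scores, same_group) if not g]
    if not intra or not inter:
        return 0.0  # gap is undefined if one side is empty; treat as no penalty
    return abs(sum(intra) / len(intra) - sum(inter) / len(inter))


def joint_loss(scores, labels, same_group, lam=1.0):
    """Hypothetical combined objective: accuracy term + lam * fairness term."""
    return bce_loss(scores, labels) + lam * demographic_parity_gap(scores, same_group)
```

Raising `lam` trades link-prediction accuracy for a smaller parity gap; the abstract reports that this trade-off can be made at negligible accuracy cost, but any particular value of `lam` here is an assumption.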
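Item 4 relies on MILE, a multi-level method that repeatedly coarsens the graph, runs the base embedding method on the smallest graph, and refines the embeddings back to the original graph. As a rough illustration of the coarsening step only, here is a simplified sketch of normalized heavy-edge matching, where each unmatched node is merged with the unmatched neighbor that maximizes w(u, v) / sqrt(d(u) d(v)) using weighted degrees. This is not WebMILE's or MILE's code: the function name `coarsen_once`, the edge-dictionary input format, and the ascending-degree visiting order are all assumptions made for the example.

```python
import math
from collections import defaultdict


def coarsen_once(edges):
    """One round of normalized heavy-edge matching (illustrative sketch).

    edges: dict mapping (u, v) with u < v to a positive edge weight.
    Returns (mapping, coarse_edges): mapping sends each node to a super-node
    id, and coarse_edges sums edge weights between distinct super-nodes.
    """
    # Build a symmetric weighted adjacency list.
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    degree = {n: sum(nbrs.values()) for n, nbrs in adj.items()}

    mapping = {}
    next_id = 0
    # Assumed ordering: visit low-degree nodes first, ties broken by node id.
    for u in sorted(adj, key=lambda n: (degree[n], n)):
        if u in mapping:
            continue
        best, best_score = None, -1.0
        for v, w in adj[u].items():
            if v in mapping or v == u:
                continue
            # Normalized connection strength between u and v.
            score = w / math.sqrt(degree[u] * degree[v])
            if score > best_score:
                best, best_score = v, score
        mapping[u] = next_id
        if best is not None:
            mapping[best] = next_id  # merge u and its best partner
        next_id += 1

    # Aggregate edge weights between super-nodes; intra-super-node edges vanish.
    coarse = defaultdict(float)
    for (u, v), w in edges.items():
        cu, cv = mapping[u], mapping[v]
        if cu != cv:
            coarse[(min(cu, cv), max(cu, cv))] += w
    return mapping, dict(coarse)
```

For example, a unit-weight path 0-1-2-3 collapses to two super-nodes {0, 1} and {2, 3} joined by a single edge of weight 1.0. Applying the function repeatedly shrinks the graph until a base embedding method can run on it cheaply.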
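Item 5 reports results on Mean Retrieval Rank and Recall@10 for authorship attribution. The abstract does not define these metrics precisely, so the sketch below makes assumptions explicit: each query episode embedding is compared against candidate episode embeddings by cosine similarity, the 1-based rank of the first candidate by the same author is recorded, Recall@k is the fraction of queries whose first hit falls within the top k, and "Mean Retrieval Rank" is computed as a mean reciprocal rank (higher is better, which is consistent with reporting a lift, but it remains an assumption). All names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def first_hit_ranks(queries, query_authors, candidates, candidate_authors):
    """1-based rank of the first candidate sharing each query's author (None if absent)."""
    ranks = []
    for q, author in zip(queries, query_authors):
        # Sort candidates by descending similarity; sorted() is stable for ties.
        order = sorted(range(len(candidates)), key=lambda i: -cosine(q, candidates[i]))
        rank = next(
            (pos + 1 for pos, i in enumerate(order) if candidate_authors[i] == author),
            None,
        )
        ranks.append(rank)
    return ranks


def mean_reciprocal_rank(ranks):
    """Average of 1/rank; queries with no correct candidate contribute 0."""
    return sum(1.0 / r if r else 0.0 for r in ranks) / len(ranks)


def recall_at_k(ranks, k=10):
    """Fraction of queries whose first correct candidate is in the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)
```

With two queries whose correct authors are retrieved at ranks 1 and 2, the mean reciprocal rank is 0.75, Recall@1 is 0.5, and Recall@10 is 1.0.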