Network representations have been shown to improve performance on a variety of tasks, including classification, clustering, and link prediction. However, most models either focus on moderate-sized, homogeneous networks or require the user to provide a significant amount of auxiliary input. Moreover, few works have studied network representations in real-world heterogeneous social networks, which are often incomplete and contain ambiguous social connections. In the present work, we investigate the problem of learning low-dimensional node representations in heterogeneous professional social networks (HPSNs) that share these properties. We present Star2Vec, a general heterogeneous network representation learning model that learns entity and person embeddings jointly using a social-connection-strength-aware biased random walk combined with a node-structure expansion function. Experiments on LinkedIn's Economic Graph and publicly available snapshots of Facebook's network show that Star2Vec outperforms existing methods on members' industry and social circle classification, skill and title clustering, and member-entity link prediction. We also conducted large-scale case studies to demonstrate practical applications of the Star2Vec embeddings trained on LinkedIn's Economic Graph, such as next-career-move suggestions, alternative career suggestions, and general entity similarity searches.
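The abstract does not specify how Star2Vec biases its walk, so what follows is only a minimal sketch of a connection-strength-aware random walk, assuming each edge carries a numeric strength and the next step is drawn with probability proportional to it. The node names, strength values, and the helpers `strength_biased_walk` and `generate_walks` are hypothetical, and Star2Vec's node-structure expansion function is not modeled. In DeepWalk-style pipelines, walks like these are then fed to a skip-gram model to learn embeddings.

```python
import random


def strength_biased_walk(graph, start, length, rng):
    """Walk up to `length` nodes from `start`, stepping to each neighbor with
    probability proportional to its (hypothetical) connection strength."""
    walk = [start]
    current = start
    while len(walk) < length:
        neighbors = graph.get(current, {})
        if not neighbors:
            break  # dead end: return a shorter walk
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        current = rng.choices(nodes, weights=weights, k=1)[0]
        walk.append(current)
    return walk


def generate_walks(graph, walks_per_node, length, seed=0):
    """Start `walks_per_node` walks from every node, using one seeded RNG."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in graph:
            walks.append(strength_biased_walk(graph, node, length, rng))
    return walks


# Hypothetical heterogeneous graph: two members, a company, and a skill node.
# Each mapping is node -> {neighbor: connection strength}.
graph = {
    "alice": {"bob": 5.0, "acme_corp": 1.0, "skill:python": 2.0},
    "bob": {"alice": 5.0, "acme_corp": 3.0},
    "acme_corp": {"alice": 1.0, "bob": 3.0},
    "skill:python": {"alice": 2.0},
}

walks = generate_walks(graph, walks_per_node=2, length=5)
for walk in walks[:3]:
    print(" -> ".join(walk))
```

With these strengths, a walk leaving `alice` steps to `bob` with probability 5/8 and to `acme_corp` with probability 1/8, so strongly tied members co-occur more often in the walk corpus than weakly tied ones.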
Propinquity drives the emergence of network structure and density
The lack of large-scale, continuously evolving empirical data usually limits the study of networks to the analysis of snapshots in time. This approach has been used to verify network evolution mechanisms, such as preferential attachment. However, these studies are mostly restricted to the first links established by a new node and typically ignore connections made after each node’s initial introduction. Here, we show that the subsequent actions of individuals, such as their second network link, are not random and can be decoupled from the mechanism behind the first network link. We show that this feature has a strong influence on network topology. Moreover, snapshots in time can now provide information on the mechanism used to establish the second connection. We interpret these empirical results by introducing the “propinquity model,” in which we control and vary the distance of the second link established by a new node, and find that this can lead to networks with tunable density scaling, as found in real networks. Our work shows that sociologically meaningful mechanisms influence network evolution and highlights the importance of measuring the distance between successive connections.
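As a rough illustration of controlling the distance of a newcomer's second link, here is a toy growth process. Its attachment rules are assumptions of this sketch, not the paper's propinquity model: the first link targets a uniformly random existing node, and the second targets a random node at shortest-path distance exactly `d` from that first target (found by breadth-first search), with a fallback to any other node when none is that far away. The function names are hypothetical.

```python
import random
from collections import deque


def nodes_at_distance(adj, source, d):
    """Return the nodes whose shortest-path distance from `source` is exactly d."""
    dist = {source: 0}
    queue = deque([source])
    ring = []
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            ring.append(u)
            continue  # BFS never needs to look past distance d
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return ring


def propinquity_graph(n, d, seed=0):
    """Grow an undirected graph to n nodes; every newcomer makes two links.

    Assumed rules (illustrative only): the first link goes to a uniformly
    random existing node `first`; the second goes to a random node at
    distance exactly d from `first`, falling back to any other existing
    node when no node is that far away.
    """
    if d < 1:
        raise ValueError("d must be at least 1")
    rng = random.Random(seed)
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}  # seed triangle
    for new in range(3, n):
        existing = list(adj)
        first = rng.choice(existing)
        candidates = nodes_at_distance(adj, first, d)
        if not candidates:
            candidates = [v for v in existing if v != first]
        second = rng.choice(candidates)
        adj[new] = {first, second}
        adj[first].add(new)
        adj[second].add(new)
    return adj


def count_triangles(adj):
    """Count each triangle once, as an ordered triple u < v < w."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if v > u:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


for d in (1, 2, 3):
    g = propinquity_graph(200, d)
    edges = sum(len(nb) for nb in g.values()) // 2
    print(f"d={d}: edges={edges}, triangles={count_triangles(g)}")
```

Because every newcomer adds exactly two links here, the edge count grows linearly, so this toy does not reproduce the tunable density scaling reported in the abstract; it only shows how `d` shapes local structure. With `d = 1` every newcomer closes exactly one triangle, whereas with larger `d` a newcomer can close a triangle only through the fallback rule.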
- PAR ID: 10127205
- Date Published:
- Journal Name: Proceedings of the National Academy of Sciences
- Volume: 116
- Issue: 41
- ISSN: 0027-8424
- Page Range / eLocation ID: 20360-20365
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M. F.; Lin, H. (Eds.) We propose a novel learning framework based on neural mean-field dynamics for inference and estimation problems of diffusion on networks. Our framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities. This yields a delay differential equation whose memory integral is approximated by learnable time-convolution operators, resulting in a highly structured and interpretable RNN. Using cascade data directly, our framework can jointly learn the structure of the diffusion network and the evolution of infection probabilities, which are cornerstones of important downstream applications such as influence maximization. Connections between parameter learning and optimal control are also established. An empirical study shows that our approach is versatile and robust to variations of the underlying diffusion network models, and significantly outperforms existing approaches in accuracy and efficiency on both synthetic and real-world data.
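To make the evolved quantity concrete, here is the plain discrete-time mean-field approximation for SI-type diffusion, in which each node's infection probability is updated from its neighbors' probabilities under an independence assumption. This is a standard baseline, not the framework described above: it has no Mori-Zwanzig memory integral and no learned time-convolution operators, and it runs in discrete rather than continuous time. The transmission matrix `beta` and the function names are illustrative assumptions.

```python
def mean_field_step(p, beta):
    """One discrete-time mean-field update for SI diffusion.

    p[i]:       probability that node i is infected at time t
    beta[j][i]: per-step transmission probability along edge j -> i
                (0.0 where there is no edge, including j == i)
    Neighbors are treated as independent, and there is no memory term.
    """
    n = len(p)
    nxt = []
    for i in range(n):
        # i stays susceptible only if it was susceptible and escaped every neighbor
        stay_susceptible = 1.0 - p[i]
        for j in range(n):
            stay_susceptible *= 1.0 - beta[j][i] * p[j]
        nxt.append(1.0 - stay_susceptible)
    return nxt


def simulate(p0, beta, steps):
    """Return the trajectory [p(0), p(1), ..., p(steps)]."""
    trajectory = [list(p0)]
    for _ in range(steps):
        trajectory.append(mean_field_step(trajectory[-1], beta))
    return trajectory


# Directed chain 0 -> 1 -> 2 with transmission probability 0.5 per edge.
beta = [
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
]
for t, p in enumerate(simulate([1.0, 0.0, 0.0], beta, steps=3)):
    print(t, [round(x, 3) for x in p])
```

In this chain, node 1's infection probability rises to 0.5 after one step and 0.75 after two, while node 2 lags one step behind; the independence assumption is exactly what a memory-aware model would correct on networks with loops.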
Network theory helps us understand, analyze, model, and design various complex systems. Complex networks encode the topology and structural interactions of systems in nature. To mine the multiscale coupling, heterogeneity, and complexity of natural and technological systems, we need expressive and rigorous mathematical tools that can help us understand the growth, topology, dynamics, multiscale structures, and functionalities of complex networks and their interrelationships. Towards this end, we construct the node-based fractal dimension (NFD) and the node-based multifractal analysis (NMFA) framework to reveal the generating rules and quantify the scale-dependent topology and multifractal features of a dynamic complex network. We propose novel indicators for measuring the degree of complexity, heterogeneity, and asymmetry of network structures, as well as the structural distance between networks. This formalism provides new insights into learning the energy and phase transitions in networked systems and can help us understand the multiple generating mechanisms governing network evolution.
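The abstract does not give the NFD formula. One common way to define a node-based dimension is to grow balls of radius r around a node i, count the nodes N_i(r) they contain, and read the exponent D_i from the scaling N_i(r) ∝ r^(D_i) as the slope of a log-log least-squares fit. The sketch below assumes that definition; it is not the paper's NFD or its multifractal (NMFA) extension, and the function names are hypothetical.

```python
import math
from collections import deque


def ball_sizes(adj, source, r_max):
    """N(r) for r = 1..r_max: nodes other than `source` within distance r."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == r_max:
            continue  # do not explore beyond the largest radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    shell = [0] * (r_max + 1)  # shell[r] = number of nodes at exactly distance r
    for d in dist.values():
        shell[d] += 1
    sizes, running = [], 0
    for r in range(1, r_max + 1):
        running += shell[r]
        sizes.append(running)
    return sizes


def local_dimension(adj, source, r_max):
    """Least-squares slope of log N(r) versus log r, assuming N(r) ~ r**D."""
    if r_max < 2:
        raise ValueError("need at least two radii to fit a slope")
    sizes = ball_sizes(adj, source, r_max)
    if sizes[0] == 0:
        raise ValueError("source node has no neighbors")
    xs = [math.log(r) for r in range(1, r_max + 1)]
    ys = [math.log(s) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# A path (1-D chain) should give D close to 1.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 21} for i in range(21)}
print(round(local_dimension(path, 10, 5), 3))
```

On a 2-D grid the same estimate from the center node comes out between 1 and 2 at small radii, because the Manhattan ball size 2r(r + 1) only approaches r^2 scaling as r grows; a multifractal analysis generalizes this single exponent to a spectrum.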
Understanding the mechanisms driving link formation in dynamic social networks is a long-standing problem with implications for understanding social structure as well as for link prediction and recommendation. Social networks exhibit a high degree of transitivity, which explains the success of common-neighbor-based methods for link prediction. In this paper, we examine the mechanisms behind link formation from the perspective of an ego node. We introduce the notion of personalized degree for each neighbor of the ego: the number of the ego's other neighbors to which that neighbor is connected. From empirical analyses of four online social network datasets, we find that neighbors with higher personalized degree are more likely to lead to new link formation when they serve as common neighbors with other nodes, in both undirected and directed settings. This complements the finding of Adamic and Adar that neighbor nodes with higher (global) degree are less likely to lead to new link formation. Furthermore, on directed networks, we find that personalized out-degree has a stronger effect on link formation than personalized in-degree, whereas global in-degree has a stronger effect than global out-degree. We validate our empirical findings through several link recommendation experiments and observe that incorporating both personalized and global degree into link recommendation greatly improves accuracy.
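The personalized degree defined in this abstract is straightforward to compute from adjacency sets. The sketch contrasts it with Adamic-Adar scoring on a toy undirected graph. The abstract does not say how personalized and global degree are combined, so `personalized_score`, which scales each Adamic-Adar term by one plus the common neighbor's personalized degree, is a hypothetical choice for illustration only; the directed variants (personalized in- and out-degree) are not covered.

```python
import math


def personalized_degree(adj, ego, v):
    """Number of the ego's other neighbors that neighbor v is also linked to."""
    return len(adj[v] & (adj[ego] - {v}))


def adamic_adar(adj, ego, w):
    """Classic Adamic-Adar: common neighbors weighted by 1 / log(global degree)."""
    return sum(1.0 / math.log(len(adj[v])) for v in adj[ego] & adj[w])


def personalized_score(adj, ego, w):
    """Hypothetical combination: each common neighbor's Adamic-Adar term is
    scaled by (1 + its personalized degree with respect to the ego)."""
    return sum(
        (1 + personalized_degree(adj, ego, v)) / math.log(len(adj[v]))
        for v in adj[ego] & adj[w]
    )


def recommend(adj, ego, k, score=personalized_score):
    """Rank non-neighbors at distance two from the ego by `score` (ties by name)."""
    candidates = {w for v in adj[ego] for w in adj[v]} - adj[ego] - {ego}
    return sorted(candidates, key=lambda w: (-score(adj, ego, w), str(w)))[:k]


# Toy undirected graph: the ego's neighbors a and b are linked; c is not.
adj = {
    "e": {"a", "b", "c"},
    "a": {"e", "b", "x"},
    "b": {"e", "a"},
    "c": {"e", "y"},
    "x": {"a"},
    "y": {"c"},
}
print(recommend(adj, "e", 2, score=adamic_adar))
print(recommend(adj, "e", 2))
```

Adamic-Adar ranks `y` first because its common neighbor `c` has the lower global degree (2 versus 3), while the personalized score ranks `x` first because its common neighbor `a` is also linked to the ego's other neighbor `b`.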
Network embedding has demonstrated effective empirical performance on various network mining tasks such as node classification, link prediction, clustering, and anomaly detection. However, most of these algorithms focus on the single-view network scenario. In the real world, one individual node can have different connectivity patterns in different networks. For example, one user can have different relationships on Twitter, Facebook, and LinkedIn due to varying user behaviors on different platforms. In this case, jointly considering the structural information from multiple platforms (i.e., multiple views) can potentially lead to more comprehensive node representations and eliminate noise and bias from a single view. In this paper, we propose VANE, a view-adversarial framework for generating comprehensive and robust multi-view network representations, which is based on two adversarial games. The first adversarial game enhances the comprehensiveness of the node representation by discriminating the view information obtained from the subgraph induced by the node's neighbors. The second adversarial game improves the robustness of the node representation by challenging it with fake node representations from a generative adversarial network. Extensive experiments on downstream tasks with real-world multi-view networks show that VANE significantly outperforms other baseline methods.