TITLE: RANDOM WALK NODE EMBEDDINGS FOR NETWORK COMMUNITY DETECTION:
THEORY AND ALGORITHMS
ABSTRACT: There has been an explosion of network data in the physical, chemical, biological, computational, and social sciences in
the last few decades. Node embeddings, i.e., Euclidean-space representations of nodes in a network, make it possible
to apply to network data, tools and algorithms from multivariate statistics and machine learning that were developed for
Euclidean-space data. Random Walk Embeddings are a class of recently developed node embedding techniques where the vector representations are learned from proximal node co-occurrences in sampled random walks on the network. They have been applied to a number of supervised learning problems such as link prediction and node classification and have demonstrated state-of-the-art performance. Yet, random walk embeddings remain poorly understood.
This doctoral research studies random walk based node embeddings within the context of the community detection
problem, an unsupervised network learning task in which the goal is to partition a network into sub-networks (communities) of nodes that have a similar pattern of connectivity. The broad goals of this doctoral research are to (i) develop and experimentally validate VEC, a random walk based node embedding algorithm for the community detection problem under
the Stochastic Block Model (SBM) and its generalizations and (ii) develop a mathematical framework to theoretically analyze the properties of random walk node embeddings and use it to characterize the large-network asymptotic performance limits of community detection based on these embeddings.
Extensive experiments on real world networks and SBM random networks are presented to demonstrate the superior
accuracy and robustness of VEC over competing algorithms. Empirical results suggest that VEC can attain the informationtheoretic limits for exact and weak community recovery for SBM networks. We outline preliminary work and a multi-step analysis strategy for establishing concentration and sample complexity bounds for the embedding vectors generated by VEC. This requires a deeper examination of the underlying optimization problem, which is non-convex and dependent on the sampled random walks. As a first step, we have developed a nuclear norm constrained convex relaxation of the original problem. Work is proposed to establish concentration bounds for the convex objective function and its optimal solution and to conduct experiments to support and validate our analysis.
COMMITTEE: CO-ADVISORS
Prakash Ishwar, SE/ECE/CISE and Daniel Lewis Sussman, Math/Stat/CISE; Eric Kolaczyk,SE/Math/Stat/CISE; Lorenzo Orecchia, CS/CISE |