Detecting hidden communities in networks is an important problem. A community refers to a group of related nodes. For instance, in a social network, it can represent individuals with shared interests or beliefs; in a gene network, it can represent genes with common regulatory mechanisms, and so on. Most previous approaches assume non-overlapping communities where a node can belong to at most one community. In contrast, we provide a guaranteed approach for detecting overlapping communities, when the network is generated from a class of probabilistic mixed membership models. Our approach is based on fast and scalable tensor decompositions and linear algebraic operations. We provide guaranteed recovery of community memberships and carry out a finite sample analysis of our algorithm. Additionally, our results match the lower bounds onscaling requirements (up to poly-log factors) in the special case of the stochastic block model (with non-overlapping communities).
We have deployed the algorithm on GPUs, and our code design involves a careful optimization of GPU-CPU storage and communication. Our method is extremely fast and accurate on real datasets consisting of facebook network (about 20,000 nodes), yelp reviews (about 40,000 nodes) and dblp co-authorship network (about 120,000 nodes). For instance, on dblp dataset, our method takes under 2 hours to run to convergence. Thus, our approach is fast, scalable and accurate for detecting overlapping communities.