Center for Applied Mathematics
Friday, October 19, 2012
Graph Structure Large and Small: Graphs at Facebook
With nearly a billion active users, the Facebook social graph at once promises the opportunity to resolve some of the most data-demanding questions in social science, yet simultaneously rebukes analyses employing all but the most efficient algorithms. This talk will discuss two recent results from both sides of this "big data" opportunity/challenge. The first contribution presents a study of the growth of Facebook as a rare social contagion process with genuinely global adoption, where we observe contagion that departs notably from traditional models of contagion based on analogies from epidemiology. In particular, where the probability of infection typically grows monotonically with the number of infected neighbors, we instead observe that the probability of contagion is tightly controlled by the number of connected components in an individual's contact neighborhood, rather than by the actual size of the neighborhood.
For the second contribution, graph partitioning is a classic challenge for any computation that involves distributing a graph across disks, machines, or data centers. Graph partitioning has a very rich literature, but existing algorithms typically cannot scale to billions of edges, or cannot provide guarantees about partition sizes. We introduce a novel algorithm, balanced label propagation, combining the computational efficiency of label propagation — where nodes are iteratively relabeled to the same "label" as the plurality of their graph neighbors — with the guarantees of constrained optimization — guiding the propagation by a linear program constraining the partition sizes. We evaluate our algorithm for its partitioning performance on the Facebook social graph, beyond the reach of other algorithms, and its impact when partitioning Facebook's "People You May Know" realtime graph prediction service.
Together, these contributions are intended to demonstrate both the opportunities and challenges posed by operationalizing and analyzing graph computations at very large scales.
The first part of the talk describes joint work with Lars Backstrom (Facebook), Cameron Marlow (Facebook), and Jon Kleinberg (Cornell); the second part describes joint work with Lars Backstrom.