Abstract
Can one hear the ‘sound’ of a growing network? We address the problem of recognizing the topology of evolving biological or social networks. Starting from percolation theory, we analytically prove a linear inverse relationship between two simple graph parameters (the logarithm of the average cluster size and the logarithm of the ratio of the number of edges in the graph to the theoretical maximum number of edges for that graph) that holds for all growing power-law graphs. The result establishes a novel property of evolving power-law networks in the asymptotic limit of network size. Numerical simulations, as well as fitting to real-world citation co-authorship networks, demonstrate that the result holds for networks of finite size and that it provides a convenient measure of the extent to which an evolving family of networks belongs to the same power-law class.
Mark Kac posed the question “Can one hear the shape of a drum?” (1), asking whether one could infer the geometric shape of a drum from its vibrational modes. Here we explore a similar question about the evolution of complex networks. The structure and properties of complex networks (2–4), such as the Internet (5, 6), citation networks (7), and biological interaction networks, including metabolic networks (8), protein-interaction networks (9), and disease-gene networks (10), have been investigated intensively over the past decade. Many of these real-world networks belong to the power-law family, where the number of nodes with k neighbors, N(k), scales as a power law of the form $N(k) \sim k^{-\beta}$, where β is the scaling exponent (11). Several evolutionary mechanisms by which power-law networks arise have been proposed by treating the structure of the real-world network as a target and constructing models of network growth that produce such a network from an arbitrary seed graph as the initial condition (12–15). An important problem that arises in evaluating models of network evolution, and in testing these against real data, is the need to definitively characterize experimentally determined networks as belonging to the power-law family (16). Here we consider a related problem: given a time series of network structure data, can one deduce whether the collection of networks belongs to the same power-law class?
In percolation theory (17), the vertices of a lattice are “occupied” or connected with a certain probability p, and the emergence of connected clusters is studied as a function of p. Of interest is the existence of a path that would allow a liquid, as it were, to percolate from the upper surface of the lattice to its lower surface. Many of the canonical results in percolation theory take the form of power laws, $W \propto (p - p_c)^{k}$, where $p_c$ is a constant denoting the critical probability at which a phase transition occurs, and W is the quantity being studied, such as the average cluster size. There is no equivalent theoretical framework for evolving real-world networks that belong to a power-law family.
Consider a graph with N nodes and E edges. If N(k) is the number of nodes of degree k, the associated probability distribution of nodes is given by $P(k) = N(k)/N$ for $k \le K$, where K is the maximum degree. The average degree of the graph is $\langle k \rangle = \sum_{k=1}^{K} k\,P(k) = 2E/N$. The fraction of edges, F, relative to the number of all possible edges in the complete graph on N nodes, is $F = 2E/[N(N-1)] = \langle k \rangle/(N-1)$. We define $W_k$, the likelihood that an edge chosen at random is connected to a node of degree k, as $W_k = k\,N(k) \big/ \sum_{j=1}^{K} j\,N(j) = k\,P(k)/\langle k \rangle$.
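As an illustration, the quantities in this paragraph can be computed directly from a degree sequence. The sketch below is a minimal one under stated assumptions: it reads the verbal definitions as P(k) = N(k)/N, ⟨k⟩ = 2E/N, F = 2E/[N(N-1)], and W_k = k N(k)/2E, and the function name `degree_statistics` and the four-node star graph are hypothetical choices made only for this example.

```python
from collections import Counter


def degree_statistics(degrees):
    """Return (P, mean_k, F, W_k) for a list of node degrees.

    Assumed readings of the definitions in the text:
      P(k)  = N(k) / N
      <k>   = 2E / N
      F     = 2E / (N (N - 1))
      W_k   = k N(k) / (2E)
    """
    n = len(degrees)
    total_degree = sum(degrees)  # equals 2E for an undirected simple graph
    if n < 2 or total_degree == 0:
        raise ValueError("need at least two nodes and one edge")

    counts = Counter(degrees)  # counts[k] is N(k)
    p = {k: c / n for k, c in counts.items()}
    mean_k = total_degree / n
    f = total_degree / (n * (n - 1))
    # Degree-0 nodes cannot be reached by following an edge, so they are skipped.
    w = {k: k * c / total_degree for k, c in counts.items() if k > 0}
    return p, mean_k, f, w


# Hypothetical example: a star graph with one hub and three leaves (N = 4, E = 3).
star_degrees = [3, 1, 1, 1]
p, mean_k, f, w = degree_statistics(star_degrees)
print(p, mean_k, f, w)
```

Working from the degree sequence alone keeps the sketch independent of any graph library; the same numbers could be obtained from an adjacency structure by first listing the node degrees.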
Using a term from percolation theory, we define the average cluster size, W. The generalized moments are denoted $\lambda_{n,m}$; note that $\lambda_{1,1} = \langle k \rangle$. For networks where K is large, it can be proved that W scales as 1/F (see Analytical Derivation in Methods).
The analysis above demonstrates that, for power-law networks that belong to the same class, W scales as 1/F in the limit of large N. We now demonstrate by means of simulations that this scaling law holds for power-law networks of finite size (Fig 1A) but not for Erdős-Rényi random networks (Fig 1B). The scaling of W as a function of F can be used to discriminate sensitively between the types of networks produced by previously described models of network growth by full or partial node duplication (15). The graphs in Figs 2A and 2B correspond to networks constructed according to a partial duplication model, which produces power-law networks for a restricted range of values of the model parameter p (7, 14, 15, 18). As p approaches 1, the degree distribution begins to deviate from a power-law distribution. This is shown by the graphs in Figs 2C and 2D, which show the progressive deviation from the power law as reflected in the graphs of both N(k) vs. k and W vs. F on the log-log scale.
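To make the growth process concrete, here is a minimal sketch of a generic partial duplication model. It rests on assumptions that are not taken from the text: at each step a randomly chosen existing node is copied, the copy keeps each of the original's edges independently with probability p, and the seed graph is a single edge. The exact model of reference 15 may differ, and the function name `grow_partial_duplication` is hypothetical. The sketch records the edge fraction F = 2E/[N(N-1)] after every step, so that a time series of growing networks is available for further analysis.

```python
import random


def grow_partial_duplication(n_final, p, seed=None):
    """Grow a graph by partial node duplication (illustrative assumption).

    Returns the adjacency dict and a list of (N, E, F) snapshots,
    one per growth step, where F = 2E / (N (N - 1)).
    """
    if n_final < 2 or not 0.0 <= p <= 1.0:
        raise ValueError("need n_final >= 2 and 0 <= p <= 1")

    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}  # assumed seed graph: two nodes joined by one edge
    edges = 1
    snapshots = []

    while len(adj) < n_final:
        template = rng.choice(list(adj))  # node to be duplicated
        new = len(adj)                    # label of the new node
        # Keep each edge of the template independently with probability p.
        kept = [nb for nb in adj[template] if rng.random() < p]
        adj[new] = set(kept)
        for nb in kept:
            adj[nb].add(new)
        edges += len(kept)

        n = len(adj)
        snapshots.append((n, edges, 2 * edges / (n * (n - 1))))

    return adj, snapshots


adj, snapshots = grow_partial_duplication(200, 0.3, seed=0)
print(snapshots[-1])  # (N, E, F) for the final network
```

Setting p = 1 in this sketch corresponds to full duplication, the regime in which the text reports that the degree distribution departs from a power law.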
We have tested for the scaling law described above in data from the Citeseer database (19, 20). For each year between 1991 and 1999, we extracted the list of papers published and the number of citations to each paper, and plotted the degree distribution of the citation network for each year (Fig 3A). The degree distributions appear to fit a power law poorly. In Fig 3B we plot W as a function of F for these nine networks; on the log-log scale, W is approximately a linear function of F with a slope of -0.86 (close to -1, as predicted by our analysis).
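The slope quoted above is the gradient of a straight-line fit to log W against log F. A minimal sketch of such a fit follows, using ordinary least squares written out by hand. The function name `loglog_slope` is hypothetical, and the demonstration values are synthetic, generated to obey W = 1/F exactly; they are not the Citeseer data.

```python
import math


def loglog_slope(f_values, w_values):
    """Least-squares slope and intercept of log(W) versus log(F)."""
    if len(f_values) != len(w_values) or len(f_values) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    xs = [math.log(f) for f in f_values]
    ys = [math.log(w) for w in w_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic illustration only: W = 1/F exactly, so the fitted slope should be near -1.
f_demo = [0.02, 0.01, 0.005, 0.0025]
w_demo = [1.0 / f for f in f_demo]
slope, intercept = loglog_slope(f_demo, w_demo)
print(slope, intercept)
```

For a family of networks measured at successive time points, one (F, W) pair per network would be passed in; a fitted slope close to -1 is what the scaling law predicts.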
The result described above, that the average cluster size W varies as an inverse function of F, does not discriminate among different types of power-law networks; it identifies growing networks in this broad class. Here we study only the behavior of the moment $\lambda_{1,2}$ as a function of the fraction of possible edges F. The moments that we define in equation 2 are analogous to the moments of a probability distribution, and may serve as a rigorous method for characterizing the structure of complex networks and their modes of evolution. This simple scaling law should aid diverse purposes where network-based simulation is important, such as finding patterns in social networks and epidemiological modeling to optimize immunization strategies. Moreover, modularization of large and complex networks into component subgraphs, each with a power-law topology, might also be possible.
Methods
Analytical Derivation: For networks where the maximum degree, K, is large, we can approximate the sum in the definition of the average cluster size W by an integral. Then, for $P(k) = C\,k^{-\beta}$ with $1 \le k \le K$, the normalization constant, C, depends on β and K and is given by $C = (\beta - 1)/(1 - K^{1-\beta})$. Hence, for large K and β > 1, the limiting value of C is $\tilde{C} = \beta - 1$. Substituting into the expression for the generalized moment (see text) and using the normalized value of C yields an expression that simplifies in the limit of large K.
The relationship between the quantities W and F is obtained from the definitions of W and F. In the case of large networks, or in the limit as K → ∞, we obtain $W \propto 1/F$.
Acknowledgements
This work is supported by EAGR-0941078 (NSF), FIBR-0527023 (NSF), and 1R01GM084881-01 (NIH) grants to AR.