Assessing the Exceptionality of Coloured Motifs in Networks
1 Institut National de la Recherche Agronomique (INRA), UR1077, Unité Mathématique, Informatique et Génome, 78352 Jouy-en-Josas, France
2 Centre for Genomic Regulation (CRG), Genome Bioinformatics Group, Universitat Pompeu Fabra, Dr. Aiguader 88, 08003 Barcelona, Spain
3 Université de Lyon, 69000 Lyon, France
4 Laboratoire de Biométrie et Biologie Évolutive, Université Claude Bernard Lyon 1, CNRS/UMR 5558, 69622 Villeurbanne, France
5 Projet BAMBOO, Institut National de Recherche Informatique et en Automatique (INRIA) Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot Saint-Martin, France
EURASIP Journal on Bioinformatics and Systems Biology 2009, 2009:616234 doi:10.1155/2009/616234Published: 26 October 2008
Various methods have been recently employed to characterise the structure of biological networks. In particular, the concept of network motif and the related one of coloured motif have proven useful to model the notion of a functional/evolutionary building block. However, algorithms that enumerate all the motifs of a network may produce a very large output, and methods to decide which motifs should be selected for downstream analysis are needed. A widely used method is to assess if the motif is exceptional, that is, over- or under-represented with respect to a null hypothesis. Much effort has been put in the last thirty years to derive -values for the frequencies of topological motifs, that is, fixed subgraphs. They rely either on (compound) Poisson and Gaussian approximations for the motif count distribution in Erdös-Rényi random graphs or on simulations in other models. We focus on a different definition of graph motifs that corresponds to coloured motifs. A coloured motif is a connected subgraph with fixed vertex colours but unspecified topology. Our work is the first analytical attempt to assess the exceptionality of coloured motifs in networks without any simulation. We first establish analytical formulae for the mean and the variance of the count of a coloured motif in an Erdös-Rényi random graph model. Using simulations under this model, we further show that a Pólya-Aeppli distribution better approximates the distribution of the motif count compared to Gaussian or Poisson distributions. The Pólya-Aeppli distribution, and more generally the compound Poisson distributions, are indeed well designed to model counts of clumping events. Altogether, these results enable to derive a -value for a coloured motif, without spending time on simulations.