Paraclique Analysis
Clique and Paraclique
From the 10431 filtered probes we constructed an unweighted graph in which vertices
represent probes and edges are present whenever probes are correlated at or above
some threshold. Edges are otherwise absent. The selection of an appropriate threshold
is discussed below. We retained only positive correlations.
A clique is a set of vertices in which every vertex has an edge to every other vertex
in the set. A maximum clique is the largest clique in a given graph. Maximum clique
is arguably the most “natural” clustering method for unweighted graphs, but the
problem of finding the maximum clique in a graph is NP-complete; it is considered
computationally intractable. But for many types of graphs, most notably sparse,
irregular graphs, finding the maximum clique can be accomplished in reasonable time
using advanced algorithms. We are thus able to use it as the basis for clustering.
Clique, however, is widely considered too restrictive. Noise in experimental data
can lead to missing edges that should be present. We therefore use a relaxation
of clique called “paraclique”. A paraclique consists of the maximum clique and all
vertices with at least some proportion of edges to the maximum clique. The proportion
is called the proportional glom factor. Thus, the number of edges that must
be present in order for a vertex to be included in a paraclique is scaled to the
size of the core clique.
Figure 1. The second largest DBA/2J paraclique signature.
Paracliques in graphs constructed from time series data have the property that most
genes in the paraclique are very highly correlated across time with most other genes
in the paraclique, which suggests coregulation over time in the developing cerebellum.
We can derive a time profile “signature” of paracliques by averaging the expression
level of each gene in the paraclique at each time point. (See Figure 1.) All
probes in the paraclique are highly correlated with this averaged signature.
Over all 252 C57BL/6J paracliques, the average correlation of a gene with its paraclique
signature is 0.951; the lowest correlation of a gene with its respective paraclique
signature is 0.812. A paraclique’s signature is thus highly representative
of the time profile of the underlying genes.
Maximal Clique Enumeration
A maximal clique is a clique to which no vertex can be added to form a larger clique.
Note that this differs from a maximum clique. A clique is maximal if it is not contained
within any other clique; a clique is a maximum clique if no larger clique exists
in the graph. Maximal clique enumeration consists of listing all the maximal cliques
in a graph. Our current methods for enumerating maximal cliques make use of the
Bron Kerbosch algorithm and can be found in [2]. Because of the extent of overlap
between maximal cliques, graphs with several thousand vertices often contain a nearly
overwhelming number of maximal cliques. For instance, even at a relatively high
threshold of 0.9, the DBA/2J graph contained 211 million maximal cliques. The number
of maximal cliques to which a gene belongs is a measure of the interconnectivity
of the gene in the graph. Genes that are members of many maximal cliques may therefore
be key participants in several different biological networks. Finding the number
of maximal cliques to which each gene belongs produces a list of genes ranked by
interconnectedness in the graph.
Shifted Analysis
Since coregulation takes place over time, investigating the pairwise correlations
when one of each time profile is shifted by one or more time points can be revealing.
We therefore reran the analysis using correlations from such shifts. A downside
of this approach is that each one-time-point shift results in the loss of one data
point in common between the two correlation vectors and thus the statistical power
of the subsequent correlation is reduced.
GO enrichment
We performed Gene Ontology (GO) category enrichment analysis using the hypergeometric
test on the paracliques. Any category with a p-value ≤ .01 was retained.