Home > Analysis > Paraclique Analysis

Paraclique Analysis

Clique and Paraclique

From the 10431 filtered probes we constructed an unweighted graph in which vertices represent probes and edges are present whenever probes are correlated at or above some threshold. Edges are otherwise absent. The selection of an appropriate threshold is discussed below. We retained only positive correlations.

A clique is a set of vertices in which every vertex has an edge to every other vertex in the set. A maximum clique is the largest clique in a given graph. Maximum clique is arguably the most “natural” clustering method for unweighted graphs, but the problem of finding the maximum clique in a graph is NP-complete; it is considered computationally intractable. But for many types of graphs, most notably sparse, irregular graphs, finding the maximum clique can be accomplished in reasonable time using advanced algorithms. We are thus able to use it as the basis for clustering.

Clique, however, is widely considered too restrictive. Noise in experimental data can lead to missing edges that should be present. We therefore use a relaxation of clique called “paraclique”. A paraclique consists of the maximum clique and all vertices with at least some proportion of edges to the maximum clique. The proportion is called the proportional glom factor. Thus, the number of edges that must be present in order for a vertex to be included in a paraclique is scaled to the size of the core clique.

Figure 1.  The second largest DBA/2J paraclique signature.

Paracliques in graphs constructed from time series data have the property that most genes in the paraclique are very highly correlated across time with most other genes in the paraclique, which suggests coregulation over time in the developing cerebellum.

We can derive a time profile “signature” of paracliques by averaging the expression level of each gene in the paraclique at each time point. (See Figure 1.) All probes in the paraclique are highly correlated with this averaged signature.  Over all 252 C57BL/6J paracliques, the average correlation of a gene with its paraclique signature is 0.951; the lowest correlation of a gene with its respective paraclique signature is 0.812.  A paraclique’s signature is thus highly representative of the time profile of the underlying genes.

Maximal Clique Enumeration

A maximal clique is a clique to which no vertex can be added to form a larger clique. Note that this differs from a maximum clique. A clique is maximal if it is not contained within any other clique; a clique is a maximum clique if no larger clique exists in the graph. Maximal clique enumeration consists of listing all the maximal cliques in a graph. Our current methods for enumerating maximal cliques make use of the Bron Kerbosch algorithm and can be found in [2]. Because of the extent of overlap between maximal cliques, graphs with several thousand vertices often contain a nearly overwhelming number of maximal cliques. For instance, even at a relatively high threshold of 0.9, the DBA/2J graph contained 211 million maximal cliques. The number of maximal cliques to which a gene belongs is a measure of the interconnectivity of the gene in the graph. Genes that are members of many maximal cliques may therefore be key participants in several different biological networks. Finding the number of maximal cliques to which each gene belongs produces a list of genes ranked by interconnectedness in the graph.

Shifted Analysis

Since coregulation takes place over time, investigating the pairwise correlations when one of each time profile is shifted by one or more time points can be revealing. We therefore reran the analysis using correlations from such shifts. A downside of this approach is that each one-time-point shift results in the loss of one data point in common between the two correlation vectors and thus the statistical power of the subsequent correlation is reduced.

GO enrichment

We performed Gene Ontology (GO) category enrichment analysis using the hypergeometric test on the paracliques. Any category with a p-value ≤ .01 was retained.

Project funded by funded by NIH Grant HD052472. Please view our funding & data sharing policy
Contact Webmaster