FunNet is an integrative functional genomics tool designed for the analysis of transcriptional interaction networks (also known as co-expression networks). To improve the biological relevance of the co-expression modules identified in these networks, FunNet integrates experimental gene expression data with knowledge about the biological roles of transcripts available in genomic annotation systems. First, FunNet performs a functional profiling of the gene expression data to identify a set of significant (i.e. overrepresented) biological themes characterizing the analyzed transcripts (e.g. the cellular processes, pathways and molecular functions in which these transcripts are involved). Then, based on the results of this functional analysis, a two-layer abstraction model is built to integrate two types of transcriptional information: expression levels and the transcripts' biological roles. This model is then used to derive a measure of proximity between significant biological themes, based on the similarity of the expression profiles of the transcripts they annotate. Finally, themes showing a significant relationship in the transcriptional expression space are grouped to build transcriptional modules. FunNet relies on genomic annotations provided by the Gene Ontology Consortium (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG). For further details on the computational approaches implemented in FunNet, please see the references indicated on the FunNet website¹.
The analytical options provided by the FunNet web interface are organized in three main tabs titled "Situation", "Data" and "Options". A "Summary" tab allows the user to check the uploaded data files and the selected analytical options before submitting them. We describe these options below in the order in which they are selected.
The first option concerns the genome to which the analyzed gene expression data refer. Seven genomes are currently available in FunNet: Homo sapiens, Mus musculus, Rattus norvegicus, Saccharomyces cerevisiae, Gallus gallus, Danio rerio and Arabidopsis thaliana. Other genomes may be added at users' request.
FunNet was designed to accommodate two types of analytical situations:
- The analysis of a single set of gene expression profiles
- The simultaneous analysis of two sets of gene expression profiles distinguished by their overall regulation pattern (i.e. up- versus down-regulated gene expression profiles between two experimental conditions).
The "One set" option (Fig. 1) allows great flexibility in defining transcript selection criteria and may be used, for example, to analyze time-series gene expression data. As indicated above, the "Two sets" option considers two sets of differentially regulated transcripts, selected through a preliminary differential expression analysis. In this case, a "Discriminant annotation" option allows a closer comparison of the functional profiles of the two lists of transcripts, by computing the significance of the annotating themes' overrepresentation with the union of the two lists used as the reference.
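To illustrate how the choice of reference changes the enrichment computation, the following Python sketch builds the four counts used by an overrepresentation test for a single theme: once against a genome-wide reference and once against the union of the up- and down-regulated lists, as in the "Discriminant annotation" option. The function name and gene IDs are hypothetical, and this is not FunNet's own code.

```python
def enrichment_counts(study, reference, theme_genes):
    """Return (k, n, K, N) for one theme.

    k: study transcripts annotated by the theme
    n: study transcripts present in the reference
    K: reference transcripts annotated by the theme
    N: reference transcripts
    """
    reference = set(reference)
    study = set(study) & reference        # only transcripts in the reference count
    theme = set(theme_genes) & reference
    return len(study & theme), len(study), len(theme), len(reference)


# Hypothetical EntrezGene IDs, for illustration only.
up = {1, 2, 3, 4}
down = {5, 6, 7, 8}
genome = set(range(1, 101))               # stand-in for a whole-genome reference
theme = {1, 2, 3, 5, 50, 60}              # transcripts annotated by one theme

# Up-regulated list against the whole genome.
print(enrichment_counts(up, genome, theme))     # (3, 4, 6, 100)
# Discriminant annotation: the union of both lists is the reference.
print(enrichment_counts(up, up | down, theme))  # (3, 4, 4, 8)
```

With the union as reference, a theme that annotates both lists at similar rates is no longer enriched, so the themes that stand out are those distinguishing one list from the other.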
The first block of options in the second tab ("Data") allows the user to upload expression data files and to select the type of analysis to be performed (Fig. 2). Depending on the analytical situation selected previously, FunNet requires one or two files containing gene expression profiles. In addition, an optional file containing the list of transcripts to be used as a reference can be uploaded. Its main purpose is to restrict the computation of the annotating themes' overrepresentation (i.e. their gene enrichment) to a reference list of transcripts specified by the user. Such a user-specified reference addresses situations in which the expression data were obtained with custom-designed microarrays, as opposed to the more common situation in which pangenomic arrays are used. At a minimum, the reference list should include all transcripts whose expression profiles are to be analyzed. If no reference list is uploaded, the complete list of genes of the selected genome is used as the reference for gene enrichment computations.
Regardless of the analytical situation selected, all data files should be plain text files in which columns are separated by TAB characters. Such files are easy to generate with the "File → Save as…" option of most popular spreadsheet programs (e.g. Microsoft® Excel, Sun® OpenOffice). A test dataset illustrating the file format can be downloaded here.
The file format required for the expression data is illustrated in Fig. 3 (left). The first column should contain the EntrezGene IDs of the transcripts to be analyzed (one transcript per line), while the subsequent columns contain their expression measurements (C1, C2, …). NCBI's EntrezGene is currently the only gene identification system supported by FunNet, as it allows the unambiguous identification of transcripts required for their automated functional annotation, as well as for linking them to other gene-centered information resources. The files should not include a header (i.e. column names), only numeric data. Decimal values must use a period (".") as the decimal separator.
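Before submission, the format rules above can be checked locally. The following is a minimal Python sketch of such a check; it is not part of FunNet, the function name and error messages are hypothetical, and FunNet's own validation may be stricter or more lenient. The sample values are illustrative.

```python
import csv
import io


def read_expression_file(handle):
    """Parse a TAB-separated expression file with no header line.

    Column 1 must hold an EntrezGene ID; the remaining columns hold numeric
    measurements (C1, C2, ...) written with '.' as the decimal separator.
    """
    profiles = {}
    n_values = None
    for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # tolerate blank lines
        gene_id, values = cells[0], cells[1:]
        if not gene_id.isdigit():
            raise ValueError(f"line {line_no}: {gene_id!r} is not an EntrezGene ID (header line?)")
        if any("," in v for v in values):
            raise ValueError(f"line {line_no}: use '.' as the decimal separator")
        if not values:
            raise ValueError(f"line {line_no}: no expression values")
        if n_values is None:
            n_values = len(values)
        elif len(values) != n_values:
            raise ValueError(f"line {line_no}: expected {n_values} values, found {len(values)}")
        profiles[int(gene_id)] = [float(v) for v in values]
    return profiles


sample = "7157\t2.31\t1.87\t0.95\n672\t-0.42\t0.13\t1.05\n"
profiles = read_expression_file(io.StringIO(sample))
print(profiles[7157])  # [2.31, 1.87, 0.95]
```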
The file format required for the optional reference list is illustrated in Fig. 3 (right). This file should also be a TAB-separated text file, with a single column containing the GeneIDs of all transcripts considered as the reference for gene enrichment computations (e.g. all transcripts spotted on the microarray, or those retained after filtering the raw data for missing values).
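A minimal Python sketch of how the reference list could be read and checked against the analyzed transcripts, including the fallback to the whole genome when no list is uploaded. Treating missing transcripts as an error is a choice made for this sketch (the text above only says the reference should include them), and the function names are hypothetical.

```python
import io


def read_reference_list(handle):
    """Read GeneIDs from a single-column (TAB-separated) text file."""
    ids = set()
    for line in handle:
        cell = line.strip().split("\t")[0]
        if cell:  # skip blank lines
            ids.add(int(cell))
    return ids


def choose_reference(analyzed_ids, reference_ids=None, genome_ids=frozenset()):
    """Return the set of GeneIDs used as the enrichment reference.

    Without a user list, the whole genome is used; with one, every analyzed
    transcript must be present in it.
    """
    if reference_ids is None:
        return set(genome_ids)
    missing = set(analyzed_ids) - set(reference_ids)
    if missing:
        raise ValueError(f"{len(missing)} analyzed transcript(s) missing from the reference list")
    return set(reference_ids)


reference = read_reference_list(io.StringIO("7157\n672\n1956\n"))
print(sorted(choose_reference({7157, 672}, reference)))  # [672, 1956, 7157]
```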
The second block of options included in this tab (Fig. 4) allows one to select among the three types of analyses implemented in the FunNet tool:
The "Conventional functional analysis" option performs a conventional functional analysis of the uploaded transcriptional profiles, similar to that offered by many other tools available online or as standalone software packages. Its main purpose is to provide an overview of the functional profiles of the analyzed transcripts, to facilitate the selection of the annotation-related options required by FunNet's analysis of transcriptional interactions.
The "Functional analysis of transcriptional networks" option is FunNet's main analytical option, implementing the tool's integrative approach to the analysis of transcriptional interactions. It performs both a functional profiling of the analyzed transcripts and an analysis of their interactions. The related options are detailed later in this document.
The "Co-expression threshold estimation" option is a secondary option related to the analysis of transcriptional interactions. It estimates a co-expression significance threshold by fitting a theoretical scale-free network topology model to the co-expression network derived from the gene expression data, following a proposal by [Zhang and Horvath 2005]. This threshold is required by the analysis of transcriptional interactions and is used to perform either a discrete² ("hard") or a continuous³ ("soft") thresholding of the raw co-expression matrix computed from the gene expression data, in order to derive a transcriptional adjacency matrix⁴. For further details regarding the estimation of these thresholds and their use, please refer to [Zhang and Horvath 2005]. As the automated selection of these thresholds is not always possible, their estimation is implemented separately as a secondary option. Note that although a rigorous estimation of the co-expression threshold may be useful in a number of situations, its interpretation can sometimes be difficult. The user may skip this preliminary estimation and instead use default values based on general guidelines available in the literature. These options are discussed in greater detail later in this document.
² A discrete thresholding implies that the links between nodes in the resulting network are unweighted (i.e. a link is established between two transcriptional nodes whose co-expression exceeds the chosen threshold).
³ In this case the result is a weighted network in which continuous intensity values characterize the links between nodes (i.e. two transcripts are considered more or less strongly co-regulated, as opposed to the previous situation in which they are considered either co-expressed or not).
⁴ Such a matrix is a numerical representation of the transcriptional co-expression network.
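The discrete and continuous thresholding modes, and the scale-free criterion used to estimate a threshold, can be sketched in Python as follows. This is an illustration under stated assumptions rather than FunNet's implementation: the continuous threshold is modeled as the exponent of a power adjacency function a_ij = |s_ij|^power, as in Zhang and Horvath (2005); self-links are set to zero; and the 10 equal-width connectivity bins and the R² cutoff of 0.8 are illustrative choices. `corr` is a square co-expression matrix given as nested lists.

```python
import math


def hard_adjacency(corr, tau):
    """Discrete ("hard") thresholding: unweighted link when |s_ij| >= tau."""
    n = len(corr)
    return [[1 if i != j and abs(corr[i][j]) >= tau else 0 for j in range(n)]
            for i in range(n)]


def soft_adjacency(corr, power):
    """Continuous ("soft") thresholding, assumed here to be a_ij = |s_ij| ** power."""
    n = len(corr)
    return [[abs(corr[i][j]) ** power if i != j else 0.0 for j in range(n)]
            for i in range(n)]


def connectivity(adj):
    """Node connectivity k_i: number (hard) or total weight (soft) of links."""
    return [sum(row) for row in adj]


def scale_free_fit(k, n_bins=10):
    """Fit log10 p(k) against log10 k over equal-width connectivity bins.

    Returns (R^2, slope); a scale-free network gives a high R^2 and a negative slope.
    """
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all k are equal
    counts, sums = [0] * n_bins, [0.0] * n_bins
    for value in k:
        b = min(int((value - lo) / width), n_bins - 1)
        counts[b] += 1
        sums[b] += value
    points = [(math.log10(s / c), math.log10(c / len(k)))  # (mean k, frequency) per bin
              for c, s in zip(counts, sums) if c > 0 and s > 0]
    if len(points) < 2:
        return 0.0, 0.0
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    return sxy ** 2 / (sxx * syy), sxy / sxx


def pick_power(corr, powers=range(1, 11), r2_cutoff=0.8):
    """Smallest power whose soft-thresholded network fits the scale-free model."""
    for power in powers:
        r2, slope = scale_free_fit(connectivity(soft_adjacency(corr, power)))
        if slope < 0 and r2 >= r2_cutoff:
            return power
    return None  # no candidate qualifies: fall back to a literature-based value


corr = [[1.0, 0.9, 0.3],
        [0.9, 1.0, -0.85],
        [0.3, -0.85, 1.0]]
print(hard_adjacency(corr, 0.8))  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

`pick_power` returns `None` when no candidate reaches the cutoff, which mirrors the remark above that automated threshold selection is not always possible.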
The third tab provides various blocks of options depending on the selected type of analysis. We will describe them in relation to the three main types of analyses mentioned previously.
The options provided with this type of analysis concern the gene annotation systems used for the functional profiling of transcripts and the type of gene enrichment computation performed for Gene Ontology categories (i.e. the three main conceptual axes of the Gene Ontology: biological process, molecular function and cellular component).
The annotation systems currently available in FunNet are those provided by the Gene Ontology Consortium (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG). Detailed descriptions of these systems are provided at the URLs indicated previously (please see the introductory remarks).
The ontological structure of GO allows several ways of computing the gene enrichment of its annotating categories. Three of them are currently implemented in FunNet (Fig. 5):
"Specificity", the default option, computes gene enrichment while taking into account the degree of specificity (i.e. the conceptual precision) of the information that GO themes provide about the biological roles of their annotated transcripts. A brief description of this approach follows (a more detailed one will be included in a manuscript currently in preparation). The GO categories used to annotate transcripts directly (called "level 1" themes) are taken as the starting point, as they describe the available biological information on transcripts' roles in the most precise manner, regardless of the heterogeneity of their conceptual granularity. From these themes, subsequent levels of decreasing informational specificity are derived using the transitivity of the ontological lattice, which allows transcripts' annotations to be propagated from more specific themes to more general ones. Finally, a list of significantly enriched themes is computed for each level of informational specificity derived from the GO lattice.
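The transitivity step described above (propagating annotations from specific themes to their more general ancestors) can be sketched in Python as follows. The term names and the parent map are hypothetical, and how FunNet then splits the propagated themes into specificity levels is not documented here, so the sketch stops at propagation.

```python
def ancestors(term, parents):
    """All terms reachable from `term` through parent links (iterative search)."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct, parents):
    """Add each directly annotated gene to every ancestor of its theme."""
    annotated = {}
    for term, genes in direct.items():
        for t in {term} | ancestors(term, parents):
            annotated.setdefault(t, set()).update(genes)
    return annotated


# Hypothetical themes: parents map each term to its more general terms.
parents = {"lipid metabolism": {"metabolism"},
           "metabolism": {"biological process"},
           "signaling": {"biological process"}}
direct = {"lipid metabolism": {1, 2}, "metabolism": {3}, "signaling": {4}}
print(sorted(propagate(direct, parents)["biological process"]))  # [1, 2, 3, 4]
```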
"Terminological" refers to an annotation approach that distinguishes the terminological levels of conceptual granularity defined within the GO lattice (intended to ensure a homogeneous conceptual granularity of the biological themes belonging to the same GO level). In this case, a list of significantly enriched themes is computed for each level of conceptual granularity derived from the ontological lattice.
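One common way to assign terminological levels, assumed here, is to take a term's shortest distance from the root of its ontology. Because a GO term can be reached through paths of different lengths, other conventions (e.g. the longest path) give different levels, and FunNet's exact definition is not stated in this document. A breadth-first search computes the shortest-distance version; the term names are hypothetical.

```python
from collections import deque


def term_depths(children, root):
    """Shortest distance of every term from the ontology root (breadth-first search)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in depth:  # first visit is along a shortest path
                depth[child] = depth[term] + 1
                queue.append(child)
    return depth


def group_by_level(depth):
    """Invert {term: level} into {level: set of terms}."""
    levels = {}
    for term, d in depth.items():
        levels.setdefault(d, set()).add(term)
    return levels


children = {"biological process": ["metabolism", "signaling"],
            "metabolism": ["lipid metabolism"],
            "signaling": ["lipid signaling"],
            "lipid metabolism": ["lipid signaling"]}  # two paths of different length
levels = group_by_level(term_depths(children, "biological process"))
print(sorted(levels[2]))  # ['lipid metabolism', 'lipid signaling']
```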
"Decorrelated" refers to a decorrelated annotation approach designed to minimize the conceptual redundancy affecting transcripts' annotation within the GO lattice. The method implemented in FunNet was inspired by a proposal by [Alexa et al. 2006].
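For orientation, the following is a simplified Python sketch of the "elim" idea from [Alexa et al. 2006]: themes are scored from the most specific (deepest) to the most general, and once a theme is significant its genes are removed from the gene sets of all its ancestors before those are scored. In this simplified version the removed genes are dropped only from the ancestors' own gene sets while the reference stays fixed, and the cutoff of 0.01 is illustrative; neither every detail of the original method nor FunNet's adaptation is reproduced. Theme names are hypothetical, and `annotated` is assumed to hold already-propagated annotations.

```python
import math


def hypergeom_sf(k, n, K, N):
    """P(X >= k), X hypergeometric: n draws from N genes of which K are annotated."""
    return sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / math.comb(N, n)


def ancestors(term, parents):
    """All terms reachable from `term` through parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def elim(annotated, depth, parents, study, reference, cutoff=0.01):
    """Score themes deepest-first, removing genes of significant themes from their ancestors."""
    study = set(study) & reference
    removed, pvalues = {}, {}
    for term in sorted(annotated, key=lambda t: depth[t], reverse=True):
        genes = (annotated[term] & reference) - removed.get(term, set())
        pvalues[term] = hypergeom_sf(len(genes & study), len(study), len(genes), len(reference))
        if pvalues[term] < cutoff:
            for anc in ancestors(term, parents):
                removed.setdefault(anc, set()).update(genes)
    return pvalues


reference = set(range(1, 21))
annotated = {"lipid metabolism": {1, 2, 3, 4}, "metabolism": {1, 2, 3, 4, 5, 6}}
depth = {"lipid metabolism": 2, "metabolism": 1}
parents = {"lipid metabolism": {"metabolism"}}
p = elim(annotated, depth, parents, study={1, 2, 3, 4}, reference=reference)
print(p["metabolism"])  # 1.0: its study genes are already explained by the child theme
```

Without the elimination step, "metabolism" would be scored on all six genes and reach a p-value of about 0.003, i.e. it would be reported as significant only because of its more specific child.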
The statistical significance of the gene enrichment of the annotating themes, identified through one of the three approaches above, is assessed with Fisher's exact test, using either the reference list of transcripts specified by the user or, in its absence, the complete list of genes of the selected genome. The false discovery rate (FDR) estimation approach of [Storey and Tibshirani 2003] is available as an option for controlling the statistical errors resulting from multiple testing.
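For overrepresentation, a one-sided Fisher's exact test is equivalent to the upper tail of the hypergeometric distribution, and Storey's q-values can be sketched with a single fixed λ. This is a simplified Python illustration: the original method estimates π0 by smoothing over a grid of λ values, and whether FunNet uses a one- or two-sided test is not stated here. The counts follow the convention k, n, K, N (annotated study transcripts, study transcripts, annotated reference transcripts, reference transcripts).

```python
import math


def fisher_overrep(k, n, K, N):
    """One-sided Fisher exact test: P(X >= k) for a hypergeometric X."""
    return sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(n, K) + 1)) / math.comb(N, n)


def storey_qvalues(pvalues, lam=0.5):
    """q-values with pi0 estimated at a single lambda (simplified Storey-Tibshirani)."""
    m = len(pvalues)
    pi0 = min(1.0, sum(p > lam for p in pvalues) / (m * (1 - lam)))
    if pi0 == 0:
        pi0 = 1.0  # no p-value above lambda: fall back to the conservative pi0 = 1
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        qvalues[i] = running_min
    return qvalues


print(round(fisher_overrep(3, 4, 6, 100), 6))  # 0.000483
print(storey_qvalues([0.01, 0.04, 0.03, 0.8, 0.6, 0.9]))
```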
If a preliminary estimation of a co-expression significance threshold is desired, the user must select one of the three measures of transcriptional co-expression implemented in FunNet (Fig. 6). The non-parametric Spearman rank correlation coefficient (Rs) is used by default to assess transcriptional co-expression.
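Spearman's Rs is the Pearson correlation computed on the ranks of the values, with tied values given their average rank. A minimal standard-library Python sketch for two expression profiles measured across the same conditions (C1, C2, …); the values are illustrative, and constant profiles (zero variance) are not handled.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's Rs: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


profile_a = [2.31, 1.87, 0.95, 3.10]   # illustrative values for C1..C4
profile_b = [1.05, 0.70, 0.12, 2.40]
print(spearman(profile_a, profile_b))  # 1.0 (same ordering across conditions)
```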
The first block of options provided in this situation (Fig. 7) concerns the functional analysis of the transcriptional profiles and is very similar to the one described previously (see the paragraph "Conventional functional analysis"). The only difference is the possibility of specifying a terminological or specificity "level" to be used as the reference for the integrated functional analysis of transcriptional interactions. By default, the first level (i.e. the most specific one) is used.
Selecting a different level of informational specificity or terminological granularity may sometimes be justified by the biological significance of the annotating themes it contains (e.g. less specific themes sometimes capture the biological phenomena better than more specific ones), and should always be based on the results of a preliminary conventional functional analysis (see the related paragraph).
The second block of options concerns the transcriptional co-expression analysis (Fig. 8). Besides selecting one of the three co-expression measures implemented in FunNet (see the "Co-expression threshold estimation" paragraph), the user must also select a type of co-expression threshold (discrete or continuous) and indicate the threshold value to be used. If a specific estimation of the co-expression threshold is not considered necessary, the user may rely on general guidelines in selecting these thresholds. For a discrete co-expression threshold, an absolute correlation coefficient between 0.80 and 0.85 should perform well in most conditions [Allocco et al. 2004]. Similarly, for a continuous thresholding, a γ parameter⁵ between 2 and 3 provides a good fit to the scale-free topological model in most "omic" networks [Barabasi and Oltvai 2004]. The user should be aware, however, that some particular situations are not well addressed by these generic guidelines. Moreover, if a Euclidean measure of co-expression is selected, a preliminary estimation of the co-expression significance threshold is always necessary, as no general guidelines can be provided in this case.
The last option in this block allows the user to apply a topological measure of proximity between transcriptional nodes, computed from the raw co-expression network, following a suggestion by [Zhang and Horvath 2005]. This measure assesses the proximity between two transcriptional nodes based on the number of their common neighbors in the co-expression network (i.e. the nodes to which both transcripts are directly connected; please see the reference for further details). This option complements the co-expression measures mentioned previously and may be used with either a discrete or a continuous co-expression threshold.
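The topological overlap measure of [Zhang and Horvath 2005] can be written as TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij = Σ_u a_iu a_uj counts (or, for weighted networks, sums) the shared neighbors and k_i is the connectivity of node i. A direct Python transcription for a symmetric adjacency matrix with zero diagonal follows; FunNet's own implementation may differ in details such as the diagonal convention.

```python
def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with TOM_ii = 1."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each node
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


# Triangle 0-1-2 plus a pendant node 3 attached to node 2 (hard adjacency).
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
tom = topological_overlap(adj)
print(tom[0][3])  # 0.5: nodes 0 and 3 are not linked but share neighbor 2
```

Because min(k_i, k_j) is at least a_ij, the denominator never falls below 1, so the measure stays in [0, 1] for both hard and weighted adjacencies.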
⁵ The scale-free topological model is described by a power law expressing the probability P(k) of observing a node with connectivity k in a transcriptional interaction network as $P(k) \propto k^{-\gamma}$.
The final tab allows the user to check the uploaded files and the selected options before submission (Fig. 9). If some options need to be changed, the user may return to the relevant section by clicking on the name of the corresponding tab.
A final form (Fig. 10) requires the user to provide an e-mail address before submitting the new analysis job. An additional field allows the user to specify an optional name to identify the submission. If provided, this name is mentioned in the e-mails sent to the user in case of submission errors (mostly related to problems with the file format or with the correspondence between the selected reference genome and the gene identifiers provided in the submitted expression data), as well as in the e-mail communicating the results once the analysis is completed.
Technical questions and other issues may be submitted using the online form provided for this purpose.
When the analysis of the submitted data is completed, the user receives an automatically generated e-mail containing a link to download the results as a compressed ZIP archive. Fig. 11 illustrates the structure of the subfolders and files contained in the archive.
Two subfolders contain the results generated as HTML or image (PNG and PDF) files. These files provide detailed information on the functional profiles of the analyzed transcripts, as well as on the interactional centrality of the annotating themes (when an analysis of transcriptional interactions has been selected in addition to the conventional functional profiling of the gene expression data).
Fig. 12 illustrates the KEGG categories composing the functional profile of an analyzed dataset. HTML links associated with the KEGG categories allow for an automated mapping of the analyzed transcripts into significantly enriched KEGG pathways (Fig. 13).
Graphical representations of the functional profiles, such as the one illustrated in Fig. 14, are also generated: the significantly enriched biological themes annotating up- or down-regulated transcripts are shown as bar plots depicting their coverage (in %) of the analyzed transcriptional domain.
The main folder contains all the files needed to visualize the interaction networks built by FunNet with the Cytoscape software.
When used to analyze transcriptional interactions, FunNet generates, for each selected annotation system, a set of files similar to the one illustrated in Fig. 15. The content of these files is detailed in Table 1. All these files are TAB-separated text files, designed to be imported into Cytoscape to visualize the generated interaction networks.
As indicated in Table 1, FunNet generates both theme proximity networks and transcriptional co-expression networks. Fig. 16 illustrates the process of importing a theme proximity network into Cytoscape. After the text file import options have been set (i.e. TAB-separated columns), Cytoscape allows the user to identify the nodes that make up the network (columns 1 and 3), the type of link between the nodes (column 2) and its intensity (column 4, a proximity value in the interval [0,1]).
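Outside Cytoscape, the same four-column files can be read with a few lines of Python, for example to filter links before import. A minimal sketch assuming the column layout described above; the theme names, the link-type label "proximity" and the proximity values in the example are made up, and lines whose fourth column is not numeric (such as a possible header) are skipped.

```python
import csv
import io


def read_proximity_network(handle):
    """Return (node1, link_type, node2, proximity) tuples from a TAB-separated file."""
    edges = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue  # blank or incomplete line
        source, link_type, target, value = (cell.strip() for cell in row[:4])
        try:
            proximity = float(value)
        except ValueError:
            continue  # non-numeric fourth column, e.g. a header line
        if not 0.0 <= proximity <= 1.0:
            raise ValueError(f"proximity {proximity} outside [0, 1]: {source} - {target}")
        edges.append((source, link_type, target, proximity))
    return edges


sample = ("Glycolysis / Gluconeogenesis\tproximity\tCitrate cycle (TCA cycle)\t0.82\n"
          "Citrate cycle (TCA cycle)\tproximity\tOxidative phosphorylation\t0.47\n")
strong = [e for e in read_proximity_network(io.StringIO(sample)) if e[3] >= 0.5]
print(len(strong))  # 1
```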
The node connections should be interpreted either as proximity links (in theme networks, where they indicate regulatory interactions between cellular processes or structures) or as co-expression links (in transcriptional interaction networks).
Once a network has been imported into Cytoscape, node attributes can also be imported from additional files generated by FunNet (see Table 1 for the file content). Fig. 17 illustrates a file containing attributes for a set of KEGG themes characterizing the functional profile of a dataset. Several types of information are provided, including the module to which each theme was assigned (column "module"), the regulation pattern of its annotated transcripts (column "up(1)_down(0)", where 1 and 0 indicate up- and down-regulation respectively), and several interactional centrality measures, such as node degree and betweenness, expressed both as absolute values and normalized with respect to the whole network. Similar information is provided for transcriptional networks (see Table 1 for the corresponding files).
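As an illustration of absolute and normalized degree values, the following Python sketch computes both for an undirected network given as a list of links. Dividing by n - 1 (the largest possible degree) is one common normalization and an assumption here; FunNet's normalization and its betweenness computation are not reproduced. Node names are hypothetical, and nodes without links do not appear in the result.

```python
from collections import defaultdict


def degree_centrality(links):
    """Map each node to (degree, degree / (n - 1)) for an undirected network."""
    neighbors = defaultdict(set)
    for a, b in links:
        if a != b:  # ignore self-links
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    return {node: (len(nb), len(nb) / (n - 1) if n > 1 else 0.0)
            for node, nb in neighbors.items()}


links = [("theme A", "theme B"), ("theme A", "theme C"),
         ("theme A", "theme D"), ("theme C", "theme D")]
print(degree_centrality(links)["theme A"])  # (3, 1.0)
```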
The process of importing node attributes from text files is illustrated in Fig. 18. The first column of the node attributes file is used to map the attributes to the corresponding network nodes. The imported attributes can then be used to illustrate various patterns within the network, such as node regulation or module membership. For further details on how to use these attributes to illustrate transcriptional patterns in Cytoscape, please refer to the tutorial provided on the Cytoscape website.
Network representations built from FunNet-generated files for transcripts and their annotating themes are illustrated in Fig. 19 and Fig. 20. In Fig. 19, each theme's membership in one of the two modules identified in this condition is indicated by node shape, while its overall regulation pattern is indicated by node color. The strength of the proximity between themes in the analyzed context is shown by bold lines (proximities above the upper quartile of their distribution in the network) or dashed lines (proximities between the median and the upper quartile).
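The line-style rule used in Fig. 19 can be reproduced on exported proximity values with a short Python sketch. Quartiles are computed with `statistics.quantiles` (default "exclusive" method), which is an assumption, since other quartile definitions shift the cut points slightly; the style names are hypothetical labels for Cytoscape visual mappings, and links at or below the median simply keep a default style here.

```python
import statistics


def edge_styles(proximities):
    """Label each link 'bold' (above the upper quartile), 'dashed' (above the
    median, up to the upper quartile) or 'default' (at or below the median)."""
    _, median, upper = statistics.quantiles(proximities, n=4)
    styles = []
    for p in proximities:
        if p > upper:
            styles.append("bold")
        elif p > median:
            styles.append("dashed")
        else:
            styles.append("default")
    return styles


values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # illustrative proximities
print(edge_styles(values))
# ['default', 'default', 'default', 'default', 'dashed', 'dashed', 'bold', 'bold']
```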
Fig. 20 illustrates the transcriptional interaction network associated with the theme proximity network depicted in Fig. 19. The membership of transcripts in each of the two transcriptional modules and their regulation pattern are illustrated in the same way as for the theme proximity network. Note that FunNet generates two types of transcriptional interaction networks: one limited to the transcripts annotated by significantly overrepresented themes, and one including all transcripts regardless of their annotation (see Table 1). In the latter case, transcripts that are not directly annotated by a significantly enriched theme are assigned to the identified transcriptional modules through a procedure integrated into FunNet's network analysis.
Fig. 21 depicts a theme centrality graph corresponding to the proximity network presented in Fig. 19, in which significantly overrepresented themes are grouped by transcriptional module, while their interactional centrality within the network (expressed as a percentage of the whole network) is shown as bar plots. These centralities can be considered as an indication of the contextual relevance of the themes and of the corresponding cellular processes, pathways, etc. Similar information is provided for individual transcripts in the attribute files mentioned previously (see Table 1).
ALEXA, A., RAHNENFÜHRER, J. and LENGAUER, T. (2006). Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics 22(13): 1600-1607.
ALLOCCO, D.J., KOHANE, I.S. and BUTTE, A.J. (2004). Quantifying the relationship between co-expression, co-regulation and gene function. BMC Bioinformatics 5(1): 18.
BARABASI, A.L. and OLTVAI, Z.N. (2004). Network biology: understanding the cell's functional organization. Nat Rev Genet 5(2): 101-113.
STOREY, J.D. and TIBSHIRANI, R. (2003). Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 100(16): 9440-9445.
ZHANG, B. and HORVATH, S. (2005). A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol 4: Article 17.