AACR Education Book 2005:12-16 (2005)
© 2005 American Association for Cancer Research
Cytoscape
A Software Environment for Integrated Models of Biomolecular Interaction Networks
Rowan Christmas1,
Iliana Avila-Campillo1,
Hamid Bolouri1,
Benno Schwikowski1,
Mark Anderson2,
Ryan Kelley2,
Nerius Landys2,
Chris Workman2,
Trey Ideker2,
Ethan Cerami3,
Rob Sheridan3,
Gary D. Bader3 and
Chris Sander3
1 Institute for Systems Biology, Seattle, Washington
2 UCSD Department of Bioengineering, La Jolla, California
3 Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, NY
Background
Molecular interaction and similarity networks are vital for understanding gene function. How does a gene of unknown function relate to genes of known function? Gene function can be inferred by studying related genes. Gene-gene relationships can be defined evolutionarily, such as by sequence similarity, or experimentally, such as by co-expression, co-location and cellular molecular interactions. Genes of unknown function are likely to participate in the same biological processes, or to have the same molecular function, as the genes of known function to which they are linked. Such network data are accumulating at an increasing pace as functional genomics, proteomics and bioinformatics yield methods to detect protein-protein interactions, genetic pathways, metabolic networks, cell signals, gene regulation, similarity relationships (e.g. sequence and co-expression) and literature-based links (1). Integration, visualization and querying software is now required to make effective use of this information to answer scientific questions.
Cytoscape has been designed to fill this need as an open source software platform for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework (Figure 1). Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to lay out and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is programmatically extensible through a modular plugin architecture, allowing rapid development of additional features and computational analyses. Cytoscape provides a powerful visualization engine as well as a number of plugins for network data retrieval, network-based gene expression analysis, network clustering and network homology detection. Cytoscape version 2.0 runs on all major operating systems and is freely available for download from http://www.cytoscape.org/ as an open source Java application.

Fig. 1. Cytoscape version 2.0, available for free download from http://www.cytoscape.org. This screenshot shows a part of the yeast galactose metabolism network composed of protein-protein (lines) and protein-DNA (lines with arrows) interactions automatically laid out and visualized by Cytoscape. Expression data are overlaid as a continuous black-to-white node color gradient, where black signifies a low and white a high expression ratio between two experimental conditions. Users can navigate large networks using the "bird's eye view" navigator shown at the lower left, manage multiple networks using the network manager in the upper left panel, and select visualization and analysis options using the toolbar at the top. More information about each protein and interaction in this network can be obtained using "right-click" (or context-sensitive) menus when nodes or edges are selected. See the online Cytoscape manual for more information.
Discussion
The central organizing metaphor of Cytoscape is a network graph, with molecular components (different molecular species) represented as nodes and intermolecular interactions represented as links (i.e., edges) between nodes, although users are free to map their data to this network in other ways. The core platform provides basic functionality to lay out and query the network; to visually integrate the network with expression profiles, phenotypes, and other cellular state or molecular profiling data; and to link the network to functional annotations.
Cytoscape supports a variety of automated network layout algorithms to effectively visualize molecular networks. To reduce network complexity, it is often desirable to select subsets of nodes and edges for display according to a variety of criteria, e.g., selection by name, by a list of names, or based on a particular data attribute (see below). More complex queries are supported by a Filtering Toolbox that includes a Minimum Neighbors filter, which selects nodes having a minimum number of neighbors within a specified distance in the network; a Local Distance filter, which selects nodes within a specified distance of a group of pre-selected nodes; and a Combination filter, which selects nodes by AND/OR combinations of other filters.
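The three filters described above can be illustrated with a minimal Python sketch. This is not Cytoscape's implementation or API: the function names, the adjacency-dictionary representation, and the toy network with nodes A to E are all assumptions made for illustration.

```python
from collections import deque

def nodes_within(graph, sources, distance):
    """Return every node at most `distance` edges away from any source node."""
    steps = {node: 0 for node in sources}
    queue = deque(steps)
    while queue:
        node = queue.popleft()
        if steps[node] == distance:
            continue  # do not expand beyond the distance limit
        for neighbor in graph[node]:
            if neighbor not in steps:
                steps[neighbor] = steps[node] + 1
                queue.append(neighbor)
    return set(steps)

def minimum_neighbors_filter(graph, min_neighbors, distance=1):
    """Select nodes with at least `min_neighbors` other nodes within `distance`."""
    return {
        node for node in graph
        if len(nodes_within(graph, [node], distance) - {node}) >= min_neighbors
    }

def local_distance_filter(graph, preselected, distance):
    """Select nodes within `distance` of a group of pre-selected nodes."""
    return nodes_within(graph, preselected, distance)

def combination_filter(mode, *selections):
    """Combine the results of other filters with AND or OR."""
    if mode == "AND":
        return set.intersection(*selections)
    if mode == "OR":
        return set.union(*selections)
    raise ValueError("mode must be 'AND' or 'OR'")

# Hypothetical undirected toy network (not real interaction data).
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A", "D"},
    "D": {"A", "C", "E"},
    "E": {"D"},
}

hubs = minimum_neighbors_filter(toy_network, min_neighbors=3)
near_e = local_distance_filter(toy_network, ["E"], distance=2)
print(sorted(hubs))                                     # ['A', 'D']
print(sorted(near_e))                                   # ['A', 'C', 'D', 'E']
print(sorted(combination_filter("AND", hubs, near_e)))  # ['A', 'D']
```

Representing each filter as a function that returns a set of node names makes the Combination filter a plain set operation, which is one simple way to realize AND/OR composition.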
Dynamic molecular state information, for example from gene or protein expression profiles, is integrated with the network model by associating data attributes with different nodes and edges. A graphical browser allows the user to examine all attributes on the currently selected nodes and edges. Cytoscape enables the visual appearance of nodes and edges to be controlled by user-defined mappings from data attributes to visual attributes. A wide variety of visual properties are supported, including node color, shape and size; node border color and thickness; and edge color, thickness, and style. A data attribute is mapped to a visual property using either a lookup table or interpolation, depending on whether the attribute is discrete-valued or continuous. For instance, a user can load a protein-protein interaction network and a gene expression data set and color the proteins by gene expression values in a continuous manner from green, signifying underexpressed compared to a control experiment, to red, signifying overexpressed. This shows the gene expression data in the biological context of the interaction network, where it can be examined for patterns that are not obvious in each individual dataset.
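The two kinds of mapping, a lookup table for discrete attributes and interpolation for continuous ones, can be sketched in a few lines of Python. The function names, the node-shape table, and the expression range of -2 to 2 are hypothetical choices for illustration and are not taken from Cytoscape itself.

```python
def discrete_mapping(lookup, default):
    """Map a discrete attribute value to a visual property via a lookup table."""
    def mapper(value):
        return lookup.get(value, default)
    return mapper

def continuous_color_mapping(low, high, low_color, high_color):
    """Map a continuous attribute to an RGB color by linear interpolation."""
    def mapper(value):
        fraction = (value - low) / (high - low)
        fraction = min(1.0, max(0.0, fraction))  # clamp out-of-range values
        return tuple(
            round(a + fraction * (b - a)) for a, b in zip(low_color, high_color)
        )
    return mapper

GREEN = (0, 255, 0)  # underexpressed relative to control
RED = (255, 0, 0)    # overexpressed relative to control

# Assumed attribute values: a molecule type and an expression log ratio.
shape_for = discrete_mapping({"protein": "ellipse", "DNA": "rectangle"},
                             default="ellipse")
color_for = continuous_color_mapping(-2.0, 2.0, GREEN, RED)

print(shape_for("DNA"))  # rectangle
print(color_for(-2.0))   # (0, 255, 0)
print(color_for(2.0))    # (255, 0, 0)
```

Values outside the chosen range are clamped to the end colors here; a real visual mapper could instead expose separate colors for out-of-range values.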
Static information about network components, such as protein functional ontologies, is supported using annotations. In contrast to a data attribute, which is a single property of a node or edge, an annotation represents a hierarchical classification (i.e., an ontology, formally a directed acyclic graph) of progressively more specific descriptions of groups of nodes or edges. Annotations typically correspond to an existing repository of knowledge, such as the Gene Ontology database (2). Cytoscape integrates annotations with the network by transferring the desired levels of annotation onto node or edge attributes. Using the Annotation Controller, it is possible to have many levels of annotation active and on display at the same time, each as a different attribute on the nodes or edges of interest.
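As a rough illustration of transferring chosen annotation levels onto node attributes, the Python sketch below walks a small ontology stored as a child-to-parents dictionary. The term names form a toy hierarchy rather than the real Gene Ontology, the gene names are placeholders, and defining a term's level as its shortest distance from a root is an assumption; the Annotation Controller may define levels differently.

```python
def lineage(parents, term):
    """Return `term` together with all of its ancestors in the ontology DAG."""
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, ()))
    return found

def term_levels(parents):
    """Assign each term a level: its shortest number of steps from a root term."""
    levels = {}

    def level(term):
        if term not in levels:
            above = parents.get(term, ())
            levels[term] = 0 if not above else 1 + min(level(p) for p in above)
        return levels[term]

    all_terms = set(parents) | {p for above in parents.values() for p in above}
    for term in all_terms:
        level(term)
    return levels

def transfer_annotation(parents, node_terms, wanted_levels):
    """Copy the ancestors found at each wanted level onto per-node attributes."""
    levels = term_levels(parents)
    attributes = {}
    for node, term in node_terms.items():
        terms = lineage(parents, term)
        attributes[node] = {
            f"annotation_level_{k}": sorted(t for t in terms if levels.get(t) == k)
            for k in wanted_levels
        }
    return attributes

# Toy ontology (child -> parents); illustrative only, not the real Gene Ontology.
toy_ontology = {
    "metabolism": ["biological process"],
    "transport": ["biological process"],
    "carbohydrate metabolism": ["metabolism"],
    "galactose metabolism": ["carbohydrate metabolism"],
    "sugar transport": ["transport"],
}
# Placeholder nodes, each annotated with its most specific term.
node_terms = {"geneA": "galactose metabolism", "geneB": "sugar transport"}

annotated = transfer_annotation(toy_ontology, node_terms, wanted_levels=[1, 2])
print(annotated["geneA"]["annotation_level_1"])  # ['metabolism']
print(annotated["geneA"]["annotation_level_2"])  # ['carbohydrate metabolism']
```

Each requested level becomes its own attribute, so several levels can be active at once and each can drive a different visual mapping.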
Finally, the Cytoscape core is programmatically extensible through a modular plugin architecture. Plugins provide a powerful means of extending the core to implement network simulations, statistical analyses, algorithms, and/or biological semantics. Although the core is open source and plugin authors are encouraged to make their plugins freely available to the Cytoscape community, plugins are separable software that may be protected under any desired software license.
Cytoscape plugins currently available for version 2.0, unless otherwise noted, include:
Network analysis plugins:
- BioModules detects network modules, loose associations of preferred molecular interaction partners that perform a collective function. Expression patterns of identified modules and functional annotation are integrated and graphically visualized (3). (Available for Cytoscape 1.1, coming soon to Cytoscape 2.0)
- MCODE finds clusters (highly interconnected regions) in any network loaded into Cytoscape. Depending on the type of network, clusters may mean different things. For instance, clusters in a protein-protein interaction network have been shown to be protein complexes and parts of pathways. Clusters in a protein similarity network represent protein families (4).
- Motif Finder uses a Gibbs sampler to find protein sequence motifs that may be involved in protein-protein interaction sites based on known protein-protein interaction networks (5).
- PathBLAST automates the process of aligning two protein-protein interaction networks and mining for evolutionarily conserved pathways (6). (Available for Cytoscape 1.1, coming soon to Cytoscape 2.0)
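To make the notion of a "highly interconnected region" concrete, the sketch below computes a k-core, the largest subnetwork in which every node has at least k neighbors inside that subnetwork. This is a simple stand-in for dense-region detection and should not be read as the algorithm used by MCODE or any other plugin listed above; the function name and the toy network of nodes P1 to P6 are assumptions for illustration.

```python
def k_core(graph, k):
    """Return the nodes of the k-core of an undirected graph without self-loops."""
    remaining = {node: set(neighbors) for node, neighbors in graph.items()}
    changed = True
    while changed:
        changed = False
        for node in list(remaining):
            if len(remaining[node]) < k:
                # Remove the node and detach it from its surviving neighbors.
                for neighbor in remaining.pop(node):
                    if neighbor in remaining:
                        remaining[neighbor].discard(node)
                changed = True
    return set(remaining)

# Hypothetical network: P1-P4 are fully interconnected, P5 and P6 form a tail.
toy_interactions = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3", "P4"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P1", "P2", "P3", "P5"},
    "P5": {"P4", "P6"},
    "P6": {"P5"},
}

print(sorted(k_core(toy_interactions, 3)))  # ['P1', 'P2', 'P3', 'P4']
```

Repeatedly pruning low-degree nodes leaves only the densely connected part of the network, which is the kind of region that, per the text above, may correspond to a protein complex in a protein-protein interaction network.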
Expression data analysis plugins:
- Active Modules enables Cytoscape to find regions of a molecular interaction network that are significantly coordinately expressed across multiple experimental conditions (7). Active modules can be used to find pathways or pathway components that are differentially active across multiple gene expression conditions.
- Activity Centers finds genes or proteins in a molecular interaction network with interaction partners that are significantly transcriptionally active between two conditions, e.g. tumor vs. normal tissue. Activity centers can be used to find genes or proteins that are important regulatory controllers of biological processes that are differentially active in a condition of interest (8). (Coming soon to Cytoscape 2.0)
- Data matrix provides a number of integrated tools for exploring and visualizing experimental data, such as gene expression datasets, in association with the Cytoscape network view. It is designed to display large numbers of datasets efficiently, in a spreadsheet style interface.
Data retrieval plugins:
- The cPath plugin enables Cytoscape users to query, retrieve and visualize interactions retrieved from the cPath database (http://www.cbio.mskcc.org/cpath). The public cPath instance, or a local installation, can be queried for protein-protein interactions from major protein interaction databases.
- HTTP data fetcher allows Cytoscape to dynamically retrieve remote biological information for selected nodes in the current network.
- Oracle Spatial Network Data Model Plugin enables Cytoscape users to visualize and analyze network data stored in Oracle Spatial Network Data Model, which is a separately installed component of the Oracle database system.
- The PSI-MI plugin allows import and export of protein-protein interaction data made available in the PSI-MI XML standard (9).
- The SOFT plugin allows import of gene expression data in the SOFT format from NCBI's Gene Expression Omnibus database (10).
Future Directions
Cytoscape is constantly growing, both in developer community and in core feature set (www.cytoscape.org/roadmap.php). Cytoscape grew out of work by Trey Ideker, while in Lee Hood's group, and Benno Schwikowski's group at the Institute for Systems Biology in 2001. Chris Sander's group from Memorial Sloan-Kettering Cancer Center joined the collaboration in late 2002. Recently, Annette Adler's research group at Agilent has joined the collaboration.
Most new biologically relevant software features will be available as plugins, so that Cytoscape can be tailored to specific uses and does not try to be all things to all users, which generally leads to complicated, hard-to-use software. Clearly, future Cytoscape development must be directed by functional genomics and systems biology trends so that it can integrate new types of experimental data as they become available. More detailed and seamless visualization and analysis of pathway data is one such direction. Currently, Cytoscape models all biological relationships as pairwise links, although hierarchical and set relationships must be included to properly visualize gene regulatory, metabolic, genetic and signal transduction pathways (11). The recent discovery of inherent higher-order modular architecture in cellular networks (12) will hopefully allow complexity reduction in visualization and analysis systems such as Cytoscape. Features to support visualizing and analyzing the entire molecular interaction network represented using these higher-order modules are already under development. Additionally, Cytoscape must integrate more closely with dynamic chemical reaction simulation systems to bridge the gap between Cytoscape's current high-level modeling and more detailed, low-level pathway simulation systems (13).
Ultimately, as high-throughput experimental systems become more commonplace in the average biology lab, the average biologist will need Cytoscape-like software systems to integrate local data with existing knowledge and so make effective use of large-scale datasets to answer specific scientific questions. Thus, to become a truly useful systems biology tool, Cytoscape must become more user-friendly and must seamlessly integrate large amounts of a wide range of data types for subsequent analysis. All of these important directions are being actively followed by the Cytoscape development team, but because Cytoscape is an open-source project, biologists are also free to hire their own programmers and leverage the Cytoscape platform to further their own research.
References
- Bader GD, Heilbut A, Andrews B, Tyers M, Hughes T, Boone C. Functional genomics and proteomics: charting a multidimensional map of the yeast cell. Trends Cell Biol 2003a;13(7):344-56.
- Ashburner M, Ball CA, Blake JA et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000;25(1):25-9.
- Prinz S, Avila-Campillo I, Aldridge C et al. Control of yeast filamentous-form growth by modules in an integrated molecular network. Genome Res 2004;14(3):380-90.
- Bader GD, Hogue CW. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 2003b;4(1):2.
- Reiss DJ, Schwikowski B. Predicting protein-peptide interactions via a network-based motif sampler. Bioinformatics 2004;20 Suppl 1:I274-I82.
- Kelley BP, Sharan R, Karp RM et al. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci U S A 2003;100(20):11394-9.
- Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 2002;18 Suppl 1:S233-S40.
- Pradines J, Rudolph-Owen L, Hunter J et al. Detection of activity centers in cellular pathways using transcript profiling. J Biopharm Stat 2004;14(3):701-21.
- Hermjakob H, Montecchi-Palazzi L, Bader G et al. The HUPO PSI's molecular interaction format - a community standard for the representation of protein interaction data. Nat Biotechnol 2004;22(2):177-83.
- Barrett T, Suzek TO, Troup DB et al. NCBI GEO: mining millions of expression profiles - database and tools. Nucleic Acids Res 2005;33 Database Issue:D562-6.
- Demir E, Babur O, Dogrusoz U et al. PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways. Bioinformatics 2002;18(7):996-1003.
- Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Network motifs: simple building blocks of complex networks. Science 2002;298(5594):824-7.
- Ideker T, Lauffenburger DA. Building with a scaffold: emerging strategies for high- to low-level cellular modeling. Trends in Biotechnology 2003;21(6):255-62.