Genevestigator Transcriptome Meta-Analysis and Biomarker Search Using Rice and Barley Gene Expression Databases
a Department of Biology, ETH Zurich, Universitaetstrasse 2, 8092 Zurich, Switzerland
b Institute of Theoretical Computer Science, ETH Zurich, Universitaetsstrasse 6, 8092 Zurich, Switzerland
1 To whom correspondence should be addressed. E-mail phz{at}ethz.ch
| Abstract |
|---|
The widespread use of microarray technologies to study plant transcriptomes has led to important discoveries and to an accumulation of profiling data covering a wide range of tissues, developmental stages, perturbations, and genotypes. Querying a large number of microarray experiments can provide insights that cannot be gained by analyzing single experiments. However, such a meta-analysis poses significant challenges with respect to data comparability and normalization, systematic sample annotation, and analysis tools. Genevestigator addresses these issues using a large curated expression database and a set of specifically developed analysis tools that are accessible over the internet. This combination has already proven useful in plant research based on a large set of Arabidopsis data (Grennan, 2006). Here, we present the release of the Genevestigator rice and barley gene expression databases, which contain quality-controlled microarray experiments that are systematically annotated using ontologies. The databases currently comprise experiments from pathology, plant nutrition, abiotic stress, hormone treatment, genotype, and spatial or temporal analysis, and are expected to cover a broader variety of research areas as more experimental data become available. The transcriptome meta-analysis of the model species rice and barley is expected to deliver results that can be used for functional genomics and biotechnological applications in cereals.
Received for publication July 6, 2008. Accepted for publication July 10, 2008.
| INTRODUCTION |
|---|
One of the major challenges in plant genomics is the translation of scientific results obtained from model species into crop plants that are relevant for agriculture and industry. While Arabidopsis thaliana, which is the most widely used model plant species (Meinke et al., 1998), has significantly contributed to fundamental research and advanced our understanding of plant biology, it is not an economically important plant. Nevertheless, results from Arabidopsis genomics research have remained an abundant source of information for translational biology applications into agriculturally important crop species (Rensink and Buell, 2004). For example, Arabidopsis genes can function ectopically in other plant species, and investigating agriculturally relevant processes such as biotic or abiotic stress in Arabidopsis opens opportunities to engineer genetically modified crops with pathways that confer tolerance or resistance (Zhang et al., 2004). Despite this progress, the underlying mechanisms of many traits are highly complex and not easily transferable from dicotyledonous to monocotyledonous species. The availability of genome and transcriptome data from monocotyledonous crop plants is therefore expected to facilitate the improvement of cereals. Rice and barley are two widely cultivated diploid crop species that are increasingly recognized as model plants for breeding, functional genomics, and, more recently, systems biology research (Izawa and Shimamoto, 1996; von Zychlinski et al., 2007).
Rice and barley have extensive genetic, molecular, and genomic resources (Bruskiewich et al., 2006; Sreenivasulu et al., 2008), and both species can be routinely genetically engineered using Agrobacterium-mediated (Murray et al., 2004; Nishimura et al., 2006) or biolistic (Christou, 1997; Obert et al., 2008) transformation. As the result of public and private efforts (Goff et al., 2002; IRGSP, 2005; Yu et al., 2002), the genome sequence of rice is now available for functional genomics applications. Similarly, the International Barley Sequencing Consortium (http://barleygenome.org) will establish a physical map of Hordeum vulgare and sequence the genome. The sequence conservation and synteny between rice and barley and other cereal crop species such as wheat, maize, sorghum, oats, and sugarcane provide important opportunities for translational biological applications (Salentijn et al., 2007) and candidate gene approaches (CGA) (Pflieger et al., 2001).
Several platforms for expression profiling analysis have become available for rice and barley, including cDNA–AFLP (Leymarie et al., 2007), SAGE (Gowda et al., 2004; White et al., 2006), MPSS (Nakano et al., 2006), cDNA-based microarrays, and oligonucleotide arrays. An early example of a cDNA microarray project in rice research was the Rice Microarray Project (Yazaki et al., 2000). Later, the NSF Rice Oligonucleotide Array Project (www.ricearray.org) made available two microarray platforms: a 20K array and a 45K array mapped to release 5 of the TIGR Rice Genome Annotation. Two expression profiling platforms that are widely used at present are the Affymetrix GeneChip® Rice Genome Array and Barley Genome Array. The rice array contains probes for 51 279 transcripts representing two rice cultivars (approximately 48 564 japonica transcripts and 1260 indica transcripts; www.affymetrix.com/products/arrays/specific/rice.affx). The barley array contains 22 792 probe sets for 21 439 non-redundant genes (Close et al., 2004).
A number of databases and tools exist for rice and barley gene expression profiling. For example, the Rice Expression Database (RED; Yazaki et al., 2002) contains data from more than 200 experiments in 24 different physiological categories. The Rice Oligonucleotide Array (ROMA) Expression Database (www.ricearray.org/rice_study.shtml) is designed to support expression data from a variety of platforms. Currently, seven studies including a total of 166 hybridizations are available on this website. PLEXdb (Wise et al., 2007) is a community resource for plant and plant pathogen microarrays and contains microarray expression data from several types of platforms, including the Affymetrix Rice Genome Array (two studies) and Barley Genome Array (16 studies). Finally, the Yale Virtual Center for Cellular Expression Profiling of Rice (http://bioinformatics.med.yale.edu/riceatlas/) provides transcriptional profiles from various rice cell types isolated by laser microdissection.
Methods for the analysis of single experiments are well established and allow identification of genes, pathways, or networks that are significantly affected in the tested conditions. In contrast, the simultaneous analysis of multiple microarray experiments (meta-analysis) is more challenging because the data may originate from different laboratories using different platforms and protocols, and because the volume of data that is being processed can be large and therefore requires specialized algorithms and data management structures. Genevestigator® is an advanced web-based system that was designed to perform molecular expression meta-analysis using novel concepts of data mining and innovative algorithms (Hruz et al., 2008). Meta-analysis in Genevestigator is based on the large-scale and systematic combination of normalized and quality-controlled expression data with experimental context variables using ontologies (e.g. anatomy, development, perturbation, or genetic background). This large-scale combination of data and meta-data produces novel insight into the spatio-temporal-response architecture of transcriptomes and allows users to answer questions that cannot be addressed by analyzing a single experiment (e.g. querying which conditions affect a given gene of interest). The integration of microarray data from different species and from hundreds of experiments therefore allows users to investigate the function of genes of interest, as well as to identify candidate genes for reverse genetics and biotechnological applications.
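The combination of expression values with annotation terms can be illustrated with a minimal sketch. It is not Genevestigator code: the sample identifiers, anatomy terms, and expression values are invented, and `meta_profile` is a hypothetical helper name. The sketch averages the signal of one gene over all samples that share an annotation term, which is one simple way to ask in which contexts a gene of interest is expressed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annotated samples: (sample id, anatomy term, signal of one gene).
# All values are invented for illustration only.
samples = [
    ("s1", "root", 120.0),
    ("s2", "root", 140.0),
    ("s3", "leaf", 900.0),
    ("s4", "leaf", 1100.0),
    ("s5", "seed", 60.0),
]

def meta_profile(annotated_samples):
    """Average signal per annotation term, sorted from highest to lowest."""
    by_term = defaultdict(list)
    for _sample_id, term, value in annotated_samples:
        by_term[term].append(value)
    profile = {term: mean(values) for term, values in by_term.items()}
    return sorted(profile.items(), key=lambda item: item[1], reverse=True)

profile = meta_profile(samples)
for term, average in profile:
    print(f"{term}: {average:.1f}")
```

The same grouping could be applied to any other annotation dimension (for example development or perturbation), provided each sample carries a term from the corresponding ontology.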
A systematic effort to collect, quality-control, and annotate rice and barley microarray data on a large scale is expected to yield a unique expression compendium for meta-analysis and systems biology research in monocotyledonous plants. Here, we present the extension of Genevestigator and its meta-analysis platform with Affymetrix GeneChip® microarray data to rice and barley.
| CONSTRUCTION AND CONTENT OF RICE AND BARLEY DATABASES |
|---|
Although both rice and barley Affymetrix arrays have been available for several years, far fewer experiments have been deposited in public repositories than for the Arabidopsis Affymetrix ATH1 array. Table 1 provides a list of currently published studies that used rice or barley Affymetrix arrays. These experiments cover conditions such as salt, drought, and nutrient stress, response to various pathogens, hormone treatments, genotype or mutant profiling, as well as spatial and temporal analysis of gene expression. Although not all of these data are currently available in public repositories, close to 1000 Affymetrix arrays from these two model species have been normalized, quality-controlled, annotated, and made available in Genevestigator (for more details about the curation process, please refer to Hruz et al., 2008 and previous publications). Because the value of a meta-analysis in Genevestigator increases as more data become available (Zimmermann et al., 2004), a major curation effort is underway to extend the databases and keep them up to date with newly released public experiments. Researchers are also encouraged to submit their experiments so that they can visualize their own data in the context of all other public experiments.
In addition to compiling transcriptome data, an essential component of the Genevestigator database is the systematic annotation of experiments and sample properties. To achieve this, ontologies have to be developed for every new organism added to the database. For rice and barley, a new anatomy tree was designed using terms accepted in the community. The structure of the tree was constructed with a strong focus on data analysis: it must be non-redundant and intuitive for the user (see Figure 1). The anatomy tree is therefore a compromise between a linear list of anatomy parts and a semantically correct but redundant classification system. Additionally, because Genevestigator incorporates data from multiple organisms, the ontologies were designed to be orthologous between related species to facilitate cross-species analyses.
The ontologies for plant development were partitioned into approximately 10 stages of the lifecycle of an organism and are strictly time-related (in contrast to the frequent anatomy-related use of the term development). Currently, in Genevestigator, there are no refined developmental ontologies for seed development or floral development, for example. However, it is conceivable that with the increasing volume and diversity of data becoming available, the addition of such ontologies will allow a more detailed view of developmental processes.
Genevestigator distinguishes between external and internal perturbations (currently named stimulus and mutation, respectively). The stimulus ontology contains categories of perturbation such as biotic stress, abiotic stress, chemical, hormone, or light treatment. The mutation ontology groups mutants according to the genetic modification method (overexpression, T-DNA insertion, EMS mutagenesis, activation tagging, etc.). In the publications describing the experiments, the information about mutants is often sparse and has to be retrieved by Genevestigator curators from the literature, websites, or from the authors directly. In Genevestigator, such information is available in mouse-over tooltips.
An important aspect of building data compendia and meta-analysis tools is the quality of the data. Genevestigator curators control the quality of every experiment using a pipeline of Bioconductor packages performing normalization and probe-level analysis. For example, arrays are flagged as low quality if they fall out of range relative to the other arrays from the same experiment, if they exhibit higher RNA degradation, if they are particularly noisy, or if they do not correlate with replicate samples. More detailed information about the quality assessment is available in the User Manual on the website (www.genevestigator.ethz.ch).
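One of the criteria above, poor correlation with replicate samples, can be sketched in a few lines. This is an illustration only and does not reproduce the Bioconductor-based pipeline used by the curators: the function names, the threshold of 0.9, the use of the median, and the expression values are all assumptions made for this example.

```python
from math import sqrt
from statistics import median

def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def flag_poor_replicates(arrays, min_median_r=0.9):
    """Return arrays whose median correlation with the other replicates is low.

    The median is used so that a single discordant array does not drag
    down the score of the arrays that agree with each other.
    The 0.9 threshold is an arbitrary, illustrative choice.
    """
    flagged = []
    for name, values in arrays.items():
        others = [v for other, v in arrays.items() if other != name]
        if median(pearson(values, v) for v in others) < min_median_r:
            flagged.append(name)
    return flagged

# Invented log-scale signals for five probe sets on four replicate arrays.
replicates = {
    "rep1": [5.1, 7.9, 10.2, 6.0, 8.8],
    "rep2": [5.0, 8.1, 10.0, 6.2, 9.0],
    "rep3": [5.2, 8.0, 10.1, 5.9, 8.9],
    "rep4": [9.5, 5.0, 6.1, 10.3, 5.5],  # deliberately discordant
}
flagged = flag_poor_replicates(replicates)
print(flagged)
```

A flagged array would still need manual inspection, since a low correlation can reflect a labeling error or a genuine biological difference rather than poor hybridization quality.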
Genevestigator contains a number of toolsets, each containing several analysis tools. Toolsets group tools that focus on the same type of analytical approach, namely meta-analysis, biomarker search, clustering, or pathway analysis. More details about each of the tools and their applications for gene function analysis and gene discovery, as well as illustrative case studies and exercises, can be found in the original article (Hruz et al., 2008) and on the Genevestigator website (www.genevestigator.ethz.ch).
| DISCUSSION AND CONCLUSION |
|---|
The meta-analytical approach of Genevestigator consists of contextualizing the expression of genes based on the collection of available experimental conditions. Linking expression to phenotypes, mutations, or perturbations helps to interpret correlations between genes and factors, and ultimately to model gene function and gene regulatory networks. An extension of this approach is to explore the expression space for genes that show similar patterns in chosen sets of conditions (clustering and bi-clustering), to find genes that are correlated to a target gene, or to identify the best candidate genes that fulfill selected criteria (biomarker search). An example of this is the search for genes specifically expressed in selected tissues from Arabidopsis. A recent comparison with a proteome map from Arabidopsis tissues revealed that marker genes identified through Genevestigator were also highly specifically expressed in the proteome datasets (Baerenfaller et al., 2008). An earlier analysis used Genevestigator to confirm the seed-specific expression of genes that had been identified using an EST virtual subtraction method (Becerra et al., 2006).
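The idea behind a biomarker search can be sketched as a ranking problem. The scoring rule below (lowest signal in the target conditions minus highest signal in all other conditions) is an assumption chosen for simplicity and is not the algorithm implemented in Genevestigator. The gene names, tissue labels, and signal values are invented, and `specificity_score` and `biomarker_search` are hypothetical helper names.

```python
def specificity_score(profile, target_conditions):
    """Lowest target signal minus highest background signal (higher is more specific)."""
    target = [v for cond, v in profile.items() if cond in target_conditions]
    background = [v for cond, v in profile.items() if cond not in target_conditions]
    return min(target) - max(background)

def biomarker_search(genes, target_conditions, top_n=1):
    """Rank genes by specificity for the target conditions and return the best ones."""
    scored = [
        (gene, specificity_score(profile, target_conditions))
        for gene, profile in genes.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

# Invented log-scale mean signals per tissue.
genes = {
    "geneA": {"root": 11.0, "leaf": 4.0, "seed": 3.5},   # root-specific
    "geneB": {"root": 9.0, "leaf": 9.2, "seed": 8.8},    # expressed everywhere
    "geneC": {"root": 6.0, "leaf": 12.0, "seed": 5.0},   # leaf-specific
}
hits = biomarker_search(genes, {"root"}, top_n=1)
print(hits)
```

Using the minimum over targets and the maximum over the background makes the score conservative: a gene ranks highly only if every target condition exceeds every non-target condition.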
The integration of plant transcriptome and metabolome data is expected to facilitate the elucidation of gene function (Kopka et al., 2004; Saito et al., 2008; Zimmermann et al., 2005). Several recent studies comparing transcript and metabolite abundances have shown that these two data types often, but not always, do correlate, and that the combined analysis reveals novel mechanisms of expression and flux regulation (e.g. Gibon et al., 2006).
Within the field of plant sciences, Genevestigator has so far been primarily used by scientists working with Arabidopsis. There are several reports in which authors have translated or compared results from Arabidopsis genes with those of other species, such as rice (Yang et al., 2008), potato (Dvorakova et al., 2007), poplar (Couturier et al., 2007; Vigneault et al., 2007), or across a variety of plants including rice, barley, poplar, and green algae (Roberts and Hejgaard, 2008). However, transferring knowledge from Arabidopsis into other species, particularly monocotyledonous crop species, is challenging. Although Arabidopsis and cereals share a substantial number of orthologous genes, the pathways and underlying regulatory networks are likely to possess individual and possibly non-orthologous characteristics. The availability of rice and barley microarray data in Genevestigator will facilitate the extrapolation of results to other cereals because of the higher colinearity and synteny among cereal genomes. In a further step, combining evidence across several species is expected to increase the confidence in conclusions drawn from a single species. An interesting development in this respect has been the mapping of probesets from different plant Affymetrix arrays into groups of orthologous sequences (Frickey et al., 2008). A web-based tool (AffyTrees) allows plant scientists to identify probesets on different Affymetrix arrays that measure the expression of orthologous genes. In Genevestigator, these probesets can be stored in distinct selections, which, with the help of the FOCUS function, can be rapidly switched from one species to another. Thus, users can rapidly gain cross-species insight into the regulation of these target genes.
To maximize data comparability between experiments, and therefore ensure the best possible results from a meta-analysis in Genevestigator, we limit the combination of data to a single array type at a time. At the moment, only Affymetrix data are available in Genevestigator because Affymetrix has provided one of the most suitable platforms for data meta-analysis. First, it offers a streamlined protocol and hardware for expression profiling using GeneChip® technology, thereby minimizing handling biases. Second, it is the platform for which data in the public domain have been most abundant so far. This allowed us to create datasets of thousands of arrays from the same array type and technology and to avoid the intricacies of comparing data from different technologies and platforms. The quality of microarray data from several other platforms (e.g. from Illumina or Agilent arrays), however, has been shown to be similarly high (MAQC Project: Patterson et al., 2006; Shi et al., 2006). We therefore expect that, in the future, robust statistical methods will allow us to integrate data from a variety of different platforms.
In summary, the rice and barley microarray databases in Genevestigator provide the plant community with transcriptome meta-analysis capability for monocotyledonous species. All public rice and barley expression data within Genevestigator are made freely available to academic users via the CLASSIC version of Genevestigator. The interface is a Java applet running in the user's browser. User support information is provided on the Genevestigator website (www.genevestigator.ethz.ch).
| SUPPLEMENTARY DATA |
|---|
Supplementary Data are available at Molecular Plant Online.
| FUNDING |
|---|
No conflict of interest declared.
| REFERENCES |
|---|
Baerenfaller K, et al. Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science (2008) 320:938–941.
Becerra C, Puigdomenech P, Vicient CM. Computational and experimental analysis identifies Arabidopsis genes specifically expressed during early seed development. BMC Genomics (2006) 7:38.
Bruskiewich R, Metz T, McLaren G. Bioinformatics and crop information systems in rice research. International Rice Research Notes (2006) 31:5–12.
Christou P. Rice transformation: bombardment. Plant Mol. Biol. (1997) 35:197–203.
Close TJ, et al. A new resource for cereal genomics: 22K barley GeneChip comes of age. Plant Physiol. (2004) 134:960–968.
Couturier J, Montanini B, Martin F, Brun A, Blaudez D, Chalot M. The expanded family of ammonium transporters in the perennial poplar plant. New Phytol. (2007) 174:137–150.
Dvorakova L, Cvrckova F, Fischer L. Analysis of the hybrid proline-rich protein families from seven plant species suggests rapid diversification of their sequences and expression patterns. BMC Genomics (2007) 8:412.
Frickey T, Benedito VA, Udvardi M, Weiller G. AffyTrees: facilitating comparative analysis of Affymetrix plant microarray chips. Plant Physiol. (2008) 146:377–386.
Gibon Y, et al. Integration of metabolite with transcript and enzyme activity profiling during diurnal cycles in Arabidopsis rosettes. Genome Biol. (2006) 7:R76.
Goff SA, et al. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science (2002) 296:92–100.
Gowda M, Jantasuriyarat C, Dean RA, Wang GL. Robust–LongSAGE (RL–SAGE): a substantially improved LongSAGE method for gene discovery and transcriptome analysis. Plant Physiol. (2004) 134:890–897.
Grennan AK. Genevestigator: facilitating web-based gene-expression analysis. Plant Physiol. (2006) 141:1164–1166.
Hruz T, et al. Genevestigator V3: a reference expression database for the meta-analysis of transcriptomes. Adv. Bioinformatics. (2008) 2008:420747.
IRGSP. The map-based sequence of the rice genome. Nature (2005) 436:793–800.
Izawa T, Shimamoto K. Becoming a model plant: the importance of rice to plant science. Trends Plant Sci. (1996) 1:95–99.
Kopka J, Fernie A, Weckwerth W, Gibon Y, Stitt M. Metabolite profiling in plant biology: platforms and destinations. Genome Biol. (2004) 5:109.
Leymarie J, Bruneaux E, Gibot-Leclerc S, Corbineau F. Identification of transcripts potentially involved in barley seed germination and dormancy using cDNA–AFLP. J. Exp. Bot. (2007) 58:425–437.
Meinke DW, Cherry JM, Dean C, Rounsley SD, Koornneef M. Arabidopsis thaliana: a model plant for genome analysis. Science (1998) 282(662):679–682.
Murray F, Brettell R, Matthews P, Bishop D, Jacobsen J. Comparison of Agrobacterium-mediated transformation of four barley cultivars using the GFP and GUS reporter genes. Plant Cell Rep. (2004) 22:397–402.
Nakano M, Nobuta K, Vemaraju K, Tej SS, Skogen JW, Meyers BC. Plant MPSS databases: signature-based transcriptional resources for analyses of mRNA and small RNA. Nucleic Acids Res. (2006) 34:D731–D735.
Nishimura A, Aichi I, Matsuoka M. A protocol for Agrobacterium-mediated transformation in rice. Nat. Protoc. (2006) 1:2796–2802.
Obert B, Middlefell-Williams J, Millam S. Genetic transformation of barley microspores using anther bombardment. Biotechnol. Lett. (2008) 30:945–949.
Patterson TA, et al. Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nat. Biotechnol. (2006) 24:1140–1150.
Pflieger S, Lefebvre V, Causse M. The candidate gene approach in plant genetics: a review. Mol. Breeding. (2001) 7:275–291.
Rensink WA, Buell CR. Arabidopsis to rice: applying knowledge from a weed to enhance our understanding of a crop species. Plant Physiol. (2004) 135:622–629.
Roberts TH, Hejgaard J. Serpins in plants and green algae. Funct. Integr. Genomics. (2008) 8:1–27.
Saito K, Hirai MY, Yonekura-Sakakibara K. Decoding genes with coexpression networks and metabolomics: majority report by precogs. Trends Plant Sci. (2008) 13:36–43.
Salentijn EMJ, et al. Plant translational genomics: from model species to crops. Mol. Breeding. (2007) 20:1–13.
Shi L, et al. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. (2006) 24:1151–1161.
Sreenivasulu N, Graner A, Wobus U. Barley genomics: an overview. Int. J. Plant Genomics (2008) 2008:486258.
Vigneault F, Lachance D, Cloutier M, Pelletier G, Levasseur C, Seguin A. Members of the plant NIMA-related kinases are involved in organ development and vascularization in poplar, Arabidopsis and rice. Plant J. (2007) 51:575–588.
von Zychlinski A, Baginsky S, Gruissem W. Rice: an emerging model for plant systems biology. Rice Genetics V, Brar DS, Mackill DJ, Hardy B, eds. (2007).
White J, et al. Abundant transcripts of malting barley identified by serial analysis of gene expression (SAGE). Plant Biotechnol. J. (2006) 4:289–301.
Wise RP, Caldo RA, Hong L, Shen L, Cannon E, Dickerson JA. BarleyBase/PLEXdb: a unified expression profiling database for plants and plant pathogens. Methods Mol. Biol. (2007) 406:347–364.
Yang Z, Wang X, Gu S, Hu Z, Xu H, Xu C. Comparative study of SBP-box gene family in Arabidopsis and rice. Gene (2008) 407:1–11.
Yazaki J, et al. Embarking on rice functional genomics via cDNA microarray: use of 3′ UTR probes for specific gene expression analysis. DNA Res. (2000) 7:367–370.
Yazaki J, Kishimoto N, Ishikawa M, Kikuchi S. Rice Expression Database: the gateway to rice functional genomics. Trends Plant Sci. (2002) 7:563–564.
Yu J, et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science (2002) 296:79–92.
Zhang JZ, Creelman RA, Zhu JK. From laboratory to field: using information from Arabidopsis to engineer salt, cold, and drought tolerance in crops. Plant Physiol. (2004) 135:615–621.
Zimmermann P, Hennig L, Gruissem W. Gene-expression analysis and network discovery using Genevestigator. Trends Plant Sci. (2005) 10:407–409.
Zimmermann P, Hirsch-Hoffmann M, Hennig L, Gruissem W. GENEVESTIGATOR: Arabidopsis microarray database and analysis toolbox. Plant Physiol. (2004) 136:2621–2632.