Vol. 60 No. 5, May 2003
  Basic Science Seminars in Neurology

Application of Microarrays to Neurological Disease

Lisa-Marie Sturla, PhD; Ana Fernandez-Teijeiro, MD, PhD; Scott L. Pomeroy, MD, PhD

Arch Neurol. 2003;60:676-682.

INTRODUCTION

Modern microarray-based functional genomics holds great promise for revealing novel molecular and cellular mechanisms of disease. First introduced commercially in 1996, microarrays have been used widely to monitor the expression of thousands of genes in biological samples, as described in the following paragraphs. Other microarray-based genomic applications are also in development, including comparative genomic hybridization, on-chip sequencing, and novel drug discovery. For example, DNA array-based comparative genomic hybridization identifies chromosomal gains and losses with greatly improved resolution compared with conventional methods that use metaphase chromosomes as hybridization targets.1 This increase in resolution will continue to improve as the technology advances. Moreover, microarrays provide a better platform for automation than is possible with standard metaphase techniques. Where genetic mutations and aberrations are already well characterized, microarrays can be customized to be effectively used as a diagnostic and prognostic tool.2-3 In the field of drug discovery, microarrays have the potential to dramatically enhance progress, being used at all stages from target discovery (through validation of new molecular targets and understanding modes of action) to predicting patient response.4

These devices are beginning to revolutionize how scientists explore the operation of normal cells in the body and the molecular aberrations that underlie medical disorders. DNA microarrays, which are based on well-established principles of nucleic acid hybridization, simultaneously interrogate thousands of genes.5-7 The actual mechanics of data capture from raw material are ever-improving and well documented, and it is the analysis and discovery of meaningful gene expression patterns within these data to which we now must turn our attention.

Analytical approaches to gene expression analysis using a cancer classification model are illustrated in the recent article by Pomeroy et al.8 Several important clinical questions were answered via the application of microarray technology and emerging data analysis techniques to pediatric brain tumors.8 Using microarrays that monitor the expression of more than 6800 genes, we endeavored to definitively differentiate a group of embryonal tumors whose diagnosis on the basis of morphologic features remains controversial and to predict outcome in the most common of these tumors, medulloblastoma, for which patient response to treatment is unpredictable.

There are 2 general approaches to data analysis: supervised and unsupervised. Unsupervised methods are applied to the entire gene expression data set without any previous knowledge of sample classification, allowing an impartial assessment of the underlying features within a data set. Two examples of unsupervised methods are principal component analysis and self-organizing maps (SOMs). Principal component analysis allowed us to differentiate at a molecular level between the different brain tumor types and normal cerebellum (Figure 1). The marker genes responsible for this distinction supported the conclusion that medulloblastomas are derived from cerebellar granule cell precursors and that they are molecularly distinct from supratentorial primitive neuroectodermal tumors. This argues against the hypothesis that medulloblastomas are a subset of primitive neuroectodermal tumors, differing only in their location in the cerebellum. Self-organizing maps are ideally suited for exploratory data analysis in the generally large and complex data sets generated in the study of a particular disease, in our case brain tumors. Using SOMs, we identified 2 distinct biological subtypes of medulloblastomas with low and high ribosomal protein expression (Figure 2). Electron microscopy subsequently confirmed that these differences in ribosomal gene expression were reflected at a cellular level by differences in ribosome biogenesis. Although this was not an expected result, it provided us with an interesting therapeutic target. Sirolimus and its analogues are currently under clinical investigation in tumors reliant on the PI3K signaling pathway and ribosome biogenesis.9



Figure 1. Principal component analysis, with axes representing the 3 principal components (Comp) (linear combinations of genes) accounting for most of the data variance, using all genes exhibiting variation across the data set (A) and using the top 10 genes most highly associated with each tumor class (B). CNS indicates central nervous system; PNETs, primitive neuroectodermal tumors.




Figure 2. Self-organizing maps were used to discover 2 predominant classes of medulloblastoma: class 0 (high ribosome content) and class 1 (low ribosome content). The top 50 genes for each class are shown. Each column represents an individual sample and each row represents a single gene. Relative gene expression is depicted by red when high and blue when low. mRNA indicates messenger RNA.


Unsupervised analysis, although useful for pulling out prominent structure in a data set (eg, medulloblastoma vs primitive neuroectodermal tumors), may miss more subtle distinctions. We found this to be true for outcome prediction. Neither principal component analysis nor SOMs identified prognostically significant subgroups of medulloblastomas, so we turned to supervised analysis. Expression profiles were obtained from 60 children with medulloblastomas who received similar treatment and whose outcome was known. Supervised methods were used to "learn" the distinction between survivors and patients in whom treatment failed (Figure 3). In take-one-out (leave-one-out) cross-validation, gene expression patterns predicted survival with substantially more accuracy than current clinical risk criteria. Several supervised analysis methods showed a similar degree of accuracy, including k-nearest neighbor, support vector machines, and structural pattern localization analysis by sequential histograms.



Figure 3. Top 50 genes associated with survival (A) and treatment failure (B). Each column represents an individual sample and each row represents a single gene. Relative gene expression is depicted by red when high and blue when low. mRNA indicates messenger RNA.


Supervised methods were also used to successfully classify classic and desmoplastic medulloblastomas (histologically confirmed by a single neuropathologist). These algorithms allowed us not only to classify tumors and predict outcome but also to discover previously unknown relationships between coordinate gene expression and tumor characteristics. For example, we demonstrated that the genes encoding sonic hedgehog (shh)–related proteins are highly expressed in desmoplastic medulloblastomas, suggesting that they arise as a consequence of dysregulated shh signaling. Thus, microarray analysis can identify gene expression profiles that signify an activated regulatory pathway or interacting molecular processes leading to a known cellular response.

There are, of course, limitations to any approach that involves the generation of such a large amount of data for each of a relatively small group of samples. One of the most significant risks is finding statistically significant associations by chance. Consequently, identification of gene expression patterns that may underlie the pathogenesis of brain tumors requires validation. Validation of the expression of single genes can be done using well-established techniques such as Northern or Western blotting, as well as immunohistochemistry or in situ hybridization. Hypotheses that arise from the interpretation of significant patterns of gene expression can be tested in a variety of ways. For example, we used electron microscopy to demonstrate that tumors with increased coordinate expression of ribosomal proteins have high numbers of free ribosomes. Our gene expression–based outcome predictions must be validated in an independent, prospective cohort of patients before gene expression profiling can be used for risk stratification in future clinical trials. It is evident, then, that the hypotheses generated from the analysis of complex gene expression patterns must be tested by independent measures before final conclusions can be reached.


METHODS

UNSUPERVISED

Principal Component Analysis

A simple multidimensional scaling of the data set was obtained by plotting, in scatterplots, the top principal components (linear combinations of genes) that account for a significant fraction of the variance. To study the natural clustering of the brain tumor samples, we initially considered the subset of genes with the highest variation across samples (Figure 1A). In this case, the top 3 principal components accounted for approximately 43% of the variance of these genes. We then plotted principal components based on the top 10 marker genes associated with each tumor class, selected by the signal-to-noise statistic.10 The top 3 principal components of this data set accounted for approximately 61% of the variance, and the separation and clustering of tumor types were substantially improved over those obtained with the genes of highest variation (Figure 1B). These calculations were performed using Mathsoft software available on the Internet at http://www.mathsoft.com.
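
The fraction of variance carried by the top principal components can be illustrated with a small sketch. This is not the Mathsoft calculation used in the study: the expression values are invented toy data (6 samples, 4 genes), the function names are hypothetical, and the components are found by plain power iteration with deflation so that only the Python standard library is needed.

```python
import math
import random

def center_columns(rows):
    """Subtract each gene's mean (column mean) from every sample."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Gene-by-gene sample covariance matrix of already-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_components(cov, k, iters=500, seed=0):
    """Top-k (eigenvalue, eigenvector) pairs by power iteration with deflation."""
    rng = random.Random(seed)
    work = [row[:] for row in cov]          # copy, because deflation modifies it
    p = len(work)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                # no variance left to extract
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        lam = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        pairs.append((lam, v))
        for i in range(p):                  # remove the component just found
            for j in range(p):
                work[i][j] -= lam * v[i] * v[j]
    return pairs

# Toy data (assumption): rows are samples, columns are genes.
samples = [
    [5.0, 4.8, 1.0, 0.9],
    [5.2, 5.1, 1.1, 1.2],
    [4.9, 5.0, 0.8, 1.0],
    [1.0, 1.2, 5.1, 4.9],
    [0.9, 1.1, 4.8, 5.2],
    [1.1, 0.8, 5.0, 5.0],
]
centered = center_columns(samples)
cov = covariance(centered)
pairs = top_components(cov, k=3)
total_variance = sum(cov[i][i] for i in range(len(cov)))
fractions = [lam / total_variance for lam, _ in pairs]
# Coordinates of each sample on the first component (one scatterplot axis).
scores_pc1 = [sum(a * b for a, b in zip(row, pairs[0][1])) for row in centered]

print("fraction of variance per component:", [round(f, 3) for f in fractions])
print("cumulative fraction, top 3:", round(sum(fractions), 3))
```

Summing the per-component fractions gives the kind of figure quoted above for the top 3 components, and the per-sample scores are the coordinates that would be plotted.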

Self-organizing Maps

We computed SOMs using the GeneCluster software package available on the Internet at http://www.genome.wi.mit.edu. As an exploratory data analysis method, SOMs identify groups of samples with common gene expression patterns within a large heterogeneous sample set. To calculate SOMs, one first maps a randomly initialized grid of nodes onto the tumor sample set. Through an iterative series of calculations testing the similarity of gene expression between samples, the geometry of the nodes is adjusted to reflect the data structure. If the number of nodes exceeds the number of "natural" clusters in the sample set, then the nodes will converge to reflect the natural clustering. In our case, applying this unsupervised approach to the medulloblastoma data set led to the discovery that 2 is the optimum number of groups identifiable by SOMs (Figure 2).
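
The iterative node adjustment can be sketched as follows. This is a minimal one-dimensional SOM with 2 nodes, not the GeneCluster implementation: the learning-rate and neighborhood schedules, the function names, and the toy expression values are all assumptions chosen for illustration.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_node(nodes, x):
    """Index of the node whose expression pattern is closest to sample x."""
    return min(range(len(nodes)), key=lambda i: sq_dist(nodes[i], x))

def train_som(samples, n_nodes=2, epochs=100, lr0=0.5, sigma0=1.0, seed=0):
    rng = random.Random(seed)
    dim = len(samples[0])
    center = [sum(col) / len(samples) for col in zip(*samples)]
    # Random start: nodes scattered tightly around the centroid of the data.
    nodes = [[c + 0.01 * (rng.random() - 0.5) for c in center]
             for _ in range(n_nodes)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                    # learning rate shrinks over time
        sigma = sigma0 * decay + 0.01       # neighborhood width shrinks too
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            best = nearest_node(nodes, x)
            for i, node in enumerate(nodes):
                # Nodes close to the winner on the grid move more.
                h = math.exp(-((i - best) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    node[d] += lr * h * (x[d] - node[d])
    return nodes

# Toy data (assumption): two groups of samples with opposite expression patterns.
samples = [
    [5.0, 4.8, 1.0, 0.9],
    [5.2, 5.1, 1.1, 1.2],
    [4.9, 5.0, 0.8, 1.0],
    [1.0, 1.2, 5.1, 4.9],
    [0.9, 1.1, 4.8, 5.2],
    [1.1, 0.8, 5.0, 5.0],
]
nodes = train_som(samples)
labels = [nearest_node(nodes, s) for s in samples]
print("node assigned to each sample:", labels)
```

After training, each sample is labeled by its nearest node, which is how groups of samples with common expression patterns are read off the map.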

SUPERVISED

To build supervised classifiers, we defined target classes based on morphologic features, tumor class, or treatment outcome. The method is illustrated by our analysis of treatment outcome (Figure 3). In this case, we created 2 classes of patients based on clinical outcome. Gene expression profiles from patients who died of progressive disease due to treatment failure were compared with expression profiles of patients who were still alive at the end of the study and who had been followed for at least 1 year after cessation of therapy. Genes correlated with the 2 outcome classes were identified by sorting all of the genes on the array according to the signal-to-noise statistic.10 We built a sample classifier in cross-validation: 1 sample was removed, the remaining samples were used as a training set, and the procedure was repeated until every sample had been tested as an "unknown." Several models were built using different numbers of marker genes, and the final chosen model was the one that minimized the total error (number of samples that were misclassified) in cross-validation. For this, k-nearest neighbor, weighted voting, and support vector machine algorithms were used (Figure 4).
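
The cross-validation loop and the choice of the number of marker genes can be sketched as below. Several details are assumptions rather than the study's code: the signal-to-noise statistic is taken to be the difference of class means divided by the sum of class standard deviations (the article does not give the formula), markers are ranked by its absolute value, a nearest-centroid rule stands in for the classifiers named above, and the data are invented.

```python
import math

def mean(v):
    return sum(v) / len(v)

def std(v):
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))

def signal_to_noise(X, y, gene):
    """Assumed definition: (mean0 - mean1) / (sd0 + sd1) for one gene."""
    a = [row[gene] for row, label in zip(X, y) if label == 0]
    b = [row[gene] for row, label in zip(X, y) if label == 1]
    return (mean(a) - mean(b)) / ((std(a) + std(b)) or 1e-12)

def select_markers(X, y, n_markers):
    """Genes with the largest absolute signal-to-noise in the training set."""
    genes = range(len(X[0]))
    ranked = sorted(genes, key=lambda g: abs(signal_to_noise(X, y, g)),
                    reverse=True)
    return ranked[:n_markers]

def nearest_centroid(X, y, x, markers):
    """Stand-in classifier: pick the class whose mean profile is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y)):
        members = [row for row, lab in zip(X, y) if lab == label]
        dist = sum((x[g] - mean([m[g] for m in members])) ** 2 for g in markers)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_errors(X, y, n_markers):
    """Hold out each sample once; markers are reselected on every training set."""
    errors = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        markers = select_markers(train_X, train_y, n_markers)
        if nearest_centroid(train_X, train_y, X[i], markers) != y[i]:
            errors += 1
    return errors

# Toy data (assumption): 8 samples, 5 genes; genes 0 and 1 track the class.
X = [
    [9.0, 1.0, 5.0, 3.0, 4.1], [8.5, 1.5, 4.0, 3.2, 3.9],
    [9.2, 0.8, 6.0, 2.9, 4.0], [8.8, 1.2, 5.5, 3.1, 4.2],
    [2.0, 7.0, 5.2, 3.0, 4.0], [1.5, 7.5, 4.4, 3.3, 4.1],
    [2.2, 6.8, 5.8, 2.8, 3.8], [1.8, 7.2, 4.9, 3.1, 4.2],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

errors_by_n = {n: loocv_errors(X, y, n) for n in (1, 2, 5)}
best_n = min(errors_by_n, key=errors_by_n.get)
print("misclassified samples per model size:", errors_by_n)
print("chosen number of marker genes:", best_n)
```

Reselecting the marker genes inside each fold keeps the held-out sample from influencing the model that is then asked to classify it.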



Figure 4. Pictorial representation of supervised analysis methods for sample classification. A hypothetical distribution of samples in multidimensional gene expression space is shown in 2 dimensions. The samples are identified as class A (red) or class B (green) to demonstrate how the following algorithms assign an unclassified sample (blue) to a class by binary decision. For outcome predictions, the 2 classes correspond to patients who died of tumor progression due to treatment failure vs survivors after therapy. A, k-Nearest neighbor assigns the unknown (test) sample to a class based on its proximity in gene expression space to the surrounding samples (neighbors). In this case, the unknown sample has a gene expression profile closer to that of the samples in class A. B, Weighted voting determines a decision boundary (DB) midway between the mean gene expression levels of 2 groups of samples. The closer each gene of the unknown sample is to the DB for a particular gene, the less weight that gene carries toward the assignment of the sample to an outcome class. The unknown sample is assigned to the class with the most positive voting genes, class B in this example. C, Support vector machine algorithms determine a DB based on the optimum separation of the 2 classes according to their gene expression profiles. They assign a classification to the unknown sample according to its position in gene expression space in relation to the DB. In this case, a hypothetical DB is shown assigning the unknown sample to class A.


k-Nearest Neighbors

The k-nearest neighbor algorithm11 was used to predict the class of the unknown sample by calculating the distance of that sample from those surrounding it in gene expression space (class-specific marker genes, ie, those associated with survival or treatment failure). The unknown sample was predicted to be in one or the other outcome class based on the similarity of gene expression with that of most of the k-nearest (neighbor) samples (Figure 4A). Marker genes were chosen from those highly correlating with the predetermined classes using the signal-to-noise statistic.
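
A minimal sketch of the k-nearest neighbor rule follows. Euclidean distance, k = 3, the function name, and the toy marker-gene values are assumptions; the article does not state which distance measure or value of k was used.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority class among the k training samples closest to x."""
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy data (assumption): expression of 2 marker genes per sample.
train_X = [[8.9, 1.2], [9.1, 0.9], [8.4, 1.6],
           [2.1, 7.0], [1.7, 7.4], [2.5, 6.6]]
train_y = ["survivor", "survivor", "survivor",
           "failure", "failure", "failure"]

unknown = [8.0, 2.0]
prediction = knn_predict(train_X, train_y, unknown)
print("predicted outcome class:", prediction)
```

An odd k avoids tied votes when there are only 2 outcome classes.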

Weighted Voting

The weighted voting algorithm10 makes a weighted linear combination of relevant "marker" or "informative" genes obtained in the training set to provide a classification scheme for new samples. The selection of marker genes for each outcome class was determined by computing the signal-to-noise statistic of each gene for the predefined classes. The algorithm determined the decision boundary (halfway) between the outcome class means for each gene (Figure 4B). To predict the class of the unknown sample, each gene in the sample expression profile casts a vote, and the unknown sample is assigned to the class with the most positive voting genes. The distance of that sample from the decision boundary determines the weight that each gene carries in this voting process. The closer each gene of the unknown sample is to the decision boundary for a particular gene, the less weight that gene carries toward the assignment of the sample to an outcome class. Confidence in the class prediction of the unknown sample was determined by the size of the voting margin responsible for putting the sample in one class vs the other.
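
The voting scheme can be sketched as below. Some choices are assumptions not stated in the article: each gene's weight is taken to be its signal-to-noise statistic (difference of class means over the sum of class standard deviations), the class totals are sums of weighted votes, and confidence is reported as the normalized voting margin (winning total minus losing total, divided by their sum). The data and function names are hypothetical.

```python
import math

def mean(v):
    return sum(v) / len(v)

def std(v):
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))

def train_weighted_voting(X, y, class_a, class_b):
    """For each gene, store (weight, decision boundary)."""
    model = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == class_a]
        b = [row[g] for row, lab in zip(X, y) if lab == class_b]
        weight = (mean(a) - mean(b)) / ((std(a) + std(b)) or 1e-12)
        boundary = (mean(a) + mean(b)) / 2     # halfway between class means
        model.append((weight, boundary))
    return model

def predict(model, x, class_a, class_b):
    """Return (predicted class, normalized voting margin)."""
    votes = {class_a: 0.0, class_b: 0.0}
    for (weight, boundary), value in zip(model, x):
        vote = weight * (value - boundary)     # small near the boundary
        if vote > 0:
            votes[class_a] += vote
        else:
            votes[class_b] += -vote
    winner = max(votes, key=votes.get)
    loser = class_b if winner == class_a else class_a
    total = votes[winner] + votes[loser]
    margin = (votes[winner] - votes[loser]) / total if total else 0.0
    return winner, margin

# Toy data (assumption): 3 genes; gene 2 carries almost no class information.
X = [[8.9, 1.2, 5.0], [9.1, 0.9, 4.6], [8.4, 1.6, 5.3],
     [2.1, 7.0, 4.8], [1.7, 7.4, 5.4], [2.5, 6.6, 5.1]]
y = ["survivor"] * 3 + ["failure"] * 3

model = train_weighted_voting(X, y, "survivor", "failure")
label, margin = predict(model, [2.4, 6.5, 5.2], "survivor", "failure")
print("predicted outcome class:", label)
print("voting margin:", round(margin, 3))
```

A gene whose value in the unknown sample sits near its decision boundary contributes a vote close to zero, which is the weighting behavior described above.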

Support Vector Machines

The basic idea behind support vector machines is to construct an optimal class-separating hyperplane (decision boundary) by mapping the gene expression data to a high-dimensional space.12-13 Linear separation in this higher-dimensional space corresponds to a nonlinear decision boundary separating the 2 outcome classes in the original gene expression space (Figure 4C). This allowed us to separate the outcome classes more effectively than with weighted voting, in which the decision boundary is linear. As with weighted voting, samples are assigned to a class by their position in relation to the decision boundary, and, again, confidence in the classification depends on how far the sample lies from that boundary on the side of the assigned class.
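
The link between a linear boundary in the mapped space and a nonlinear boundary in the original space can be shown with a toy sketch. It is not the software used in the study: the data are one-dimensional and invented, the mapping x -> (x, x^2) is written out explicitly instead of using a kernel, and the separating hyperplane is fitted by plain subgradient descent on the regularized hinge loss, a simple substitute for a dedicated support vector machine solver.

```python
def feature_map(x):
    """Explicit map from 1 dimension to 2 dimensions: x -> (x, x^2)."""
    return (x, x * x)

def separable_by_threshold(xs, labels):
    """True if a single cut point on the original axis splits the classes."""
    pos = [x for x, lab in zip(xs, labels) if lab == 1]
    neg = [x for x, lab in zip(xs, labels) if lab == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)

def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=5000):
    """Batch subgradient descent on lam/2*|w|^2 + mean hinge loss."""
    dim, n = len(points[0]), len(points)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for p, lab in zip(points, labels):
            margin = lab * (sum(wi * pi for wi, pi in zip(w, p)) + b)
            if margin < 1:                  # inside the margin or misclassified
                for d in range(dim):
                    grad_w[d] -= lab * p[d] / n
                grad_b -= lab / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def decision_value(w, b, x):
    """Signed score; its sign is the class, its size a rough confidence."""
    return sum(wi * pi for wi, pi in zip(w, feature_map(x))) + b

# Toy data (assumption): class +1 lies between two groups of class -1.
xs = [-0.5, 0.0, 0.5, -2.0, 2.0, -1.5, 1.5]
labels = [1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm([feature_map(x) for x in xs], labels)
train_predictions = [1 if decision_value(w, b, x) > 0 else -1 for x in xs]
print("separable in the original space:", separable_by_threshold(xs, labels))
print("training predictions in the mapped space:", train_predictions)
```

In the original axis no single cut point separates the classes, whereas in the mapped space a straight line does; projected back, that line corresponds to a pair of cut points, that is, a nonlinear boundary.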


COMMENT

RELEVANCE TO THE STUDY OF NEUROSCIENCE AND THE PRACTICE OF NEUROLOGY

The use of microarrays provides a springboard from which we can start to examine the cellular pathogenesis underlying neurological disease and perhaps narrow down the search to a manageable group of therapeutic targets. Examples of this can be seen in neurological disorders such as multiple sclerosis, Huntington disease, Parkinson disease, and Alzheimer disease.14-17 Applying microarray technology to a transgenic animal model of Huntington disease resulted in the finding that genes encoding certain neurotransmitters and components of the calcium and retinoid signaling pathways were down-regulated, whereas those encoding inflammatory components were up-regulated.16 These findings were unexpected consequences of the mutant huntingtin protein and could only have been discovered on this scale by microarray analysis.

Multiple sclerosis is a complex disorder with multiple clinical subtypes that cannot be diagnosed by clinical criteria at initial presentation.18 Immunomodulatory treatment has proved to be relatively successful in relapsing-remitting disease, but it is not as useful in primary or secondary progressive disease.18-19 To date, there are no in vivo markers that allow specific direction of treatment. Microarray analysis has begun to dissect the molecular heterogeneity of multiple sclerosis, identifying genes related to cell metabolism, structure, cytokines, and cell adhesion molecules. In addition, a gene not previously associated with multiple sclerosis, encoding the Duffy chemokine receptor, was identified using this technology.14, 17 Although these results are preliminary, eventually microarray expression analysis may lead to the identification of markers that are detectable in living patients, allowing prognosis to be accurately predicted at the time of initial diagnosis and treatment to be tailored accordingly. An even greater future challenge is offered by the investigation of psychiatric disorders, which seem to result from the interplay of polygenic and epigenetic factors on multiple brain circuits.20

CONCLUSIONS

The development of array-based DNA mutation screening may, in the future, prove to be beneficial for the identification of an individual's genetic propensity to acquire disorders such as Alzheimer or Parkinson disease. Consequently, at-risk candidates can be selected for close monitoring, intensive preventive care, and early clinical intervention. Other applications may include screening of patients for gene variants that affect the individual's response to certain medications, allowing the physician to tailor the best treatment regimen for a given disease in an individual patient.

USEFUL WEB SITES

The following Web sites are useful in the study of microarray-based functional genomics:


AUTHOR INFORMATION

Corresponding author and reprints: Scott L. Pomeroy, MD, PhD, Division of Neuroscience, Department of Neurology, Children's Hospital, 300 Longwood Ave, Boston, MA 02115 (e-mail: scott.pomeroy@tch.harvard.edu).

Accepted for publication December 11, 2002.

Author contributions: Study concept and design (Dr Pomeroy); acquisition of data (Drs Sturla, Fernandez-Teijeiro, and Pomeroy); analysis and interpretation of data (Drs Sturla and Pomeroy); drafting of the manuscript (Drs Sturla, Fernandez-Teijeiro, and Pomeroy); critical revision of the manuscript for important intellectual content (Dr Pomeroy); statistical expertise (Dr Pomeroy); obtained funding (Dr Pomeroy); administrative, technical, and material support (Dr Sturla); study supervision (Dr Pomeroy).

From the Division of Neuroscience, Department of Neurology, Children's Hospital, Harvard Medical School, Boston, Mass (Drs Sturla, Fernandez-Teijeiro, and Pomeroy); and the Unidad de Oncologia Pediatrica, Hospital de Cruces-Baracaldo, Basque Country, Spain (Dr Fernandez-Teijeiro).


REFERENCES

1. Theillet C, Orsetti B, Redon R, Manoir SD. Genomic profiling: from molecular genetics to DNA arrays. Bull Cancer. 2001;88:261-268.
2. Jain AN, Chin K, Borresen-Dale AL, et al. Quantitative analysis of chromosomal CGH in human breast tumors associates copy number abnormalities with p53 status and patient survival. Proc Natl Acad Sci U S A. 2001;98:7952-7957.
3. Hui AB, Lo KW, Yin XL, Poon WS, Ng HK. Detection of multiple gene amplifications in glioblastoma multiforme using array-based comparative genomic hybridisation. Lab Invest. 2001;81:717-723.
4. Clarke PA, Poele RT, Wooster R, Workman P. Gene expression and microarray analysis in cancer biology, pharmacology, and drug development: progress and potential. Biochem Pharmacol. 2001;62:1311-1336.
5. Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995;270:467-470.
6. DeRisi J, Penland L, Brown PO, et al. Use of cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet. 1996;14:457-460.
7. Lockhart DJ, Winzeler EA. Genomics, gene expression and DNA arrays. Nature. 2000;405:827-836.
8. Pomeroy SL, Tamayo P, Gaasenbeek M, et al. Prediction of central nervous system embryonal tumor outcome based on gene expression. Nature. 2002;415:436-442.
9. Hidalgo M, Rowinsky EK. The rapamycin-sensitive signal transduction pathway as a target for cancer therapy. Oncogene. 2000;19:6680-6686.
10. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95:14863-14868.
11. Dasarathy VB. Nearest Neighbour (NN) Norms: NN Pattern Classification Techniques. Los Alamitos, Calif: IEEE Computer Society Press; 1991.
12. Mukherjee S, Tamayo P, Mesirov JP, Slonim D, Verri A, Poggio T. Support Vector Machine Classification of Microarray Data, CBCL Paper #182/AI Memo #1676. Cambridge: Massachusetts Institute of Technology; 1999.
13. Brown MP, Grundy WN, Lin D, et al. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci U S A. 2000;97:262-267.
14. Whitney LW, Becker KG, Tresser NJ, et al. Analysis of gene expression in mutiple sclerosis lesions using cDNA microarrays. Ann Neurol. 1999;46:425-428.
15. Ginsberg SD, Hemby SE, Lee VM, Eberwine JH, Trojanowski JQ. Expression profile of transcripts in Alzheimer's disease tangle-bearing CA1 neurons. Ann Neurol. 2000;48:77-87.
16. Luthi-Carter R, Strand A, Peters NL, et al. Decreased expression of striatal signaling genes in a mouse model of Huntington's disease. Hum Mol Genet. 2000;9:1259-1271.
17. Steinman L. Gene microarrays and experimental demyelinating disease: a tool to enhance serendipity. Brain. 2001;124:1897-1899.
18. Bitsch A, Bruck W. Differentiation of multiple sclerosis subtypes: implications for treatment. CNS Drugs. 2002;16:405-418.
19. Goodin DS, Frohman EM, Garmany GP, et al. Disease modifying therapies in multiple sclerosis: report of the Therapeutics and Technology Assessment Subcommittee of the American Academy of Neurology and the MS Council for Clinical Practice Guidelines. Neurology. 2002;58:169-178.
20. Mirnics K, Middleton FA, Lewis DA, Levitt P. Analysis of complex brain disorders with gene expression microarrays: schizophrenia as a disease of the synapse. Trends Neurosci. 2001;24:479-486.

SECTION EDITOR: HASSAN M. FATHALLAH-SHAYKH, MD



© 2003 American Medical Association. All Rights Reserved.