  Vol. 62 No. 11, November 2005
  Neurological Review

Microarrays

Applications and Pitfalls

Hassan M. Fathallah-Shaykh, MD

Arch Neurol. 2005;62:1669-1672.

ABSTRACT

Microarrays are simple assays that measure the relative expression levels of tens of thousands of genes. Excitement about their importance and potential contributions to biology and medicine has been intense. Nonetheless, recent insights into the limitations and pitfalls of microarrays have led to caution about data interpretation. Microarrays are very useful but they are also very misleading; better data analysis tools are needed to improve accuracy.



INTRODUCTION

Over the past half century, scientists have studied cause-and-effect relationships between known genes and biological phenotypes or human disease. Recent technological advances have changed the landscape of biomedical research. The complete genomes of several organisms are now available, and the expression of tens of thousands of genes may be assayed by microarrays. Genomes are rich sources of complex genetic information, most of which is unknown and unpredictable. Hence, the term discovery has been introduced to imply finding without preconceived bias which genes are relevant to a biological phenotype and how the genes interact.

In a single assay, microarrays generate tens of thousands of measurements of the relative levels of messenger RNA expression. When first developed, microarrays appeared to hold great promise for translating genomics into significant advances in basic biology and medicine. The National Institutes of Health (Bethesda, Md), universities, and drug companies have invested heavily in various applications of microarrays. Nonetheless, recent findings have uncovered major pitfalls that cast doubt on the interpretation of microarray data. Herein, I review the technology of complementary DNA (cDNA) microarrays, their applications and pitfalls, and future directions in data analysis.


THE EXPERIMENTAL SYSTEM

Spotted arrays may include tens of thousands of cDNAs laid on glass slides. Each experiment uses 2 RNA samples and measures the expression level of each cDNA in 1 sample relative to the other (Figure 1). The messenger RNAs are reverse transcribed to cDNAs, labeled with fluorescent dyes, mixed, and hybridized to the glass slide. After washing, the spot-bound fluorescent dyes are excited by lasers of appropriate wavelengths to generate 2 "scanned" images, 1 for each sample. Images are analyzed to quantify (1) the signal within each spot and (2) a small rim of background surrounding each spot. The principal measurement is the expression ratio of each spot:
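\[ \text{ratio} = \frac{\text{signal}_{\text{sample}} - \text{background}_{\text{sample}}}{\text{signal}_{\text{reference}} - \text{background}_{\text{reference}}} \]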



Figure 1. A schematic portraying expression profiling of a sample vs a reference by spotted microarrays using probe-switching (dye swap) experiments. The results yield replicate expression levels of the ratios of the complementary DNAs (cDNAs) in the sample vs the reference. mRNA indicates messenger RNA.


A log2(ratio) greater than 0 implies up-regulation and a log2(ratio) less than 0 implies down-regulation. A data set of a single experiment contains tens of thousands of ratios.
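
As a minimal sketch of how one spot's measurement could be turned into a log2 ratio, the snippet below assumes that the ratio is formed from background-subtracted signals in the two scanned images. The function name, the argument names, and the choice to return `None` for spots whose corrected signal is not positive are illustrative assumptions, not part of the article.

```python
import math


def log2_expression_ratio(sample_signal, sample_background,
                          reference_signal, reference_background):
    """Return log2 of the background-corrected sample/reference ratio.

    Returns None when either corrected signal is not positive, because
    the ratio (and its logarithm) is then undefined or meaningless.
    """
    sample = sample_signal - sample_background
    reference = reference_signal - reference_background
    if sample <= 0 or reference <= 0:
        return None
    return math.log2(sample / reference)


# Hypothetical spot intensities: (1300 - 100) / (400 - 100) = 4, so log2 = 2.0
value = log2_expression_ratio(1300.0, 100.0, 400.0, 100.0)
print(value)  # positive value: the spot reads as up-regulated in the sample
```

A positive result corresponds to up-regulation in the sample and a negative result to down-regulation, matching the convention stated above.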

Microarray experiments using spotted arrays are usually designed to compare each of several samples to a single reference RNA that is common to all experiments. The data are expressed in a matrix whose columns correspond to samples and rows to genes; each column represents a distinct experiment. Analytical strategies often apply multivariate statistics including clustering, the self-organizing maps of Kohonen (neural networks), and principal component analysis or multidimensional scaling.1-6
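
The matrix layout described above can be sketched in a few lines. The gene names, sample names, and log2 ratios below are invented for illustration, and Pearson correlation between sample columns is used only as one common similarity measure on which clustering methods may build; it is not presented as the method of any cited study.

```python
import math

# Hypothetical log2 ratios: rows are genes, columns are samples,
# each sample having been compared with the same reference RNA.
genes = ["geneA", "geneB", "geneC", "geneD"]
samples = ["tumor1", "tumor2", "tumor3"]
log_ratios = [
    [2.0, 1.8, -1.5],
    [-1.0, -1.2, 0.9],
    [0.1, 0.0, 0.2],
    [1.5, 1.4, -2.0],
]


def column(matrix, j):
    """Extract column j (one sample's profile across all genes)."""
    return [row[j] for row in matrix]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def sample_correlations(matrix):
    """Correlation for every pair of sample columns, keyed by (i, j)."""
    n_samples = len(matrix[0])
    return {
        (i, j): pearson(column(matrix, i), column(matrix, j))
        for i in range(n_samples)
        for j in range(i + 1, n_samples)
    }


corr = sample_correlations(log_ratios)
closest = max(corr, key=corr.get)  # most similar pair of samples
print(samples[closest[0]], samples[closest[1]])
```

In this toy matrix the first two samples have nearly identical profiles, so a clustering method based on correlation would group them before joining the third.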


POTENTIAL APPLICATIONS OF MICROARRAYS IN BIOLOGY AND MEDICINE

The enthusiasm about the potential of microarrays has been intense.7-9 Experimental designs are usually aimed at discovering (1) patterns of expression that classify disease phenotypes and predict clinical behavior or (2) molecular targets and systems that create the biology. The first goal is based on the intuitive idea that genome-scale molecular expression refines the pathological classification of disease. Specifically, classifications based on molecular expression are expected to be more accurate and sensitive than those based on microscopy. Preliminary proofs of principles include reports of patterns of genetic expression that predict new classifications of central nervous system embryonal tumors, gliomas, large B-cell lymphoma, and breast carcinoma.1, 10-15 For example, the molecular classes may either replicate the pathological distinction or divide the subjects within the same pathological class into subgroups that predict distinct clinical behaviors like long-term vs short-term survival times and drug response vs resistance.

The idea that the global transcriptional response constitutes molecular phenotypes has recently received attention.12, 16-17 In this model, phenotypes are created by molecular systems in which single genes or molecules belong to rich networks of dynamic molecular interactions that include transcriptional regulation, signaling pathways, protein-protein, and protein–nucleic acid interactions.16, 18 Examples of microarray applications in systems biology include the discovery of (1) the regulation of the transcriptional response when yeast cells encounter nutrients, (2) the yeast galactose-utilization pathway, and (3) the principles of balanced genetic expression and opposing molecular functions behind the phenotypes of meningiomas and cultured gliomas.16, 19-22 Theoretically, one could apply microarrays to discover new molecular classifications of neurological diseases, to study and define the molecular systems that create each individual phenotype, and to perturb the network to find the best targets that transition the whole system between phenotypes.


PITFALLS OF MICROARRAYS

Following the initial hype and excitement about microarrays, their pitfalls and limitations are causing a hard reality check. Current methods for microarray expression data analysis require numerous samples and yield measurements of low specificity. Kothapalli et al23 examined microarray data from 2 different systems. They report inconsistencies in sequence fidelity of the spotted microarrays, variability of differential expression, low specificity of cDNA probes, discrepancy in fold-change calculations, and lack of probe specificity for different isoforms of a gene. Ntzani and Ioannidis24 examined 84 large-scale microarray expression data sets that address major clinical outcomes including death, metastasis, recurrence, and response to therapy. They found that these studies show variable prognostic performance. Tan et al25 examined gene expression measurements generated from identical RNA preparations that were obtained using 3 commercially available microarray platforms from Affymetrix, Amersham, and Agilent. Correlations in gene expression levels and comparisons for significant gene expression changes in this subset showed considerable divergence across the different platforms. Michiels et al26 reanalyzed data from the 7 largest published studies that have attempted to predict prognosis of patients with cancer on the basis of DNA microarray analysis. The results reveal that the list of genes identified as predictors of prognosis was highly unstable and molecular signatures were strongly dependent on the selection of patients in the training sets. In addition, 5 of the 7 studies did not classify patients better than chance. The poor specificity and reproducibility are not surprising considering all the experimental variables that affect the quality of the data sets. These include variations in the laboratories, individuals, probe labeling, biochemical reactions, scanners, and lasers.

Because of the low specificity, validation by other methods for measuring gene expression has become the "gold standard."25, 27 However, biological samples are not always abundant, and the price tag of validating all the genes discovered by microarray expression profiling is astronomical.


THE NATURE OF THE PROBLEM

The specificity of discovery must be stringent when data sets consist of tens of thousands of genes, the predominant majority of which contribute only noise. To illustrate, consider a data set containing 500 true states of genetic expression (up-regulated or down-regulated) and 19 500 false states, that is, genes that are not differentially expressed (Table). Specificities of 99% and 95% yield 195 and 975 false-positive expression states, respectively. Thus, an analytical method having 100% sensitivity and 99% specificity discovers 695 genes (500 + 195), 28% (195/695) of which are false positive. Another method having 50% sensitivity and 99% specificity yields 445 genes (250 + 195), 44% (195/445) of which are false positive. This example illustrates the limitations of statistical significance when noise is predominant.
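
The arithmetic of this example can be checked with a short sketch. The function name and its return layout are illustrative choices; the counts, sensitivities, and specificities are those given in the paragraph above.

```python
def discovery_summary(n_true, n_false, sensitivity, specificity):
    """Return (true positives, false positives, fraction of discoveries that are false).

    n_true  -- genes that are truly differentially expressed
    n_false -- genes that are not differentially expressed
    """
    true_pos = sensitivity * n_true
    false_pos = (1.0 - specificity) * n_false
    discovered = true_pos + false_pos
    return true_pos, false_pos, false_pos / discovered


# The 500 true and 19 500 false states of the example above.
for sensitivity, specificity in [(1.0, 0.99), (0.5, 0.99), (1.0, 0.95)]:
    tp, fp, frac = discovery_summary(500, 19500, sensitivity, specificity)
    print(
        f"sensitivity={sensitivity:.0%} specificity={specificity:.0%}: "
        f"{round(tp + fp)} discovered, {round(fp)} false ({frac:.0%})"
    )
```

The third case, which is not spelled out in the text, follows from the same formula: at 95% specificity the 975 false positives outnumber the 500 true states even when sensitivity is perfect.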


Table. Percentage of False-Positive Ratios in a Data Set Containing 500 and 19500 True and False States of Genetic Expression, Respectively*


Microarrays assay for the relative expression levels of a cDNA (1) in a biological sample as compared with another and (2) relative to other cDNAs within the same sample. The accuracy of fold changes is critical for data analysis. The results of Kothapalli et al23 reveal poor reproducibility and discrepancies of fold-change calculations between microarrays (interarray). Furthermore, the accuracy of calculations of fold changes of genes within a single microarray (intra-array) is not known. Low specificity, the preponderance and heterogeneity of noise, and inaccurate fold-change calculations impose significant limitations on data analysis. For example, apparent molecular classifications may be caused by data set–specific noise and the results of 1 laboratory may disintegrate when tested independently.24 Furthermore, variations in gene expression levels between biological samples may be caused by noise and not biological heterogeneity.


HIGHLY SPECIFIC EXPRESSION DISCOVERY

Recent reports describe mathematical models that shed light on the behavior of noise in microarray data sets and algorithms that discover highly specific states of genetic expression (up-regulated or down-regulated) from genomewide expression profiling.10, 28 The mathematical models incorporate the principles of (1) preponderance and (2) heterogeneity of noise. The preponderance of noise implies that (1) the overwhelming majority of the genes on the array are not differentially expressed between samples (true negatives) and (2) the truly negative genes generate false-positive expression data (noise). Noise heterogeneity implies that the distribution of noise varies between data sets depending on quality. These principles may be summarized as follows:

  1. Each sample vs reference comparison generates tens of thousands of expression ratios.
  2. The model is based on the idea that less than 5% of all the genomic genes are truly differentially expressed between the sample and reference (true positives). The expression levels of the other more than 95% are not expected to be different (true negatives).
  3. Even when the expression levels of the genes do not differ between the sample and reference, the predominant majority of their measured expression ratios are not equal to 1 (noise, artifacts, or false positives).
  4. The distributions of the false positives vary widely between experiments; the variability is determined by quality.
  5. True-positive (<5%) and false-positive ratios (>95%) share the same distributions.

The mathematical tools generate highly specific discovery by modeling and filtering noise (Figure 2). The use of mathematical modeling and filters is common; to name a few examples, engineers apply filters to solve problems of noise in cellular telephones, digital music, and digital television.
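
The principles listed above can be illustrated with a toy simulation. It is a sketch under loud assumptions: noise on the log2 ratio is taken to be Gaussian, data set quality is represented by a single standard deviation, true changes are fixed at plus or minus 1.5 log2 units, and a fixed 2-fold cutoff stands in for a naive analysis. None of this reproduces the cited mathematical models or algorithms; it only shows why a single cutoff behaves differently in data sets of different quality.

```python
import random


def simulate_fixed_cutoff(noise_sd, n_genes=20000, n_true=500,
                          cutoff=1.0, seed=0):
    """Count true and false positives called by a fixed |log2 ratio| cutoff.

    noise_sd -- assumed standard deviation of Gaussian noise (data set quality)
    cutoff   -- 1.0 on the log2 scale corresponds to a 2-fold change
    """
    rng = random.Random(seed)
    true_pos = 0
    false_pos = 0
    for gene in range(n_genes):
        is_true = gene < n_true  # only a small minority truly differ
        effect = rng.choice([-1.5, 1.5]) if is_true else 0.0
        measured = effect + rng.gauss(0.0, noise_sd)
        if abs(measured) >= cutoff:
            if is_true:
                true_pos += 1
            else:
                false_pos += 1
    return true_pos, false_pos


for label, sd in [("better-quality data set", 0.3),
                  ("poorer-quality data set", 0.8)]:
    tp, fp = simulate_fixed_cutoff(sd)
    print(f"{label}: {tp} true positives, {fp} false positives")
```

Under these assumptions the same cutoff that admits few false positives in the better-quality data set admits far more false positives than true positives in the poorer one, which is the motivation for modeling noise separately in each data set.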



Figure 2. A schematic depicting the behavior of noise (false-positive data or artifacts). Genomewide profiling of a sample vs a reference generates a data set including tens of thousands of ratios. The gene list includes a small fraction of differentially expressed genes (true-positive genes, < 5%) and a predominant majority of genes that are not differentially expressed (true-negative genes, > 95%). Because of noise, the true-negative genes appear as if they were differentially expressed. Furthermore, the distribution of noise differs between data sets. The heterogeneous colors of the large squares depict the idea that individual data sets have unique noise distributions that are dependent on experimental variations and on the quality of each data set. For example, large ratios may be false in a poor-quality data set and small ratios may be true in a better-quality data set.28 Highly specific discovery is applied to individual data sets. It discovers the small number of differentially expressed genes by filtering the dominant noise generated by the large number of the genes that are not differentially expressed.



SIGNIFICANCE AND FUTURE DIRECTIONS

Highly specific genome-scale discovery of states of genetic expression has applications in all aspects of biology and medicine; it facilitates hypothesis-driven research and sets the stage for studies in systems biology.10, 16, 28 Several models that explain the relationship of genotype to phenotype have evolved over the past 40 years. First is the model of a single genetic lesion causing a phenotype; an example is sickle cell disease. A second model is that of several genotypes causing the same phenotype; examples include malignant brain tumors and Alzheimer disease. A third model is that of a single genetic lesion causing distinct phenotypes depending on polymorphisms; examples include hereditary Creutzfeldt-Jakob disease and fatal familial insomnia.29 Data from the highly specific genome-scale discovery in meningiomas are consistent with a fourth model of complex molecular systems.16, 18, 30-31 In this model, single genes or molecules of the cell belong to rich networks of molecular interactions that include transcriptional regulation, signaling pathways, protein-protein, and protein–nucleic acid interactions.16 These 4 models are not exclusive; for instance, complex molecular systems may also explain the heterogeneity of the clinical phenotypes of a dominant genetic lesion like expansion of the CAG repeats of Huntington disease.32

The idea that molecular systems, and not single genes, create phenotypes has important biological and therapeutic implications. The majority of clinical trials that have targeted single or a few genes have failed; most compounds that show efficacy in preclinical experiments and phase 1 and phase 2 clinical trials turn out to be ineffective in very expensive phase 3 trials. Hopefully, systems biology will improve the decision making for the transition to phase 3 clinical trials.33 The results of the meningioma study support the idea that the phenotypes are created by the principles of (1) multiplicity and (2) balancing of opposing molecular functions. Multiplicity is apparent because of the multifunctionality of single genes and because a given phenotype is caused not by a single molecule but rather by up-regulating several genes that promote a desirable "aberrant" function and by down-regulating a number of genes that prevent it. Thus, a "normal" biological phenotype seems to be created, maintained, and controlled by a tight balancing of opposing molecular functions. Meningiomas disturb this balanced expression to promote their phenotypes.16 The principle of multiplicity of complex molecular systems may explain the shortcomings of drug development. Targeting single genes or single pathways is likely to fail because molecular systems have redundant molecules or pathways that bypass the blockade. It is intuitive that targets selected based on molecular systems are more likely to be clinically effective than targets selected based on single molecules or pathways.

Microarrays can be extremely useful for many biological fields, particularly clinical neurology and systems biology, but they can also be very misleading. Not unlike many fields in physics, the full potential of microarrays awaits advances in mathematics. We ought to step back to the drawing board to develop better tools for data analysis.


AUTHOR INFORMATION

Correspondence: Hassan M. Fathallah-Shaykh, MD, Department of Neurological Sciences, Section of Neuro-oncology, Rush University Medical Center, Chicago, IL 60612 (hfathall@rush.edu).

Accepted for Publication: July 29, 2004.

Author Affiliations: Department of Neurological Sciences, Section of Neuro-oncology, Rush University Medical Center, Chicago, Ill.


REFERENCES

1. Alizadeh AA, Eisen MB, Davis E, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 2000;403:503-511.
2. Bittner M, Meltzer P, Chen C, et al. Molecular classification of cutaneous melanoma by gene expression profiling. Nature. 2000;406:536-540.
3. Ramaswamy S, Ross KN, Lander ES, Golub TR. A molecular signature of metastasis in primary solid tumors. Nat Genet. 2003;33:49-54.
4. Dudoit S, Gentleman RC, Quackenbush J. Open source software for the analysis of microarray data. Biotechniques. 2003(suppl):45-51.
5. Chen Y, Dougherty ER, Bittner ML. Ratio-based decisions and the quantitative analysis of cDNA microarray images. J Biomed Opt. 1997;2:364-374.
6. Nishizuka S, Chen ST, Gwadry FG, et al. Diagnostic markers that distinguish colon and ovarian adenocarcinomas: identification by genomic, proteomic, and tissue array profiling. Cancer Res. 2003;63:5243-5250.
7. Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science. 1995;270:467-470.
8. Lockhart DJ, Dong H, Byrne MC, et al. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol. 1996;14:1675-1680.
9. DeRisi JL, Iyer VR, Brown PO. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science. 1997;278:680-686.
10. Fathallah-Shaykh H, Rigen M, Zhao L-J, et al. Mathematical modeling of noise and discovery of genetic expression classes in gliomas. Oncogene. 2002;21:7164-7174.
11. van 't Veer LJ, Dai H, van de Vijver MJ, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002;415:530-536.
12. Perou CM, Jeffrey SS, Rees CA, et al. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci U S A. 1999;96:9212-9217.
13. Pomeroy SL, Tamayo P, Sturla LM, et al. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature. 2002;415:436-442.
14. Golub TR, Slonim DK, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286:531-537.
15. Sorlie T, Perou CM, Tibshirani R, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A. 2001;98:10869-10874.
16. Fathallah-Shaykh HM, He B, Zhao L-J, et al. Genomic expression discovery predicts pathways and opposing functions behind phenotypes. J Biol Chem. 2003;278:23830-23833.
17. Marton MJ, DeRisi JL, Iyer VR, et al. Drug target validation and identification of secondary drug target effects using DNA microarrays. Nat Med. 1998;4:1293-1301.
18. Hood L. Systems biology: integrating technology, biology, and computation. Mech Ageing Dev. 2003;124:9-16.
19. Holstege FC, Jennings EG, Wyrick JJ, et al. Dissecting the regulatory circuitry of a eukaryotic genome. Cell. 1998;95:717-728.
20. Ideker T, Thorsson V, Ranish JA, et al. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science. 2001;292:929-934.
21. Fathallah-Shaykh HM. Logical networks inferred from highly specific discovery of transcriptionally regulated genes predict protein states in cultured gliomas. Biochem Biophys Res Commun. 2005;336:1278-1284.
22. Fathallah-Shaykh HM. Genomic discovery reveals a molecular system for resistance to ER and oxidative stress in cultured glioma. Arch Neurol. 2005;62:233-236.
23. Kothapalli R, Yoder SJ, Mane S, Loughran TP Jr. Microarray results: how accurate are they? BMC Bioinformatics. 2002;3:22.
24. Ntzani EE, Ioannidis JP. Predictive ability of DNA microarrays for cancer outcomes and correlates: an empirical assessment. Lancet. 2003;362:1439-1444.
25. Tan PK, Downey TJ, Spitznagel EL Jr, et al. Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res. 2003;31:5676-5684.
26. Michiels S, Koscielny S, Hill C. Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet. 2005;365:488-492.
27. Nielsen TO, Hsu FD, O'Connell JX, et al. Tissue microarray validation of epidermal growth factor receptor and SALL2 in synovial sarcoma with comparison to tumors of similar histology. Am J Pathol. 2003;163:1449-1456.
28. Fathallah-Shaykh H, He B, Zhao L-J, Badruddin A. A mathematical algorithm for discovering states of expression from direct genetic comparison by microarrays. Nucleic Acids Res. 2004;32:3807-3814.
29. Gambetti P, Parchi P, Chen SG. Hereditary Creutzfeldt-Jakob disease and fatal familial insomnia. Clin Lab Med. 2003;23:43-64.
30. Hood L. Leroy Hood expounds the principles, practice and future of systems biology. Drug Discov Today. 2003;8:436-438.
31. Ideker T, Galitski T, Hood L. A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet. 2001;2:343-372.
32. Li JL, Hayden MR, Almqvist EW, et al. A genome scan for modifiers of age at onset in Huntington disease: the HD MAPS study. Am J Hum Genet. 2003;73:682-687.
33. Roberts TG Jr, Lynch TJ Jr, Chabner BA. The phase III trial in the era of targeted therapy: unraveling the "go or no go" decision. J Clin Oncol. 2003;21:3683-3695.

SECTION EDITOR: DAVID E. PLEASURE, MD



 
© 2005 American Medical Association. All Rights Reserved.