
Sequence Analysis of the Human Genome
Implications for the Understanding of Nervous System Function and Disease
Anibal Cravchik, MD, PhD;
G. Subramanian, MD, PhD;
Samuel Broder, MD;
J. Craig Venter, PhD
Arch Neurol. 2001;58:1772-1778.
ABSTRACT
The recent publication of the
sequence of the human genome will accelerate the discovery of new
genetic susceptibility factors for human disease, leading to the
development of novel diagnostics and therapeutics. The exhaustive
analysis of the human genome sequence will be the focus of the
biomedical research community for many years to come. In particular,
comparative analysis of the available eukaryotic genome sequences is an
important approach to further our understanding of gene structure,
function, and evolution. Our initial analysis of the human genome
sequence has revealed many interesting features that are relevant to
nervous system function, evolution, and disease. We analyzed the
prominent features of predicted human proteins involved in neuronal
function and prepared a comparative analysis of 146 human genes that
have alleles (or mutations) conferring susceptibility for 168
neurologic diseases.
INTRODUCTION
The recent publication of the sequence of the human genome1, 2 allows a comparative analysis of genes expressed
in the nervous system and genes associated with neurologic disease. The
nervous system in vertebrates exhibits a high degree of functional
complexity, which is supported by a comparatively large number of genes
expressed in the nervous tissue. Many human neurologic diseases
resulting from genetic mutations have been described. The human genes
that confer susceptibility to neurologic diseases belong to many
different families and have a large diversity of functions. Human genes
that confer disease risk are, of course, not "disease genes," as
they are sometimes referred to, because their primary function is
certainly not to cause disease. However, the observation that an
alteration in a gene sequence results in a detectable disease suggests
that the gene product plays a critical role for the survival of the
whole organism. We call such gene sequence alterations
disease-predisposing alleles.
The comparative analysis of the human genome with the fruit fly
and nematode genomes3, 4 is an important approach to further
our understanding of gene function and evolution. Any such analysis
must take into account that the human genome is estimated to contain
26 000 to 38 000 genes, the fruit fly genome contains
about 14 000 genes, and the nematode has about 19 000
genes. The expansion of gene families in the human genome is not
uniform, but instead reflects the important roles of developmental and
cellular processes that are unique to vertebrates.
COMPARATIVE GENOME ANALYSIS
An initial comparative analysis of the human genome with the fruit fly
and nematode genomes showed a marked expansion in the number of genes
coding for proteins involved in neural development, function, and
structure.1 This finding correlates with, but does not
completely explain, the observation that the human nervous system has a
much larger number of different neuronal cell types than the fruit fly
and nematode nervous systems.5 Such diversity in neuronal
morphology reflects the variety and complexity of gene expression and
regulation in the different neuronal cell types of the vertebrate
nervous system. Protein families involved in nervous system function and
development that are prominently expanded in humans include
myelin-related proteins, proteins involved in neuronal signaling
(voltage-gated ion channels and connexins), and proteins involved in
pathway finding by axons and neuronal network formation (cadherins,
ephrins, semaphorins, neuropilins, and plexins) (Figure
1).1, 6, 7 Of equal
interest is the expansion of proteins involved in apoptotic
regulation8; the process of programmed cell death or
apoptosis is likely to play an important role in neurodegenerative
diseases.9 Also expanded in humans are neuronal
cytoskeleton protein families such as actins and microtubule-associated
proteins (MAPs) of the MAP2/tau family (Figure 1). Several of
the proteins involved in neuronal communication are multidomain
proteins (a protein domain is described as a region on a protein that
shows structural, functional, and evolutionary conservation).
The observation of "domain shuffling" (whereby new multidomain
protein architectures are built by shuffling or adding different
evolutionarily conserved protein domains) is a prominent finding in the
multidomain proteins involved in neuronal function and
structure.1, 2 Therefore, in addition to an increase in the
protein repertoire, a substantial increase in the number of protein
interactions mediated by those domains is predicted in the human, as
compared with the fruit fly and nematode. Selected examples of such
novel vertebrate multidomain protein architectures are provided in
Figure 2.
Figure 1.
Expansions in human protein families involved in neural function,
structure, and development. Number of proteins in selected families in
the human, fruit fly, and nematode are compared (data from Venter et
al1).
Figure 2.
Schematic representation of the architecture of neuronal specific
proteins that are expanded in humans. Protein domains are represented
by convention as different geometric shapes. Domain names are as
follows: PDZ, domains found in diverse signaling proteins that may
target signaling molecules to submembranous sites; SH3 (Src homology
3), domains often found in proteins involved in signal transduction
related to cytoskeletal organization; GuKc, guanylate kinase
homologues; IL, interleukin; BIR, baculoviral inhibition of apoptosis
protein repeat; NACHT, family of adenosine triphosphatase; LRR,
leucine-rich repeats; ANK, ankyrin repeats; and SAM, sterile α motif.
Biological descriptions of protein domains and families are available
through the Pfam10 (available at:
http://www.sanger.ac.uk/Software/Pfam/index.shtml) and
SMART11 (available at:
http://smart.embl-heidelberg.de/) databases. NMDA indicates
N-methyl-D-aspartate.
A very important evolutionary difference between vertebrate and
invertebrate nervous systems is the appearance of myelinating glial
cells, which provide axonal insulation and increase the speed of
propagation of action potentials. The human genome has at least 10
genes involved in myelin production; only 1 gene related to myelin
proteolipids was detected in the fruit fly and none was detected in the
nematode. Mutations in genes involved in myelin production can result
in severe demyelinating disorders such as Charcot-Marie-Tooth
neuropathy types 1A and 1B and Dejerine-Sottas syndrome (Figure 3).
Figure 3. Comparative analysis of human genes implicated in neurologic
diseases. A set of 168 human neurologic diseases resulting from
specific alleles of 146 different human genes was selected from Online
Mendelian Inheritance in Man (OMIM) for the comparative analysis. The
proteins encoded by those genes were used as queries to search the
nonredundant GenBank database for related proteins from Drosophila
melanogaster (fruit fly) and Caenorhabditis elegans
(nematode). BlastP searches were conducted as
described,12 and the results were color coded according to
their level of statistical significance, reflecting the degree of
confidence in their evolutionary and functional relationship. BlastP
E-values less than 10^-100, representing the highest degree of sequence
conservation, are shown as dark-green bars. E-values between 10^-100 and
10^-40 are represented in blue-green, indicating an intermediate level of
conservation. E-values in the range of 10^-40 to 10^-6 are shown in light
blue, indicating the lowest level of conservation. E-values greater than
10^-6 are shown as white bars,
indicating absence of gene conservation. The OMIM disease entry numbers
(available at: http://www.ncbi.nlm.nih.gov/Omim/) and cytogenetic
locations are listed. APP indicates amyloid precursor protein; SOD1,
superoxide dismutase 1; PMP22, peripheral myelin protein 22;
GM2, a ganglioside with the addition of N-acetylgalactosamine; FRAXE, fragile site in chromosome
Xq28; NAGA, α-N-acetylgalactosaminidase; PTS,
6-pyruvoyltetrahydropterin synthase; RLBP1, retinaldehyde-binding
protein 1; and HEXB, hexosaminidase B.
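The color coding described in the Figure 3 legend amounts to a threshold rule on the BlastP E-value. The following is a minimal Python sketch of that rule, using the cutoffs stated in the legend (1e-100, 1e-40, and 1e-6). The function name `conservation_bin` is hypothetical, and the assignment of values lying exactly on a cutoff is an assumption, since the legend does not specify it.

```python
def conservation_bin(e_value: float) -> str:
    """Map a BlastP E-value to the color bin named in the Figure 3 legend.

    Assumption: a value exactly equal to a cutoff falls into the less
    conserved bin; the legend does not say how ties are handled.
    """
    if e_value < 0:
        raise ValueError("E-values cannot be negative")
    if e_value < 1e-100:
        return "dark green"   # highest degree of sequence conservation
    if e_value < 1e-40:
        return "blue-green"   # intermediate level of conservation
    if e_value < 1e-6:
        return "light blue"   # lowest level of conservation
    return "white"            # absence of gene conservation


# Illustrative inputs only; these are not values from the study.
for e in (0.0, 3e-120, 5e-62, 2e-11, 0.5):
    print(e, conservation_bin(e))
```

Lower E-values indicate a match that is less likely to arise by chance, which is why the smallest values map to the most strongly conserved bin.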
Other protein families that are involved in neural development, function,
and structure but absent in the fruit fly and nematode mediate cell
adhesion, such as the connexin gap junction proteins. These are subunits
of the intercellular channels that form electrical synapses in
vertebrates. Mutations in the human connexin genes are involved in
diseases like X-linked Charcot-Marie-Tooth neuropathy and autosomal
dominant deafness type 3. Several ion channel families show marked
expansions in the human genome, for example, the voltage-gated channels
(Figure 1). Voltage-gated sodium and potassium channels play
a key role in the generation of neuronal action potentials. Mutations
in the voltage-gated potassium channel genes are involved in episodic
ataxia/myokymia syndrome, autosomal dominant deafness type 2, and
benign neonatal epilepsy types 1 and 2. Voltage-gated calcium channels
also play a central role in neurotransmitter release; mutations in some
members of this gene family are responsible for disorders like episodic
ataxia type 2, familial hemiplegic migraine, X-linked congenital
night blindness type 2, and spinocerebellar
ataxia type 6 (Figure 3). The tubulin-binding proteins of the
MAP2/tau family are involved in dendrite and axonal morphologic
determination, contributing to the development of neuronal morphologic
characteristics. Mutations in the human TAU gene lead to
frontotemporal dementia with parkinsonism.
We performed a comparative analysis of 168 neurogenetic
diseases selected from Online Mendelian Inheritance in Man. These
diseases result from specific alleles of 146 different human genes
(Figure 3). The sequence homology analysis was performed with
BlastP as described previously,12 and no subjective
judgments were made about orthology, since orthologues are often quite
difficult to determine for Caenorhabditis elegans
genes.12 For the 146 human genes surveyed, we found similar
levels of sequence conservation in the fruit fly and nematode. About
56% of those genes show high or intermediate sequence conservation in
Drosophila (83 genes) and C elegans (81
genes). This is surprising given the fact that the nervous
system in the fruit fly is significantly more complex than that in the
nematode. There are 3 cases of gene conservation with the
nematode but not the fruit fly: hypoxanthine phosphoribosyltransferase
1 (involved in Lesch-Nyhan syndrome), the Machado-Joseph disease gene,
and phytanoyl-coenzyme A hydroxylase (Refsum disease). Four
genes are conserved in the fruit fly but not in the nematode: otoferlin
(autosomal recessive deafness type 9), fragile X mental retardation 1,
spinocerebellar ataxia 2, and the sonic hedgehog homologue
(holoprosencephaly type 3), a gene that was originally characterized in
the fruit fly.
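The conservation percentages quoted above follow directly from the gene counts, and the species-specific cases can be tallied from per-gene best hits. The following is a minimal Python sketch under stated assumptions: the helper names and the toy input are hypothetical, and the cutoff of 1e-40 is taken to represent "high or intermediate" conservation as defined in the Figure 3 legend. The toy E-values are placeholders, not data from the study.

```python
from typing import Dict, Optional, Tuple

# Assumption: "high or intermediate" conservation means E < 1e-40,
# following the bins in the Figure 3 legend.
HIGH_OR_INTERMEDIATE = 1e-40

# gene name -> (best fly E-value, best worm E-value); None means no hit.
BestHits = Dict[str, Tuple[Optional[float], Optional[float]]]


def percent_conserved(n_conserved: int, n_total: int) -> float:
    """Share of surveyed genes that are conserved, as a percentage."""
    return 100.0 * n_conserved / n_total


def tally_conservation(best_hits: BestHits,
                       cutoff: float = HIGH_OR_INTERMEDIATE) -> Dict[str, int]:
    """Count genes conserved in each species, in one only, or in neither."""
    counts = {"fly": 0, "worm": 0, "fly_only": 0, "worm_only": 0, "neither": 0}
    for fly_e, worm_e in best_hits.values():
        in_fly = fly_e is not None and fly_e < cutoff
        in_worm = worm_e is not None and worm_e < cutoff
        counts["fly"] += in_fly
        counts["worm"] += in_worm
        counts["fly_only"] += in_fly and not in_worm
        counts["worm_only"] += in_worm and not in_fly
        counts["neither"] += not (in_fly or in_worm)
    return counts


# Counts reported in the text: 83 (fly) and 81 (worm) of 146 genes.
print(round(percent_conserved(83, 146), 1))  # 56.8
print(round(percent_conserved(81, 146), 1))  # 55.5

# Placeholder input with invented gene names and E-values.
toy_hits: BestHits = {
    "gene_a": (1e-120, 1e-80),
    "gene_b": (1e-50, None),
    "gene_c": (None, 1e-45),
    "gene_d": (1e-10, 0.3),
}
print(tally_conservation(toy_hits))
```

Both reported counts round to roughly 56%, which matches the figure given in the text. The text does not state which cutoff defines the species-specific cases, so the `cutoff` parameter is left adjustable.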
Several human genes in our survey had counterparts in the
Drosophila and C elegans genomes. Many of these have
been well characterized by molecular studies, particularly with the
fruit fly used as the animal model. Examples of such genes are
diaphanous homologue (autosomal dominant nonsyndromic deafness type 1);
notch homologue 3 (cerebral autosomal dominant arteriopathy
with subcortical infarcts and leukoencephalopathy [CADASIL]);
presenilin 1, presenilin 2, and amyloid β-precursor protein (familial
early-onset Alzheimer disease); and superoxide dismutase 1 (amyotrophic
lateral sclerosis). Genetic studies in the fruit fly have
made important contributions to our understanding of their cellular
function and the molecular mechanisms involved in
neurodegeneration (reviewed by Fortini and
Bonini13). Deletion of the fruit fly β-amyloid
precursor protein-like gene leads to behavioral defects that can be
partially rescued by transgenic expression of the human amyloid
precursor protein gene.14 Loss-of-function mutations in the
fruit fly presenilin gene cause neurogenic and other developmental
defects.15 In the nematode, deletion of the presenilin
homologue sel-12 causes an egg-laying defective phenotype that can be
fully rescued by normal human presenilin but, interestingly, not by
human presenilins carrying mutations linked to familial early-onset
Alzheimer disease.16 The fruit fly has been a very useful
animal model for the study of the pathogenesis of polyglutamine repeat
diseases such as Huntington and Machado-Joseph diseases. Expression of
polyglutamine-expanded huntingtin and Machado-Joseph disease protein in
the fruit fly induced neuronal degeneration.17, 18 An
important advantage of the fruit fly and nematode animal model systems
is the application of large-scale genetic screening methods to identify
novel genes that can modulate the molecular mechanisms of disease. A
recent genetic screen identified 2 Drosophila genes that
appear to modulate the polyglutamine-induced neurodegeneration, which
may lead to better understanding of pathogenesis and ultimately to
novel therapeutic development.19
Other examples of human genes involved in neurologic diseases
that have homologues in the fruit fly and nematode are cyclic
nucleotide-gated channel α-3 (achromatopsia), α-thalassemia/mental
retardation syndrome gene, adenosine triphosphate-binding cassette D1
(adrenoleukodystrophy), platelet-activating factor acetylhydrolase
1b α-subunit (lissencephaly type 1), Niemann-Pick disease C1 gene,
phenylalanine hydroxylase (phenylketonuria), cyclic nucleotide-gated
channel α-1 (retinitis pigmentosa), tyrosine hydroxylase (Segawa
syndrome), and aldehyde dehydrogenase 3A2 (Sjögren-Larsson
syndrome).
About 44% of the human genes in our selected set appear to have
no counterparts in the fruit fly and nematode, including the genes
involved in myelin production, gap junctions, and voltage-gated ion
channels discussed above. Other examples of those nonconserved genes
are neuronal ceroid-lipofuscinosis 2, 5, and 8;
dentatorubropallidoluysian atrophy; fragile X mental retardation 2;
monoamine oxidase A (Brunner syndrome); Norrie disease gene; prion
protein gene (Creutzfeldt-Jakob disease, Gerstmann-Sträussler-Scheinker
syndrome, and fatal familial insomnia); spinocerebellar ataxia 7 and
10; and α-synuclein (familial Parkinson disease). Although
the gene encoding α-synuclein is absent in the fruit fly, expression
of human α-synuclein in Drosophila has been shown to produce
loss of dopaminergic neurons, filamentous intraneuronal inclusions, and
locomotor dysfunction reminiscent of Parkinson disease.20
This suggests that fruit flies have some conservation in the mechanisms
leading to neurodegeneration in Parkinson disease, even though one of
its components (α-synuclein) may be absent.
INTERCHROMOSOMAL BLOCK DUPLICATIONS
More than 1000 interchromosomal segmental duplications have been
detected in the human genome.1 Many of these large block
duplications appear to have an ancient origin and are likely to predate
most vertebrate divergences, having undergone many subsequent deletions
and rearrangements.1 The block duplications range in size
from a few genes to segments covering most of a chromosome.
Interestingly, many genes that have disease-associated alleles are
present in the duplicated segments. Furthermore, in some instances the
genes in both duplicated segments have alleles associated with
similar diseases. We present examples of
interchromosomal duplicated segments associated with
neurologic diseases (Table 1). The
homeobox genes sine oculis homologues 3, 1, and 6 are located in
duplicated segments in human chromosomes 2 and 14. The
Drosophila sine oculis gene is a transcription factor that
plays a crucial role in morphogenesis, and its mutant alleles lead to
defects in eye morphologic features and neuronal development.
Mutations in the human sine oculis homeobox homologue 3 gene are
associated with neurodevelopmental alterations in holoprosencephaly
type 2. The G protein α-transducing activity polypeptide 1 gene in
human chromosome 3 encodes a transducin α-subunit involved in the
stimulation of cyclic guanosine monophosphate-phosphodiesterase in rod
photoreceptors, and its mutations are associated with congenital
stationary night blindness. A duplicated segment containing the G
protein α-inhibiting subunit 1 gene is present in chromosome 7
(Table 1).
Paralogous Genes on Duplicated Genome Segments That Have Alleles Involved in Neurologic Diseases*
The α-synuclein gene, associated with familial Parkinson
disease, is located in a chromosome 4 segment that is duplicated in
chromosome 10, where the γ-synuclein gene is located. The chromosome
15 gene hexosaminidase A is associated with Tay-Sachs disease, an
autosomal recessive progressive neurodegenerative disorder that is
prevalent in the Ashkenazi Jewish population. A duplicated segment in
chromosome 5 contains the hexosaminidase B β gene, which is
associated with Sandhoff disease, a disorder clinically similar to
Tay-Sachs disease, but observed mostly in non-Jewish patients. Another
example of 2 paralogous genes associated with neurogenetic disease is
phenylalanine hydroxylase and tyrosine hydroxylase, which are,
respectively, part of a segment duplicated in chromosomes 12 and 11.
Phenylalanine hydroxylase mutations are associated with
phenylketonuria, and tyrosine hydroxylase mutations are linked to
autosomal recessive Segawa syndrome, a disease with levodopa-responsive
parkinsonism that appears early in infancy (Table 1). The gene
coding for voltage-gated potassium ion channel 1, linked to episodic
ataxia syndrome, is located in a chromosome 12 segment that has
duplications in chromosome 1 containing 2 paralogous genes of the
Shaker-related subfamily of voltage-gated potassium ion channels
(Table 1). The cochlin precursor gene in chromosome 14 linked
to autosomal dominant deafness type 9 is duplicated in
chromosome 2. The gene coding for microtubule-associated
protein tau in chromosome 17, associated with frontotemporal dementia
with parkinsonism (familial Pick disease), is part of a segment
duplicated in chromosome 2 that contains the gene coding for the
related tubulin-binding protein MAP2. The gene in chromosome 17 linked
to lissencephaly type 1, in which there is an abnormality in early
neuronal migration and development of the cerebral cortex leading to
agyria (lack of cortical convolutions or gyri), is part of a segment
duplicated in chromosome 5. The gene for gap junction protein connexin
32, associated with X-linked Charcot-Marie-Tooth neuropathy, is part of
a segment duplicated in chromosome 13 that contains 3 paralogous genes
coding for connexin 26 (linked to autosomal dominant deafness type 3),
connexin 30, and connexin 46 (Table 1).
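The paralog relationships described in this section can be held in a small data structure and filtered for the pairs in which both genes have disease-associated alleles. The following is a minimal Python sketch under stated assumptions: the type `Gene`, the list `PARALOG_PAIRS`, and the function `both_disease_linked` are hypothetical names, and the records cover only a subset of the pairs named in the text. A `disease` value of `None` means only that no disease is named for that gene in this section.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    name: str
    chromosome: str
    disease: Optional[str]  # None = no disease named in this section


# A subset of the paralogous pairs on duplicated segments described above.
PARALOG_PAIRS: List[Tuple[Gene, Gene]] = [
    (Gene("hexosaminidase A", "15", "Tay-Sachs disease"),
     Gene("hexosaminidase B", "5", "Sandhoff disease")),
    (Gene("phenylalanine hydroxylase", "12", "phenylketonuria"),
     Gene("tyrosine hydroxylase", "11", "Segawa syndrome")),
    (Gene("connexin 32", "X", "X-linked Charcot-Marie-Tooth neuropathy"),
     Gene("connexin 26", "13", "autosomal dominant deafness type 3")),
    (Gene("microtubule-associated protein tau", "17",
          "frontotemporal dementia with parkinsonism"),
     Gene("MAP2", "2", None)),
]


def both_disease_linked(
    pairs: List[Tuple[Gene, Gene]],
) -> List[Tuple[Gene, Gene]]:
    """Keep only the pairs in which both paralogs have a named disease."""
    return [(a, b) for a, b in pairs if a.disease and b.disease]


for first, second in both_disease_linked(PARALOG_PAIRS):
    print(f"{first.name} (chr {first.chromosome}): {first.disease} | "
          f"{second.name} (chr {second.chromosome}): {second.disease}")
```

Pairs that drop out of the filter, such as tau and MAP2, are the kind of candidates the text suggests investigating further, since the paralog of a disease-associated gene may itself turn out to be involved in a similar disorder.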
CONCLUSIONS
The availability of an assembled human genome sequence will
significantly accelerate the discovery of genetic susceptibility
factors for human disease. Millions of human single nucleotide
polymorphisms and other forms of DNA sequence variations have been
added to the single nucleotide polymorphism databases; these markers
will enhance the genetic approaches for the identification of disease
susceptibility factors. Comparative analyses of genomes from different
species and phyla bring a new perspective to the study of gene function
and evolution and will be an important tool for furthering our
understanding of the role of alleles in conferring disease
predisposition. The sequencing and assembly of the human genome have
also enabled the detection of interchromosomal block duplications
containing novel paralogous genes. Further investigation of the novel
genes resulting from ancient duplication of genes with
disease-associated alleles is required to determine if they are also
involved in similar genetic diseases. Moreover, research on those genes
may reveal new insights into pathogenesis and therapeutic development.
Genomics and bioinformatics will merge with neurobiology to provide
powerful new approaches for advancing our understanding of the
complex biology and pathology of the human nervous system.
AUTHOR INFORMATION
Accepted for publication July 27, 2001.
From Celera Genomics, Rockville, Md.
Corresponding author and reprints: Anibal Cravchik, MD,
PhD, Celera Genomics, 45 W Gude Dr, Rockville, MD 20850 (e-mail: Anibal.Cravchik@celera.com).
REFERENCES
1. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science. 2001;291:1304-1351.
2. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis
of the human genome. Nature. 2001;409:860-921.
3. Adams MD, Celniker SE, Holt RA, et al. The genome sequence of
Drosophila melanogaster. Science. 2000;287:2185-2195.
4. The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science. 1998;282:2012-2018.
5. Kandel ER, Schwartz JH, Jessell T. Principles of Neural Science. 4th ed. New York, NY: McGraw-Hill Inc; 2000.
6. Missler M, Sudhof TC. Neurexins: three genes and 1001 products. Trends Genet. 1998;14:20-26.
7. Ranscht B. Cadherins: molecular codes for axon guidance and synapse
formation. Int J Dev Neurosci. 2000;18:643-651.
8. Aravind L, Dixit VM, Koonin EV. Apoptotic molecular machinery: vastly
increased complexity in vertebrates revealed by genome comparisons. Science. 2001;291:1279-1284.
9. Yuan J, Yankner BA. Apoptosis in the nervous system. Nature. 2000;407:802-809.
10. Bateman A, Birney E, Durbin R, Eddy SR, Howe KL, Sonnhammer EL. The
Pfam protein families database. Nucleic Acids Res. 2000;28:263-266.
11. Schultz J, Copley RR, Doerks T, Ponting CP, Bork P. SMART: a
web-based tool for the study of genetically mobile domains. Nucleic Acids Res. 2000;28:231-234.
12. Rubin GM, Yandell MD, Wortman JR, et al. Comparative genomics of the
eukaryotes. Science. 2000;287:2204-2215.
13. Fortini ME, Bonini NM. Modeling human neurodegenerative diseases in
Drosophila: on a wing and a prayer. Trends Genet. 2000;16:161-167.
14. Luo L, Tully T, White K. Human amyloid precursor protein ameliorates
behavioral deficit of flies deleted for Appl gene. Neuron. 1992;9:595-605.
15. Ye Y, Lukinova N, Fortini ME. Neurogenic phenotypes and altered Notch
processing in Drosophila presenilin mutants. Nature. 1999;398:525-529.
16. Levitan D, Doyle TG, Brousseau D, et al. Assessment of normal and
mutant human presenilin function in Caenorhabditis elegans. Proc Natl Acad Sci U S A. 1996;93:14940-14944.
17. Jackson GR, Salecker I, Dong X, et al. Polyglutamine-expanded human
huntingtin transgenes induce degeneration of Drosophila
photoreceptor neurons. Neuron. 1998;21:633-642.
18. Warrick JM, Paulson HL, Gray-Board GL, et al. Expanded polyglutamine
protein forms nuclear inclusions and causes neural degeneration in
Drosophila. Cell. 1998;93:939-949.
19. Kazemi-Esfarjani P, Benzer S. Genetic suppression of polyglutamine
toxicity in Drosophila. Science. 2000;287:1837-1840.
20. Feany MB, Bender WW. A Drosophila model of Parkinson's
disease. Nature. 2000;404:394-398.
SECTION EDITOR: HASSAN M. FATHALLAH-SHAYKH, MD