周光金,余 龙,赵寿元
复旦大学遗传学研究所,上海 200433

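Abstract: Pseudogenes are nonfunctional genomic DNA copies whose sequences closely resemble those of coding genes; they are generally not transcribed and have no clear physiological significance. By origin, pseudogenes are classified as duplicated pseudogenes or processed pseudogenes. To date, most of the human pseudogenes that have been definitively identified are processed pseudogenes, numbering about 8 000. Of the nearly 25 500 protein-coding gene sequences in Swiss-Prot/TrEMBL, about 10% have one or more near full-length processed pseudogenes in the genome; the remaining functional genes have none. Ribosomal protein genes account for the largest number of processed pseudogenes, about 1 700 (22% of all processed pseudogenes), and a few genes, such as cyclophilin A, actin, keratin, GAPDH, cytochrome c and nucleophosmin, each have many processed pseudogene copies. Overall, the number of pseudogenes on each human chromosome is proportional to chromosome length, but processed pseudogenes are densest in chromosomal regions with a GC content of 41%~46%. The copy number of processed pseudogenes agrees closely with the expression of the corresponding functional genes in reproductive organs, suggesting that many pseudogenes arose at the embryonic stage; it is also closely related to gene GC content and gene size. Accurate identification of pseudogenes is important for the study of genome evolution, molecular medicine research and medical applications.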

Pseudogenes in the human genome
ZHOU Guang-Jin, YU Long, ZHAO Shou-Yuan
Institute of Genetics, Fudan University, Shanghai 200433, China

Abstract: Pseudogenes are genomic sequences that closely resemble functional genes but are generally not transcribed and have no functional significance. There are two major types of pseudogenes: duplicated (non-processed) and processed (retrotransposed). About 8 000 near full-length processed pseudogenes have recently been identified in the draft human genome. The majority of human genes (about 90% of the proteome) have no processed pseudogenes, whereas about 10% have one or more. About one fifth of all processed pseudogenes are derived from highly expressed ribosomal protein genes; other notable source genes include cyclophilin A, keratin, GAPDH, cytochrome c and nucleophosmin. Their chromosomal distribution appears random and dispersed, with the number on each chromosome proportional to its length, although processed pseudogenes reach their highest density in regions of intermediate GC content (41%~46%). The prevalence of processed pseudogenes agrees well with gene-expression levels in the germ line, consistent with the mechanism of pseudogene biogenesis. Identification of human pseudogenes is important for the study of genome evolution and for molecular medicine research and its applications.
Key words: processed pseudogene; duplicated pseudogene; genome; molecular evolution
