Although significant advances have been made in recent years in understanding heterochromatin structure and function, we still know too little about the genomic sequences in the heterochromatin, the organization of these sequences, and their roles in essential biological functions. The efforts of the BDGP and Celera Genomics have produced a whole genome shotgun sequence assembly (WGS3), full-length cDNA sequences and EST data, and draft sequences of heterochromatic bacterial artificial chromosomes (BACs). In addition, we have generated a large collection of heterochromatic P element insertions that can be used to investigate heterochromatin sequence, structure and function.

In collaboration with the Drosophila Genome Center (DGC), we have established the 'Drosophila Heterochromatin Genome Project' (DHGP). The DHGP is using these unprecedented resources to assemble, map, and annotate high-quality, 'finished' sequence for a large portion of Drosophila centric heterochromatin. The successful completion of these studies will produce tools and information that will aid future studies of heterochromatin structure, function and evolution in Drosophila, and in other eukaryotes.

We have recently published a detailed analysis of the WGS3 draft heterochromatic sequence [pdf].

First, BAC-based fluorescence in situ hybridization (FISH) analysis was used to correlate the genomic sequence with the cytogenetic map and to refine the genomic definition of the centric heterochromatin. On the basis of our cytological definition, the annotated Release 3 euchromatic sequence extends into the centric heterochromatin on each chromosome arm.

Second, annotation of 20.7 Mb of non-redundant WGS heterochromatic sequence and 2.1 Mb of heterochromatic sequence at the base of the finished euchromatic arms identified ~450 heterochromatic gene models. In the following example of the light (lt) gene region annotation, blue = genes (thick bars are exons), red = transposable elements, and black = sequence gaps. Despite the high repeat content of heterochromatin (50% of the sequence is transposons!) and the draft quality of the WGS3 sequence, the gene models are generally reliable, as demonstrated by the identification of previously known heterochromatic genes and by alignment with cDNA sequences.
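To put the figures above in perspective, the short Python sketch below adds the two annotated spans and derives an approximate average gene density. It is only a back-of-the-envelope illustration: the constant and function names are hypothetical, the gene count is the rounded "~450" quoted above, and an average says nothing about how gene models are actually distributed along the sequence.

```python
# Figures quoted in the text; the names below are illustrative only.
WGS_HET_MB = 20.7        # non-redundant WGS heterochromatic sequence (Mb)
ARM_BASE_HET_MB = 2.1    # heterochromatin at the base of the finished arms (Mb)
GENE_MODELS = 450        # approximate count ("~450" in the text)


def total_annotated_mb(wgs_mb: float, arm_base_mb: float) -> float:
    """Total annotated heterochromatic sequence, in Mb."""
    return wgs_mb + arm_base_mb


def gene_density_per_mb(gene_models: int, total_mb: float) -> float:
    """Average number of gene models per Mb of annotated sequence."""
    return gene_models / total_mb


def mean_kb_per_gene(gene_models: int, total_mb: float) -> float:
    """Average span of annotated sequence per gene model, in kb."""
    return total_mb * 1000 / gene_models


total_mb = total_annotated_mb(WGS_HET_MB, ARM_BASE_HET_MB)
density = gene_density_per_mb(GENE_MODELS, total_mb)
spacing_kb = mean_kb_per_gene(GENE_MODELS, total_mb)

print(f"Annotated heterochromatic sequence: {total_mb:.1f} Mb")
print(f"Approximate gene density: {density:.1f} gene models per Mb")
print(f"Approximate spacing: one gene model per {spacing_kb:.0f} kb")
```

Under these assumptions, the annotated heterochromatin totals roughly 22.8 Mb, with on the order of 20 gene models per Mb.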

We are currently working to finish, map, and extend the heterochromatic sequence, and to continue to improve the sequence annotation. The quality of the annotation will be greatly improved by sequence finishing, additional full-length cDNA sequences of heterochromatic genes, comparisons with the mosquito and D. pseudoobscura WGS sequences, and optimization of strategies for annotating repeat-rich genomic sequences.