An Efficient Read Alignment Method to Detect Genetic Structural Variations

K. Lee, J. Yoon, D. Hong, S. Hong, and D. Seong (Korea)

Keywords

Structural variations, giga-sequencing, read alignment, binary trie

Abstract

Recent studies have found that genetic structural variations such as copy number variations (CNVs) exist in the human genome and that these variations are closely related to disease susceptibility, response to treatment, and genetic characteristics. We propose a new method that detects structural variations using millions of short DNA sequences (reads) generated by giga-sequencing technology. The method maps the reads onto a reference sequence and then detects candidate variation regions by statistically analyzing the distribution of the aligned reads. It uses a binary trie as its index structure, storing all window subsequences extracted from the reference sequence. For approximate read search, it divides each read into shorter pieces, searches for each piece, and then merges the results. Simulation tests using the NCBI (National Center for Biotechnology Information) reference sequence (build 35) confirm the method's efficiency in detecting CNV regions and in processing a large number of reads.
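
The indexing and piece-wise search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 2-bit nucleotide encoding, the names `BinaryTrie`, `build_index`, and `align_read`, the choice of non-overlapping pieces equal to the window length, and the merge step (voting on implied start positions, then a Hamming-distance check) are all assumptions made for illustration. The abstract does not specify these details.

```python
# Assumed 2-bit encoding of nucleotides; the sketch handles only A, C, G, T.
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


class BinaryTrie:
    """Binary trie keyed on the bit encoding of fixed-length windows."""

    def __init__(self):
        # Each node is [child_for_bit_0, child_for_bit_1, positions_or_None].
        self.root = [None, None, None]

    def insert(self, seq, pos):
        node = self.root
        for base in seq:
            for bit in BITS[base]:
                if node[bit] is None:
                    node[bit] = [None, None, None]
                node = node[bit]
        if node[2] is None:
            node[2] = []
        node[2].append(pos)

    def find(self, seq):
        """Return reference positions where seq occurs exactly."""
        node = self.root
        for base in seq:
            for bit in BITS[base]:
                node = node[bit]
                if node is None:
                    return []
        return node[2] or []


def build_index(reference, window):
    """Store every window-length subsequence of the reference in the trie."""
    trie = BinaryTrie()
    for i in range(len(reference) - window + 1):
        trie.insert(reference[i:i + window], i)
    return trie


def align_read(read, reference, trie, window, max_mismatches):
    """Approximate search: look up each piece exactly, then merge the hits.

    With k non-overlapping pieces, any placement with fewer than k
    mismatches leaves at least one piece intact (pigeonhole principle),
    so that piece's exact hit proposes the placement.
    """
    candidates = set()
    for offset in range(0, len(read) - window + 1, window):
        for pos in trie.find(read[offset:offset + window]):
            start = pos - offset  # read start implied by this piece
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    hits = []
    for start in sorted(candidates):
        segment = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, segment))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTTAACGGATCCA"  # toy reference
    trie = build_index(reference, window=4)
    read = "TAGCCTATAGGC"  # reference[8:20] with one substituted base
    print(align_read(read, reference, trie, window=4, max_mismatches=2))
```

In this sketch the read is placed even though its middle piece contains a substitution, because the two intact pieces both point to the same start position. Counting how many reads land in each reference region would then give the read distribution that the statistical detection step analyzes.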
