Y. Lu, S. Lu, F. Fotouhi, Y. Sun, Z. Yang, and L.R. Liang (USA)
Pattern Discovery, Confidence, DNA sequence
Pattern discovery in DNA sequences is one of the most challenging tasks in molecular biology and computer science. Its main goal is to identify sequences with important biological function hidden in large volumes of genomic sequence data. Several methods and techniques have been proposed and implemented in this field. However, to reduce computational time and complexity, most of them either focus on finding short DNA patterns or require the pattern length to be specified explicitly in advance. Scientists need methods that find longer patterns without a length specified in advance and that still perform well. In this paper, we propose a pattern discovery algorithm called Pattern Discovery with Confidence (PDC). Based on biological studies, we propose a new measure that can identify over-represented patterns inside DNA sequences. Using this measure, the PDC algorithm narrows the search space by checking dependency along the pattern, and so extends the pattern as far as possible without restricting or specifying its length in advance. Experimental tests demonstrate that this approach can find long, interesting patterns within a reasonable computation time.
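The idea of extending a pattern for as long as the dependency along it stays strong can be illustrated with a minimal sketch. The abstract does not define the PDC measure, so the code below substitutes an assumed stand-in: a confidence ratio borrowed from association-rule mining, count(pattern + base) / count(pattern). The function names `count_occurrences` and `extend_pattern`, the `min_confidence` threshold, and the greedy one-base-at-a-time extension are all hypothetical choices made for illustration and should not be read as the authors' algorithm.

```python
def count_occurrences(sequences, pattern):
    """Count occurrences of pattern across all sequences, overlaps included."""
    total = 0
    for seq in sequences:
        start = seq.find(pattern)
        while start != -1:
            total += 1
            start = seq.find(pattern, start + 1)
    return total


def extend_pattern(sequences, seed, min_confidence=0.8, alphabet="ACGT"):
    """Greedily extend seed to the right while the next base is predictable.

    Assumed stand-in measure (not the paper's):
        confidence = count(pattern + base) / count(pattern)
    Extension stops once no base reaches min_confidence, so no pattern
    length has to be fixed in advance.
    """
    pattern = seed
    while True:
        base_count = count_occurrences(sequences, pattern)
        if base_count == 0:
            return pattern  # seed never occurs; nothing to extend
        best_base, best_conf = None, 0.0
        for base in alphabet:
            conf = count_occurrences(sequences, pattern + base) / base_count
            if conf > best_conf:
                best_base, best_conf = base, conf
        if best_base is None or best_conf < min_confidence:
            return pattern  # dependency too weak to justify a longer pattern
        pattern += best_base


# Toy input: three sequences sharing the motif ACGTG with different flanks.
sequences = ["TTACGTGAA", "CCACGTGTT", "GGACGTGCC"]
print(extend_pattern(sequences, "AC"))  # prints ACGTG
```

On this toy input the seed `AC` grows to `ACGTG` because each added base follows the current pattern in every occurrence; the next base differs in each sequence, so the confidence drops to 1/3 and extension stops. A practical implementation would need an indexed structure rather than repeated scans, and extension in both directions.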