D. He and X. Wu (USA)
The discovery of repetitive patterns is a fundamental prob
lem in computational biology. There are many algorithms
for ﬁnding both exact and approximate repetitive patterns.
However, only a few of them have considered both length
and frequency factors of those patterns. Since approximate
repetitive patterns are more realistic, in this paper we pro
pose an efﬁcient algorithm for ﬁnding approximate repet
itive patterns whose lengths and frequencies are greater
than some pre-deﬁned thresholds. Our deﬁnition of repet
itive patterns is based on “elementary” repeats which are
the basic building blocks of complex repetitive patterns.
We show that the time complexity of our algorithm is
+ Nf (
− f)) for the Hamming distance and
+ Nf (
− f)) for the Edit distance, where
n is the length of the given sequence, f is the frequency
threshold, N is the expected number of seeds, f i
expected frequency of seeds and, k is for k-(mismatch or
Approximate repetitive patterns, sufﬁx tree, elementary re
peats, complex structure.