Synchronization of Audio (Speech) and Closed Caption Signals for Content-based Video Indexing and Retrieval

J.M. Son and K.S. Bae (Korea)


video indexing, video retrieval, speech recognition, closed caption


As more and more video databases are digitized and stored in accessible archives, the need grows for effective ways to index and retrieve them based on their content. The closed caption (CC) provided with many kinds of video can be used to extract useful content-based information. However, since the CC signal is not time-aligned with the video signal, it cannot be used directly for content-based video indexing or retrieval. The speech signal in the audio track, which is time-aligned with the video, can therefore be used to synchronize the CC and video signals. In this paper, a method for synchronizing speech and CC signals is proposed for content-based video indexing and retrieval. By applying speech recognition technology together with the CC, time information that synchronizes the CC signal with the video signal is obtained. In experiments on video summary generation using the CC with its synchronized time code, 56 video summaries were generated successfully from 57 TV news stories. This demonstrates that the proposed scheme is very promising for content-based video segmentation.
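The abstract does not spell out the alignment procedure, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes that a speech recognizer has already produced a list of (word, start time in seconds) pairs for the audio track, and it aligns the caption words to that list with Python's standard `difflib.SequenceMatcher`. The function name `align_captions` and the input format are hypothetical.

```python
from difflib import SequenceMatcher


def align_captions(caption_lines, recognized):
    """Assign a start time to each caption line (illustrative sketch only).

    caption_lines: list of caption strings, in broadcast order.
    recognized:    list of (word, start_seconds) pairs, assumed to come
                   from a speech recognizer run on the audio track.
    Returns a list of (start_seconds or None, caption_line) pairs.
    """
    # Flatten the captions into words, remembering each word's source line.
    cc_words, cc_line_idx = [], []
    for i, line in enumerate(caption_lines):
        for w in line.lower().split():
            cc_words.append(w.strip(".,!?\"'"))
            cc_line_idx.append(i)

    asr_words = [w.lower() for w, _ in recognized]

    # Find runs of words shared by the caption text and the recognizer output.
    matcher = SequenceMatcher(None, cc_words, asr_words, autojunk=False)

    line_start = [None] * len(caption_lines)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            line = cc_line_idx[block.a + k]
            # The first matched word of a line supplies that line's time code.
            if line_start[line] is None:
                line_start[line] = recognized[block.b + k][1]

    return list(zip(line_start, caption_lines))


captions = ["Good evening.", "Here is the news."]
asr = [("good", 1.0), ("evening", 1.4), ("hear", 2.0),
       ("is", 2.2), ("the", 2.3), ("news", 2.5)]
for start, text in align_captions(captions, asr):
    print(start, text)
```

In this toy input the recognizer misheard "here" as "hear", so the second caption line takes its time code from the next matched word. A caption line with no matched word keeps `None` as its start time and would need separate handling, for example interpolation between neighboring lines.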

