K. Iwata, Y. Itoh, K. Kojima, M. Ishigame, K. Tanaka, and S.-W. Lee (Japan)
3D Object Extraction, Surface Reconstruction, I3D, Remote Sensing, Image Visualization, Computer Vision
We propose a new type of video retrieval system that identifies target video sections from a text or speech query. The system is applied to the retrieval of inquiries in special TV broadcast programs aired during a disaster, such as the Niigata Chuetsu Earthquake in Japan. The system uses subword models such as phone or tri-phone models, which impose no vocabulary constraints on the system. This flexibility in query words is needed because most keywords are proper nouns that correspond to the person a user wants to search for. A system based on conventional speech recognition with a fixed vocabulary does not work well, because these proper nouns cannot be prepared beforehand. To improve retrieval performance, the system exploits phonetic similarities between subword models. The phonetic similarity is obtained by defining a statistical distance between any two subword models, each of which is composed of hidden Markov models (HMMs). We conducted experiments to evaluate the effectiveness and feasibility of our method, and the system works well for the retrieval of inquiries in real TV disaster broadcasting.
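As an illustration of how phonetic similarity can make subword matching tolerant of recognition errors, the following is a minimal sketch and not the authors' implementation. It locates a query phone sequence inside a longer recognized phone stream with a weighted edit distance. The names `phone_distance` and `spot_query`, the similar-phone pairs, and the cost values 0.3 and 1.0 are assumptions made for this example; in the system described above the distances would instead come from the statistical distance between subword HMMs.

```python
# Toy stand-in for an HMM-derived phonetic distance (hypothetical values).
SIMILAR_PAIRS = {frozenset("td"), frozenset("kg"), frozenset("sz")}


def phone_distance(a, b):
    """Return 0 for identical phones, a small cost for similar ones, else 1."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in SIMILAR_PAIRS:
        return 0.3
    return 1.0


def spot_query(query, stream, dist=phone_distance, gap=1.0):
    """Find the lowest-cost match of `query` anywhere inside `stream`.

    Returns (cost, end), where `end` is the number of stream phones
    consumed up to the end of the best match.
    """
    m = len(stream)
    # Row 0 is all zeros so that a match may start at any stream position.
    prev = [0.0] * (m + 1)
    for i in range(1, len(query) + 1):
        cur = [prev[0] + gap] + [0.0] * m
        for j in range(1, m + 1):
            substitute = prev[j - 1] + dist(query[i - 1], stream[j - 1])
            skip_query = prev[j] + gap       # query phone missing from stream
            skip_stream = cur[j - 1] + gap   # extra phone in the stream
            cur[j] = min(substitute, skip_query, skip_stream)
        prev = cur
    end = min(range(m + 1), key=lambda j: prev[j])
    return prev[end], end


# The recognizer is assumed to have heard "d" where the query has "t".
query = list("tanaka")
stream = list("sodanakadesu")
cost, end = spot_query(query, stream)
print(cost, end)
```

With a plain 0/1 mismatch cost, a substituted phone would count as heavily as an unrelated one; the graded distance keeps near-miss sections ranked ahead of unrelated ones, which is the role phonetic similarity plays in the retrieval step.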